What is an RDD?

  • A Resilient Distributed Dataset (RDD) is a fundamental data structure of Spark. It is an immutable, distributed collection of objects.
  • Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster.
  • Formally, an RDD is a read-only, partitioned collection of records.
  • RDDs can be created only through deterministic operations on either data in stable storage or other RDDs.
  • An RDD is a fault-tolerant collection of elements that can be operated on in parallel: if a partition is lost, Spark can recompute it from the operations that produced it.
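
The properties listed above can be illustrated with a small single-process sketch in plain Python. This is not Spark's API: the class `ToyRDD` and its methods are hypothetical names invented for this example, and the contiguous-chunk partitioning is an assumption made for simplicity. The sketch only shows the idea of a read-only, partitioned collection whose partitions are derived deterministically from a parent and can therefore be recomputed on demand.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, NOT Spark's API)."""

    def __init__(self, source=None, num_partitions=1, parent=None, fn=None):
        # A ToyRDD is built either from source data or from a parent ToyRDD.
        self._source = tuple(source) if source is not None else None  # read-only copy
        self._parent = parent  # lineage: the ToyRDD this one was derived from
        self._fn = fn          # deterministic function applied to a parent partition
        if parent is not None:
            self.num_partitions = parent.num_partitions
        else:
            self.num_partitions = num_partitions

    def compute(self, index):
        """(Re)compute a single partition from the lineage."""
        if self._parent is None:
            # Assumption: split the source into contiguous, equally sized chunks.
            size = -(-len(self._source) // self.num_partitions)  # ceiling division
            return list(self._source[index * size:(index + 1) * size])
        return self._fn(self._parent.compute(index))

    def map(self, f):
        # Returns a new ToyRDD; the existing one is never modified.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def collect(self):
        # Gather all partitions in order. A real cluster would compute them in parallel.
        result = []
        for i in range(self.num_partitions):
            result.extend(self.compute(i))
        return result


numbers = ToyRDD(source=range(1, 9), num_partitions=4)
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

all_values = even_squares.collect()    # [4, 16, 36, 64]
one_partition = even_squares.compute(2)  # [36], rebuilt from source chunk [5, 6]
```

Because every partition is defined by its source chunk plus a chain of deterministic functions, "losing" a partition costs nothing but the time to call `compute` again, which is the intuition behind the fault tolerance described above. In Spark itself, the comparable operations live on the RDD objects created from a `SparkContext`.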