What is an RDD?

  • A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark: an immutable, distributed collection of objects.
  • Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster.
  • Formally, an RDD is a read-only, partitioned collection of records.
  • RDDs can be created through deterministic operations on either data on stable storage or other RDDs.
  • An RDD is a fault-tolerant collection of elements that can be operated on in parallel.
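
The properties listed above (read-only, partitioned, built by deterministic operations, recoverable after a failure) can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's API: the class name ToyRDD, its compute function, and the contiguous chunking scheme are assumptions made for illustration. Real RDDs are created through a SparkContext and run on a cluster.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned (read-only)
class ToyRDD:
    """Hypothetical toy model of an RDD; not part of Spark."""
    num_partitions: int
    # compute(i) deterministically rebuilds the records of partition i
    compute: Callable[[int], Tuple[Any, ...]]

    @staticmethod
    def from_collection(data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        items = tuple(data)  # snapshot standing in for data on stable storage
        size = math.ceil(len(items) / num_partitions)  # contiguous chunks

        def compute(i: int) -> Tuple[Any, ...]:
            return items[i * size:(i + 1) * size]

        return ToyRDD(num_partitions, compute)

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        parent = self  # the new dataset remembers its parent (lineage)
        return ToyRDD(self.num_partitions,
                      lambda i: tuple(f(x) for x in parent.compute(i)))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        parent = self
        return ToyRDD(self.num_partitions,
                      lambda i: tuple(x for x in parent.compute(i) if pred(x)))

    def collect(self) -> List[Any]:
        # Partitions are independent, so a cluster could compute them in
        # parallel; this sketch simply loops over them.
        out: List[Any] = []
        for i in range(self.num_partitions):
            out.extend(self.compute(i))
        return out


numbers = ToyRDD.from_collection(range(1, 9), num_partitions=4)
squares = numbers.map(lambda x: x * x)            # new dataset, parent untouched
even_squares = squares.filter(lambda x: x % 2 == 0)

result = even_squares.collect()                   # [4, 16, 36, 64]
# "Losing" partition 1 is harmless: it is rebuilt from lineage on demand.
recovered = even_squares.compute(1)               # (16,)
```

Each transformation returns a new dataset and leaves its parent unchanged, and any single partition can be recomputed from the chain of parents without touching the others. That is the idea behind fault tolerance through lineage rather than through replication.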
