What is an RDD?




  • A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark. It is an immutable, distributed collection of objects.
  • Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster.
  • Formally, an RDD is a read-only, partitioned collection of records.
  • RDDs can be created through deterministic operations on either data in stable storage or other RDDs.
  • An RDD is a fault-tolerant collection of elements that can be operated on in parallel.
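
The properties above can be illustrated with a small pure-Python toy model. This is only a sketch and not Spark's API: the class ToyRDD and its methods (from_collection, compute, collect) are hypothetical names chosen for illustration, and "partitions" here are plain tuples in one process rather than data spread over a cluster. It shows a read-only, partitioned collection whose derived datasets record how they were built, so any partition can be recomputed deterministically.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, NOT Spark's API)."""

    def __init__(self,
                 source: Optional[Tuple[tuple, ...]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[tuple], tuple]] = None):
        self._source = source  # partitions read from "stable storage" (root only)
        self._parent = parent  # the ToyRDD this one was derived from (lineage)
        self._fn = fn          # deterministic per-partition transformation

    @classmethod
    def from_collection(cls, data: Iterable, num_partitions: int) -> "ToyRDD":
        items = tuple(data)
        # Deal the records round-robin into immutable tuples, one per partition.
        parts = tuple(items[i::num_partitions] for i in range(num_partitions))
        return cls(source=parts)

    @property
    def num_partitions(self) -> int:
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions

    def map(self, f: Callable) -> "ToyRDD":
        # Returns a NEW ToyRDD; the existing one is never modified.
        return ToyRDD(parent=self, fn=lambda part: tuple(f(x) for x in part))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self,
                      fn=lambda part: tuple(x for x in part if pred(x)))

    def compute(self, index: int) -> tuple:
        """Build one partition by replaying the lineage back to the source."""
        if self._parent is None:
            return self._source[index]
        return self._fn(self._parent.compute(index))

    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self.compute(i))
        return out


numbers = ToyRDD.from_collection(range(1, 11), num_partitions=3)
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(sorted(evens_squared.collect()))  # [4, 16, 36, 64, 100]
print(evens_squared.compute(1))         # (4, 64)
```

Because every transformation is deterministic and only the lineage is stored, calling compute(1) again after a simulated loss of partition 1 yields the same records. In this toy model, that is the sense in which the collection is fault-tolerant.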

Characteristics of an RDD

  • An RDD holds references to its partition objects.
  • Each partition object references a subset of the data.
  • Partitions are assigned to nodes in the cluster.
  • Each partition (split) is processed in memory (RAM) on its node.
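
The relationship between a dataset, its partition objects, and cluster nodes can be sketched in plain Python. This is only a sketch and not Spark's API: ToyPartition, split_into_partitions, and assign_to_nodes are hypothetical helpers, the node names are made up, and the round-robin placement is an assumption chosen for simplicity (a real scheduler would also weigh factors such as data locality).

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ToyPartition:
    """Hypothetical partition object: refers to records [start, end)."""
    index: int
    start: int
    end: int


def split_into_partitions(num_records: int, num_partitions: int) -> List[ToyPartition]:
    # Spread the records as evenly as possible across the partitions.
    base, extra = divmod(num_records, num_partitions)
    partitions: List[ToyPartition] = []
    start = 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)
        partitions.append(ToyPartition(i, start, start + size))
        start += size
    return partitions


def assign_to_nodes(partitions: Sequence[ToyPartition],
                    nodes: Sequence[str]) -> Dict[str, List[int]]:
    # Assumed policy: simple round-robin placement of partitions on nodes.
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for p in partitions:
        placement[nodes[p.index % len(nodes)]].append(p.index)
    return placement


records = [f"record-{i}" for i in range(10)]
partitions = split_into_partitions(len(records), 4)
placement = assign_to_nodes(partitions, ["node-a", "node-b"])

for p in partitions:
    # Each partition object only references a slice of the full dataset.
    print(p.index, records[p.start:p.end])
print(placement)  # {'node-a': [0, 2], 'node-b': [1, 3]}
```

The dataset itself keeps only the list of partition objects. Each partition describes which slice of the records it covers, and the placement map records which node works on which partition.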
