*7.2.7.1 Definition*

An RDD (resilient distributed dataset) is a collection of objects partitioned across a set of machines. It lets programmers perform in-memory computations on large clusters in a fault-tolerant way.4
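
To make the definition concrete, the following is a minimal single-process sketch, not Spark's actual API. The class `ToyRDD` and its methods (`parallelize`, `map`, `partition`, `lose_partition`, `collect`) are hypothetical names chosen for illustration. It assumes fault tolerance is obtained by remembering how each partition was derived and recomputing it on loss, which is how RDDs are usually described. Real partitions live on different machines, whereas here they are just Python lists.

```python
from typing import Callable, Dict, Iterable, List


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API)."""

    def __init__(self, num_partitions: int, compute: Callable[[int], List]):
        self.num_partitions = num_partitions
        self._compute = compute            # recipe: partition index -> records
        self._cache: Dict[int, List] = {}  # partitions currently held in memory

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division
        chunks = [items[i * size:(i + 1) * size] for i in range(num_partitions)]
        return cls(num_partitions, lambda i: list(chunks[i]))

    def map(self, fn: Callable) -> "ToyRDD":
        parent = self
        # The child only records how to derive each partition from its parent.
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in parent.partition(i)])

    def partition(self, i: int) -> List:
        if i not in self._cache:           # missing or lost: recompute it
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate a machine failure by dropping one in-memory partition."""
        self._cache.pop(i, None)

    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


numbers = ToyRDD.parallelize(range(10), num_partitions=3)
squares = numbers.map(lambda x: x * x)
before = squares.collect()
squares.lose_partition(1)                  # pretend one machine failed
after = squares.collect()                  # partition 1 is rebuilt on demand
```

In this sketch `before` and `after` are equal: the dropped partition is rebuilt from its parent instead of being restored from a replica.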
