*Multimedia Information Retrieval*

dataset into distributed memory to optimize iterative workloads and queries. Spark can run jobs 10 to 100 times faster than Hadoop MapReduce simply by reducing the number of reads and writes to disk.

**Figure 5.** *Spark architecture [32].*

*7.2.4.2 Iterative processing*

There are many algorithms, such as learning algorithms, that apply the same function over several steps. Hadoop MapReduce is based on an acyclic data flow model; that is, the output of a previous MapReduce job is the input of the next MapReduce job. In this case we waste a lot of time in I/O operations, because in Hadoop MapReduce there is a synchronization barrier between two MapReduce operations and we need to keep the data on disk every time [33].

But with Spark, the concept of the RDD (Resilient Distributed Dataset) allows data to be kept in memory, using the disk only for result operations, so there is no synchronization barrier that could slow down the process. Spark thus reduces the number of reads and writes on disk.
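The difference can be sketched in plain Python (this is a simulation of the two execution models, not real Hadoop or Spark code): a MapReduce-style loop persists its output to disk after every step, while an RDD-style loop keeps the working set in memory and touches disk only for the final result.

```python
# Sketch (not real Hadoop/Spark code): simulate an iterative algorithm that
# applies the same step function repeatedly, and count simulated disk I/O.

def iterate_with_disk_barrier(data, step, n_steps):
    """MapReduce-style: each step writes its output to 'disk' and the
    next step reads it back (synchronization barrier between jobs)."""
    reads = writes = 0
    disk = list(data)                        # initial input sits on disk
    for _ in range(n_steps):
        current = list(disk); reads += 1     # load the previous job's output
        current = [step(x) for x in current]
        disk = list(current); writes += 1    # persist for the next job
    return disk, reads, writes

def iterate_in_memory(data, step, n_steps):
    """RDD-style: the working set stays in memory; only the final
    result is written out."""
    reads = writes = 0
    current = list(data); reads += 1         # load the input once
    for _ in range(n_steps):
        current = [step(x) for x in current]
    writes += 1                              # write only the final result
    return current, reads, writes

data = range(5)
step = lambda x: x * 2
r1, reads1, writes1 = iterate_with_disk_barrier(data, step, 10)
r2, reads2, writes2 = iterate_in_memory(data, step, 10)
print(reads1 + writes1, reads2 + writes2)  # 20 I/O operations vs 2
print(r1 == r2)                            # True: same result either way
```

With 10 iterations the barrier version performs 20 simulated disk operations against 2 for the in-memory version, which is the effect behind Spark's speedup on iterative workloads.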

*7.2.4.3 Interactive queries*

For interactive data extraction algorithms, where a user needs to run multiple queries on the same subset of data, Hadoop loads the same data from disk multiple times, depending on the number of queries. Spark, by contrast, loads the data only once, stores it in distributed memory, and then performs the processing.
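A minimal sketch of this caching behavior (pure Python standing in for the cluster; `load_from_disk` is a hypothetical stand-in for reading from HDFS):

```python
# Sketch (not real Spark code): contrast reloading data for every query with
# caching it once in memory, as Spark does for interactive workloads.

load_count = 0

def load_from_disk():
    """Stand-in for reading the dataset from disk/HDFS."""
    global load_count
    load_count += 1
    return list(range(100))

queries = [
    lambda d: sum(d),                             # aggregate query
    lambda d: max(d),                             # max query
    lambda d: len([x for x in d if x % 2 == 0]),  # filter/count query
]

# Hadoop-style: every query re-reads the same data from disk.
results_reload = [q(load_from_disk()) for q in queries]
loads_without_cache = load_count

# Spark-style: load once, keep the dataset in memory, run all queries on it.
load_count = 0
cached = load_from_disk()                  # analogous to rdd.cache()
results_cached = [q(cached) for q in queries]
loads_with_cache = load_count

print(loads_without_cache, loads_with_cache)  # 3 loads vs 1
print(results_reload == results_cached)       # True: identical answers
```

The number of disk loads grows with the number of queries in the first model, but stays at one in the cached model, regardless of how many queries the user runs.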

*7.2.4.4 Richer*

Spark provides concise and consistent APIs for Scala, Java, and Python and supports many functions (actions and transformations), unlike Hadoop, where there are only two functions, Map and Reduce.
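The shape of this API can be illustrated with a toy `MiniRDD` class (a hypothetical miniature, not Spark's actual RDD): transformations such as `map` and `filter` are lazy and chainable, while actions such as `collect` and `reduce` trigger the computation and return a value.

```python
# Sketch of the idea behind Spark's RDD API (hypothetical MiniRDD class):
# transformations are lazy and return a new dataset; actions consume it.
from functools import reduce as _reduce

class MiniRDD:
    def __init__(self, data):
        self._data = data

    # Transformations: return a new MiniRDD; nothing is computed yet.
    def map(self, f):
        return MiniRDD(f(x) for x in self._data)

    def filter(self, p):
        return MiniRDD(x for x in self._data if p(x))

    # Actions: actually evaluate the pipeline and return a result.
    def collect(self):
        return list(self._data)

    def reduce(self, f):
        return _reduce(f, self._data)

rdd = MiniRDD(range(10))
result = (rdd.filter(lambda x: x % 2 == 0)   # keep even numbers
             .map(lambda x: x * x)           # square them
             .reduce(lambda a, b: a + b))    # action: sum them up
print(result)  # 0 + 4 + 16 + 36 + 64 = 120
```

Expressing the same pipeline in raw MapReduce would require encoding the filter and the square into a Map function and the sum into a Reduce function, each as a separate job definition.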

*7.2.4.5 Ease of use*

Spark lets you quickly write applications in Java, Scala, or Python with simple, readable instructions.

*7.2.4.6 General*


On the general side, Spark is designed to cover a wide range of workloads that previously required separate distributed systems, including real-time processing applications, iterative algorithms, interactive queries, and streaming. By supporting these workloads in the same engine, Spark makes it easy and inexpensive to combine the different types of processing that are often required in production data analysis pipelines.
