#### *7.2.4.4 Richer*

Spark provides concise and consistent APIs in Scala, Java, and Python, and it supports many operations (actions and transformations), unlike Hadoop, which offers only two functions: Map and Reduce.

#### *7.2.4.5 Ease of use*

Spark lets you quickly write applications in Java, Scala, or Python with simple, readable instructions.
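As an illustration of this conciseness, the following plain-Python sketch emulates the flatMap/map/reduceByKey chain of a typical Spark word count. The helper names mirror the RDD API, but this is not Spark code and no cluster is involved; it only shows how a few chained operations replace an entire MapReduce job.

```python
# Plain-Python stand-ins for three core RDD operations (illustration only).
def flat_map(f, data):
    return [y for x in data for y in f(x)]

def map_each(f, data):
    return [f(x) for x in data]

def reduce_by_key(f, pairs):
    acc = {}
    for key, value in pairs:
        acc[key] = f(acc[key], value) if key in acc else value
    return acc

lines = ["spark is fast", "spark is general"]
words = flat_map(str.split, lines)                 # split lines into words
pairs = map_each(lambda w: (w, 1), words)          # (word, 1) pairs
counts = reduce_by_key(lambda a, b: a + b, pairs)  # sum counts per word
print(counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'general': 1}
```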

#### *7.2.4.6 General*

More generally, Spark is designed to cover a wide range of workloads that previously required separate distributed systems, including real-time processing applications, iterative algorithms, interactive queries, and streaming. By supporting these workloads in the same engine, Spark makes it easy and inexpensive to combine the different types of processing that production data analysis pipelines often require.

#### *7.2.4.7 Real-time stream processing*

With Hadoop MapReduce it is only possible to process a stored flow of data, whereas Apache Spark can process data in real time thanks to Spark Streaming [32].

#### *7.2.4.8 Graph processing*

Developers can now also use Apache Spark for graph processing, which maps the relationships in data between various entities such as people and objects [32].

#### *7.2.4.9 Learning algorithms*

Spark comes with a machine learning library called MLlib. It provides several types of learning algorithms, including classification, regression, clustering, and collaborative filtering, as well as supporting features such as model evaluation and data import [32]. In Hadoop, by contrast, a separate learning library such as Mahout has to be integrated.

#### *7.2.4.10 Quick management of structured data*

Spark SQL is Spark's module for working with structured data. It allows querying data structured as a Resilient Distributed Dataset (RDD) in Spark, with built-in APIs in Python, Scala, and Java.<sup>3</sup>

#### *7.2.4.11 Storage general*

Spark uses the HDFS file system for data storage. It also works with any Hadoop-compatible data source, including HBase, Cassandra, etc.

#### *7.2.4.12 Interactive*

Spark offers an interactive console for Scala and Python. This is not yet available in Java.

#### *7.2.5 Deployment*

Executing heavy processing on a cluster, controlling the slave nodes, distributing the tasks among them fairly, and arbitrating the amount of CPU and memory allocated to each process is the role of a cluster manager. Spark currently offers three solutions for this: Spark standalone, which comes with Spark, YARN, and Mesos.

<sup>3</sup> Spark Programming Guide – Spark 1.2.0 Documentation. [Online]. Available: http://spark.apache.org/docs/1.2.0/programming-guide.html

*Towards Large Scale Image Retrieval System Using Parallel Frameworks DOI: http://dx.doi.org/10.5772/intechopen.94910*
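The stream processing described in section 7.2.4.7 treats a live stream as a sequence of small batches. The following plain-Python sketch illustrates that micro-batch idea (it is not Spark Streaming code; the batch size of 4 and the per-batch counting job are arbitrary choices for the example):

```python
def micro_batches(events, batch_size):
    """Split an incoming event list into fixed-size micro-batches."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]

# The same "job" (here: counting events) is applied to every micro-batch,
# which is how a batch-oriented engine can also serve streaming workloads.
stream = list(range(10))
batch_counts = [len(batch) for batch in micro_batches(stream, 4)]
print(batch_counts)  # [4, 4, 2]
```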
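Classification, one of the MLlib tasks listed in section 7.2.4.9, can be illustrated with a minimal plain-Python 1-nearest-neighbour sketch (this is not MLlib code; the toy training points and labels are made up for the example):

```python
def nn_predict(train, query):
    """Predict the label of `query` from its nearest training point."""
    def dist2(a, b):
        # squared Euclidean distance between two feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda point_label: dist2(point_label[0], query))
    return label

# Toy training set: two labelled points in a 2-D feature space.
train = [((0.0, 0.0), "cat"), ((1.0, 1.0), "dog")]
print(nn_predict(train, (0.1, 0.2)))  # cat
```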
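The choice among the three cluster managers of section 7.2.5 is made when an application is submitted: `spark-submit` selects the manager through its `--master` option. In this sketch the host names and `app.py` are placeholders:

```shell
# Spark standalone (the manager that comes with Spark)
spark-submit --master spark://master-host:7077 app.py

# Hadoop YARN
spark-submit --master yarn app.py

# Apache Mesos
spark-submit --master mesos://mesos-host:5050 app.py
```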
