Culminating Project Title
Date of Award
Culminating Project Type
Computer Science: M.S.
Computer Science and Information Technology
School of Science and Engineering
Jie Hu Meichsner
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Hadoop, Spark, Performance, MapReduce, RDD
The main focus of this paper is to compare the performance between Hadoop and Spark on some applications, such as iterative computation and real-time data processing. The runtime architectures of both Spark and Hadoop will be compared to illustrate their differences, and the components of their ecosystems will be tabled to show their respective characteristics. In this paper, we will highlight the performance comparison between Spark and Hadoop as the growth of data size and iteration counts, and also show how to tune in Hadoop and Spark in order to achieve higher performance. At the end, there will be several appendixes which describes how to install and launch Hadoop and Spark, how to implement the three case studies using java programming, and how to verify the correctness of the running results.
Pan, Shengti, "The Performance Comparison of Hadoop and Spark" (2016). Culminating Projects in Computer Science and Information Technology. 7.