Date of Award
3-2016
Culminating Project Type
Starred Paper
Degree Name
Computer Science: M.S.
Department
Computer Science and Information Technology
College
School of Science and Engineering
First Advisor
Jie Hu Meichsner
Second Advisor
Donald Hamnes
Third Advisor
Jim Chen
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Hadoop, Spark, Performance, MapReduce, RDD
Abstract
The main focus of this paper is to compare the performance of Hadoop and Spark on applications such as iterative computation and real-time data processing. The runtime architectures of Spark and Hadoop are compared to illustrate their differences, and the components of their ecosystems are tabulated to show their respective characteristics. In particular, this paper highlights how the performance of Spark and Hadoop changes as data size and iteration count grow, and shows how to tune Hadoop and Spark to achieve higher performance. Several appendices at the end describe how to install and launch Hadoop and Spark, how to implement the three case studies in Java, and how to verify the correctness of the results.
Recommended Citation
Pan, Shengti, "The Performance Comparison of Hadoop and Spark" (2016). Culminating Projects in Computer Science and Information Technology. 7.
https://repository.stcloudstate.edu/csit_etds/7
Comments/Acknowledgements
I would like to thank my advisor Dr. Meichsner for offering many valuable suggestions on my work. Without her help and coordination during my laboratory work, my progress would not have gone so smoothly. I also thank the committee members Dr. Hamnes and Dr. Chen for spending their time and energy correcting my paper and providing helpful suggestions. For the laboratory work, Martin Smith provided support in setting up the hardware and system environment. I thank him for his time and patience.