The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award


Culminating Project Type

Starred Paper

Degree Name

Computer Science: M.S.


Computer Science and Information Technology


School of Science and Engineering

First Advisor

Jie Hu Meichsner

Second Advisor

Donald Hamnes

Third Advisor

Jim Chen

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Hadoop, Spark, Performance, MapReduce, RDD


The main focus of this paper is to compare the performance of Hadoop and Spark on applications such as iterative computation and real-time data processing. The runtime architectures of Spark and Hadoop are compared to illustrate their differences, and the components of their ecosystems are tabulated to show their respective characteristics. The paper highlights how the performance of Spark and Hadoop compares as data size and iteration counts grow, and shows how to tune Hadoop and Spark to achieve higher performance. Several appendices at the end describe how to install and launch Hadoop and Spark, how to implement the three case studies in Java, and how to verify the correctness of the results.


I would like to thank my advisor, Dr. Meichsner, for offering many valuable suggestions on my work. Without her help and coordination during my laboratory work, my progress would not have gone so smoothly. I also thank the committee members, Dr. Hamnes and Dr. Chen, for spending their time and energy correcting my paper and providing beneficial suggestions. For the laboratory work, Martin Smith provided support in setting up the hardware and system environment. I thank him for his time and patience.


