Date of Award
11-2016
Culminating Project Type
Starred Paper
Degree Name
Computer Science: M.S.
Department
Computer Science and Information Technology
College
School of Science and Engineering
First Advisor
Donald Hamnes
Second Advisor
Jie Meichsner
Third Advisor
Dennis Guster
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Performance, Comparison, Hadoop, Single Machine, External Sort, Optimizing
Abstract
With the arrival of the big data era, Hadoop, developed by Doug Cutting and Mike Cafarella, was introduced in 2005 [1] and opened a new chapter in the history of cloud computing. The Hadoop Distributed File System (HDFS) is one of the most fundamental layers of Hadoop. However, HDFS performance on big data workloads can no longer keep up with demand, because data volumes keep growing and the rate of growth is itself accelerating. Various new distributed file systems (DFSs) have been published in an attempt to solve this issue. The core bottleneck limiting HDFS performance is its metadata service layer, and most of the new DFSs focus on improving the metadata service as well. Most of this work centers on solving the big data problem. For a small or medium-sized company, however, the data in use may not be that big. In this case, does the company need to build a distributed system to handle its data? Of course, the data in these companies will keep growing. At what point does a distributed system become the better choice for managing that data? This paper attempts to address this question by comparing the performance of distributed computation with that of serial computation.
Recommended Citation
Xie, Zhao, "Performance Comparison of a Hadoop DFS to a Centralized File System of a Single Machine" (2016). Culminating Projects in Computer Science and Information Technology. 14.
https://repository.stcloudstate.edu/csit_etds/14
External Java code for lab experiment
Comments/Acknowledgements
I would like to thank my advisor, Dr. Hamnes, for offering many valuable suggestions on my work. Without his guidance and assistance, my progress could not have gone so smoothly. I would also like to express my deep appreciation and indebtedness to the committee members, Dr. Meichsner and Dr. Guster, who contributed their time and energy to revising my paper and providing insightful suggestions. My sincere appreciation also goes to Martin Smith, who provided great support in helping me set up the hardware and system environment for the laboratory work. Finally, I thank my family, Hailei and Monica, for their great support of this research.