Date of Award

11-2016

Culminating Project Type

Starred Paper

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Donald Hamnes

Second Advisor

Jie Meichsner

Third Advisor

Dennis Guster

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Performance, Comparison, Hadoop, Single Machine, External Sort, Optimizing

Abstract

With the arrival of the big data era, Hadoop, developed by Doug Cutting and Mike Cafarella, was introduced in 2005 [1] and opened a new chapter in the history of cloud computing. The Hadoop Distributed File System (HDFS) is one of the most fundamental layers of Hadoop. HDFS performance, however, struggles to keep up with demand, because data volumes keep growing and the rate of growth is itself accelerating. Various new distributed file systems (DFSs) have been published in an attempt to solve this issue. The core bottleneck limiting performance is the metadata service layer in HDFS, and most of the new DFSs accordingly focus on improving the metadata service. Most of this work concentrates on the big data problem. For a small or medium-sized company, however, the data in use may not be that large. In that case, does the company need to build a distributed system to handle its data? Its data will, of course, keep growing, so at what point does a distributed system become the better choice for managing it? This paper attempts to address this question by comparing the performance of a distributed computation with that of a serial computation.

Comments/Acknowledgements

I would like to thank my advisor, Dr. Hamnes, for offering many valuable suggestions on my work. Without his guidance and assistance, my progress could not have gone so smoothly. I would also like to express my deep appreciation and indebtedness to the committee members, Dr. Meichsner and Dr. Guster, who contributed their time and energy to revising my paper and providing insightful suggestions. My sincere appreciation also goes to Martin Smith, who provided great support in helping me set up the hardware and system environment for the laboratory work. Finally, my family, Hailei and Monica, gave me tremendous support throughout this research.

Recommended Citation

Xie, Zhao, "Performance Comparison of a Hadoop DFS to a Centralized File System of a Single Machine" (2016). Culminating Projects in Computer Science and Information Technology. 14.
https://repository.stcloudstate.edu/csit_etds/14

ExternalSortCode.pdf (1443 kB)
External Java code for lab experiment

The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Culminating Projects in Computer Science and Information Technology

Performance Comparison of a Hadoop DFS to a Centralized File System of a Single Machine
