Culminating Project Title
Performance Comparison of a Hadoop DFS to a Centralized File System of a Single Machine
Date of Award
Culminating Project Type
Computer Science: M.S.
Computer Science and Information Technology
School of Science and Engineering
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Performance, Comparison, Hadoop, Single Machine, External Sort, Optimizing
With the arrival of the big data era, Hadoop, developed by Doug Cutting and Mike Cafarella, was introduced in 2005 and opened a new chapter in the history of cloud computing. The Hadoop Distributed File System (HDFS) is one of Hadoop's most fundamental layers. In the big data world, however, the performance of HDFS cannot keep up with demand, because the volume of data keeps growing and its rate of growth keeps accelerating. Various new distributed file systems (DFSs) have been published in an attempt to solve this issue. The core obstacle to better performance is the metadata service layer in HDFS, and most of the new DFSs focus on improving the metadata service as well. Most of this work centers on the big data problem. For a small or medium-sized company, however, the data in use may not be that large. In this case, does the company need to build a distributed system to handle its data? Of course, the data in these companies will keep growing. At what point does a distributed system become the better choice for managing that data? This paper attempts to address this question by comparing the performance of a distributed computation with that of a serial computation on a single machine.
Xie, Zhao, "Performance Comparison of a Hadoop DFS to a Centralized File System of a Single Machine" (2016). Culminating Projects in Computer Science and Information Technology. 14.