Culminating Project Title
Date of Award
Culminating Project Type
Computer Science: M.S.
Computer Science and Information Technology
School of Science and Engineering
Jie Hu Meichsner
Andrew A. Anda
Jim Q. Chen
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Parallel processing, MapReduce, Hadoop, Big Data
We are entering the era of big data. Datasets have grown from small to very large scale, which brings benefits but also challenges: traditional data processing methods handle them with increasing difficulty. Because serial methods cannot feasibly handle big data problems, many companies have started to invest in parallel processing frameworks and systems for their own products. Parallel database systems, MapReduce, Hadoop, Pig, Hive, Spark, and Twister are some examples. Many of these frameworks and systems can handle different kinds of big data problems, but none of them covers every big data issue. The biggest challenge is therefore how to use existing parallel frameworks and systems wisely to deal with large-scale data. We investigate and analyze the performance of parallel processing for big data. We review and analyze various parallel processing architectures and frameworks and their capabilities for large-scale data. We also present the potential challenges facing multiple techniques, given the characteristics of big data. Finally, we present possible solutions to those challenges.
Luo, Cheng, "Survey of Parallel Processing on Big Data" (2017). Culminating Projects in Computer Science and Information Technology. 18.