Date of Award

5-2017

Culminating Project Type

Starred Paper

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Jie Hu Meichsner

Second Advisor

Andrew A. Anda

Third Advisor

Jim Q. Chen

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Parallel processing, MapReduce, Hadoop, Big Data

Abstract

We are undoubtedly entering the era of big data. Datasets have grown from small to extremely large scale, which brings both benefits and challenges, and it is becoming increasingly difficult to handle them with traditional data processing methods. Because serial methods cannot feasibly handle big data problems, many companies have started to invest in parallel processing frameworks and systems for their own products. Parallel database systems, MapReduce, Hadoop, Pig, Hive, Spark, and Twister are some examples. Many of these frameworks and systems can handle different kinds of big data problems, but none of them covers all big data issues. The biggest challenge is therefore how to use existing parallel frameworks and systems wisely to deal with large-scale data. We investigate and analyze the performance of parallel processing for big data. We review and analyze various parallel processing architectures and frameworks and their capabilities for large-scale data. We also present the potential challenges that the characteristics of big data pose for multiple techniques. Finally, we present possible solutions to those challenges.

Comments/Acknowledgements

I would like to thank my advisor, Dr. Meichsner, for offering a great deal of valuable help and suggestions on my paper. Without her help, I could not have finished this paper smoothly. I would also like to thank the committee members, Dr. Anda and Dr. Chen, for sharing their valuable time and advice on my research.

Recommended Citation

Luo, Cheng, "Survey of Parallel Processing on Big Data" (2017). Culminating Projects in Computer Science and Information Technology. 18.
https://repository.stcloudstate.edu/csit_etds/18

The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Culminating Projects in Computer Science and Information Technology

Survey of Parallel Processing on Big Data
