The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

5-2016

Culminating Project Type

Starred Paper

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Donald Hamnes

Second Advisor

Dennis Guster

Third Advisor

Ramnath Sarnath

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Big Data, Hive, Hadoop, Netezza, Data Warehouse

Abstract

Considering the challenges posed by Big Data, the cost to scale traditional data warehouses is high and the performances would be inadequate to meet the growing needs of the volume, variety and velocity of data. The Hadoop ecosystem answers both of the shortcomings. Hadoop has the ability to store and analyze large data sets in parallel on a distributed environment but cannot replace the existing data warehouses and RDBMS systems due to its own limitations explained in this paper. In this paper, I identify the reasons why many enterprises fail and struggle to adapt to Big Data technologies. A brief outline of two different technologies to handle Big Data will be presented in this paper: Using IBM’s Pure Data system for analytics (Netezza) usually used in reporting, and Hadoop with Hive which is used in analytics. Also, this paper covers the Enterprise architecture consisting of Hadoop that successful companies are adapting to analyze, filter, process, and store the data running along a massively parallel processing data warehouse. Despite, having the technology to support and process Big Data, industries are still struggling to meet their goals due to the lack of skilled personnel to study and analyze the data, in short data scientists and data statisticians.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.