Date of Award
12-2016
Culminating Project Type
Starred Paper
Degree Name
Information Assurance: M.S.
Department
Information Assurance and Information Systems
College
Herberger School of Business
First Advisor
Dennis Guster
Second Advisor
Lynn Collen
Third Advisor
Keith Ewing
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
hadoop, java, distributed, sqoop, hive, mysql
Abstract
The volume of information generated every day is growing rapidly, and this data must be stored and analyzed. Traditional data warehousing solutions are expensive. Apache Hadoop is a popular open-source framework that combines a distributed file system with the MapReduce programming model to store and process data across a cluster. In this paper, a performance analysis project was devised that compares Apache Hive, which is built on top of Apache Hadoop, with a traditional database, MySQL. Hive supports HiveQL (Hive Query Language), a SQL-like declarative language whose queries are compiled into MapReduce jobs. These jobs are then executed using Hadoop. Hive also has a system catalog, the Metastore, which stores schemas and other metadata about the data. The Hadoop framework was developed to include a duplicate detection system that helps manage multiple copies of the same data at the file level. A Java web application, built with JavaServer Pages and Java Servlets, provides a web interface through which clients can access and analyze large data sets stored in Apache Hive or MySQL databases.
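The abstract does not say how file-level duplicates are detected. As an illustration only, the sketch below assumes one common approach: hash each file's content with SHA-256 and group files that share a digest. The class name DuplicateFinder, its method names, and the sample file names are hypothetical and are not taken from the paper. The example works on in-memory byte arrays rather than HDFS files so that it stays self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups files whose content is byte-for-byte identical.
public class DuplicateFinder {

    // Returns the hex-encoded SHA-256 digest of the given content.
    public static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Maps each digest shared by two or more files to the names of those files.
    public static Map<String, List<String>> findDuplicates(Map<String, byte[]> files) {
        Map<String, List<String>> byDigest = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            byDigest.computeIfAbsent(sha256Hex(entry.getValue()), k -> new ArrayList<>())
                    .add(entry.getKey());
        }
        // Keep only digests that occur more than once.
        byDigest.values().removeIf(names -> names.size() < 2);
        return byDigest;
    }

    public static void main(String[] args) {
        // Assumed sample data; real input would be read from the file system.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("loans.csv", "id,title\n1,Hadoop\n".getBytes(StandardCharsets.UTF_8));
        files.put("users.csv", "id,name\n7,Ada\n".getBytes(StandardCharsets.UTF_8));
        files.put("loans_copy.csv", "id,title\n1,Hadoop\n".getBytes(StandardCharsets.UTF_8));

        for (List<String> group : findDuplicates(files).values()) {
            System.out.println("Duplicate group: " + group);
        }
    }
}
```

Comparing digests instead of full contents means each file is read only once, which matters when the files are large. Whether the project used hashing, file names, or some other criterion cannot be determined from the abstract.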
Recommended Citation
Etikala, Punith Reddy, "Designing & Implementing a Java Web Application to Interact with Data Stored in a Distributed File System" (2016). Culminating Projects in Information Assurance. 11.
https://repository.stcloudstate.edu/msia_etds/11
Comments/Acknowledgements
This research paper about designing and implementing Java web applications to interact with data stored in a distributed file system was undertaken using resources provided by the Business Computing Research Laboratory of St. Cloud State University. Data used for the analyses came from the St. Cloud State University library.