The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

5-2017

Culminating Project Type

Starred Paper

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Jie H. Meichsner

Second Advisor

Omar Al-Azzam

Third Advisor

Dennis C. Guster

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

unstructured data, web server log files, Apache Pig, HDFS

Abstract

Data extraction and analysis have recently received significant attention due to the evolution of social media and the large volume of data available in unstructured form. Hadoop and MapReduce have been used continuously to process and analyze large amounts of data. In this paper, Apache Pig, a high-level platform that runs on top of Hadoop for analyzing large volumes of data, is used to analyze unstructured log files and extract information. Web server log files are analyzed, and meaningful information is extracted from its unstructured form into a structured form within the Apache Pig framework. The main purpose of this paper is to extract, transform, and load unstructured data in the Apache Pig framework and to analyze the data and its performance in local mode as well as in MapReduce mode. The paper briefly explains the steps required to analyze unstructured web server log files in Apache Pig. It also compares efficiency when a large volume of data is processed in MapReduce mode and in local mode.

Comments/Acknowledgements

I would like to express my sincere gratitude to Dr. Dennis C. Guster, Professor, Department of Information Systems, for allowing me to undertake this work. I am grateful to my advisor and supervisor, Professor Dr. Jie H. Meichsner, Department of Computer Science and Information Technology, for her continuous guidance, advice, effort, and invaluable suggestions throughout the research. I am also grateful to my supervisor Dr. Omar Al-Azzam, Professor of Computer Science and Information Technology, for providing logistical support and valuable suggestions that helped me carry out my research successfully. I would also like to thank the lab consultants of the Department of Information Systems for helping me carry out my research, and my friends in Computer Science for their help throughout the study. Lastly, I would like to express my sincere appreciation to my family, especially my husband, for encouraging and supporting me throughout the study.
