Culminating Project Title
Date of Award
Culminating Project Type
Computer Science: M.S.
Computer Science and Information Technology
School of Science and Engineering
Jie H. Meichsner
Omar Al-Azzam
Dennis C. Guster
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
unstructured data, web server log files, Apache Pig, HDFS
Data extraction and analysis have recently received significant attention due to the growth of social media and the large volume of data available in unstructured form. Hadoop and MapReduce have been widely used to process and analyze large amounts of data. In this paper, Apache Pig, a high-level platform that runs on top of Hadoop for analyzing large volumes of data, is used to analyze unstructured web server log files and to convert the meaningful information they contain from an unstructured form to a structured form. The main purpose of this paper is to extract, transform, and load unstructured data in the Apache Pig framework, and to analyze both the data and the performance of the processing in local mode and in MapReduce mode. The paper briefly explains the steps required to analyze unstructured web server log files in Apache Pig, and compares the efficiency of the two modes when a large volume of data is processed.
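As a rough illustration of what converting an unstructured log line into a structured record involves, the following minimal sketch parses web server log lines into named fields and counts responses by status code. It is written in plain Python rather than Pig Latin and is not the project's implementation. It assumes the logs follow the Apache Common Log Format, which the abstract does not specify; the names `CLF_PATTERN`, `parse_line`, `status_counts`, and `SAMPLE_LINES`, as well as the sample lines themselves, are hypothetical.

```python
import re
from collections import Counter

# Assumption: logs use the Apache Common Log Format
# (host ident user [timestamp] "request" status size).
# The source does not state which log layout was analyzed.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Extract and transform one raw log line into a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent; store it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def status_counts(lines):
    """Count parsed records per HTTP status code, skipping malformed lines."""
    records = (parse_line(line) for line in lines)
    return Counter(r["status"] for r in records if r is not None)


# Hypothetical sample input, for illustration only.
SAMPLE_LINES = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]

if __name__ == "__main__":
    for status, count in sorted(status_counts(SAMPLE_LINES).items()):
        print(status, count)
```

In a Pig script, the same extract, transform, and aggregate steps would typically be expressed as load, per-record projection, grouping, and counting operations, executed either in local mode or as MapReduce jobs on Hadoop.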
Niraula, Neeta, "Web Log Data Analysis: Converting Unstructured Web Log Data into Structured Data Using Apache Pig" (2017). Culminating Projects in Computer Science and Information Technology. 19.