The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

12-2016

Culminating Project Type

Thesis

Degree Name

Information Assurance: M.S.

Department

Information Assurance and Information Systems

College

Herberger School of Business

First Advisor

Dennis Guster

Second Advisor

Jim Chen

Third Advisor

Mark Schmidt

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Abstract

This thesis establishes a benchmark comparing the efficiency of custom Java code with that of equivalent MapReduce jobs. Four separate tasks were implemented in both custom Java and MapReduce, with each pair producing identical output. Network pcap data was processed with tshark, and the resulting text file served as input to the programs. Each code base was required to derive the following from the tshark data: the number of port access attempts by source IP address, the total traffic volume by IP protocol, the average packet length by source IP address, and the percentage of traffic volume by source IP address. All tests were performed in an Amazon Web Services environment, and multiple test runs were executed to ensure that the measured efficiency was not skewed by possibly shared resources. A cost-benefit analysis was performed to determine the point at which MapReduce and Hadoop clusters are worth the cost of additional hardware, based on a cost comparison of one AWS EC2 instance versus a four-node HDFS cluster.
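
To make the four tasks concrete, the following is a minimal single-process Java sketch of the aggregations named in the abstract. It is not the thesis code. The class name TsharkSummary, its method names, and the input layout (one tab-separated line per packet: source IP, protocol, destination port, frame length in bytes) are assumptions made for illustration, as is the choice to count one port access attempt per packet record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names and input layout are hypothetical.
class TsharkSummary {
    // Task 1: number of port access attempts (packet records) per source IP.
    final Map<String, Long> attemptsBySrc = new TreeMap<>();
    // Task 2: total traffic volume in bytes per IP protocol.
    final Map<String, Long> bytesByProtocol = new TreeMap<>();
    // Per-source byte totals, used by tasks 3 and 4.
    final Map<String, Long> bytesBySrc = new TreeMap<>();
    long totalBytes = 0;

    // Assumed line layout: srcIp <TAB> protocol <TAB> dstPort <TAB> frameLength
    void addLine(String line) {
        String[] f = line.split("\t");
        String srcIp = f[0];
        String protocol = f[1];
        long length = Long.parseLong(f[3]);
        attemptsBySrc.merge(srcIp, 1L, Long::sum);
        bytesByProtocol.merge(protocol, length, Long::sum);
        bytesBySrc.merge(srcIp, length, Long::sum);
        totalBytes += length;
    }

    // Task 3: average packet length for one source IP.
    double averageLength(String srcIp) {
        return (double) bytesBySrc.get(srcIp) / attemptsBySrc.get(srcIp);
    }

    // Task 4: share of total traffic volume for one source IP, in percent.
    double percentOfVolume(String srcIp) {
        return 100.0 * bytesBySrc.get(srcIp) / totalBytes;
    }

    static TsharkSummary summarize(List<String> lines) {
        TsharkSummary s = new TsharkSummary();
        for (String line : lines) {
            s.addLine(line);
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up sample records, not data from the thesis.
        TsharkSummary s = summarize(Arrays.asList(
            "10.0.0.1\tTCP\t80\t100",
            "10.0.0.1\tTCP\t443\t300",
            "10.0.0.2\tUDP\t53\t100"));
        System.out.println(s.attemptsBySrc);
        System.out.println(s.bytesByProtocol);
        System.out.println(s.averageLength("10.0.0.1"));
        System.out.println(s.percentOfVolume("10.0.0.1"));
    }
}
```

In a MapReduce version of the same tasks, each map call would emit a key such as the source IP or protocol together with a count or byte length, and the reducer would perform the summation that the merge calls perform here.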

Comments/Acknowledgements

I would like to thank my wife, Andrea Munsch, for all her support and guidance during the entire thesis process. I could not have completed this without her love, encouragement, and advice.
