Date of Award
12-2016
Culminating Project Type
Thesis
Degree Name
Information Assurance: M.S.
Department
Information Assurance and Information Systems
College
Herberger School of Business
First Advisor
Jim Chen
Second Advisor
Susantha Herath
Third Advisor
Balasubramanian Kasi
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Data mining, simbox fraud, Telecommunication, machine learning, international calls bypass fraud
Abstract
Fraud detection in the telecommunication industry has been a major challenge. Various fraud management systems are used in the industry to detect and prevent increasingly sophisticated fraud activities. However, such systems are rule-based and require continuous monitoring by subject matter experts. Once a fraudster changes their behavior, the rules must be modified. Sometimes the modification involves building a whole new set of rules from scratch, a toilsome task that may be repeated many times.
In recent years, data mining techniques have gained popularity in fraud detection in the telecommunication industry. Unlike rule-based Simbox detection, data mining algorithms can detect fraud cases even when there is no exact match with a predefined fraud pattern. This ability comes from the fuzziness and statistical nature built into the algorithms. To better understand the performance of data mining algorithms in fraud detection, this paper compares four major algorithms: Boosted Trees Classifier, Support Vector Machines, Logistic Classifier, and Neural Networks.
The results show that the Boosted Trees and Logistic Classifiers performed best among the four algorithms, with a false-positive ratio of less than 1%. Support Vector Machines performed almost as well as the Boosted Trees and Logistic Classifiers, but with a higher false-positive ratio of 8%. Neural Networks had an accuracy rate of 60% with a false-positive ratio of 40%. The conclusion is that the Boosted Trees and Support Vector Machines classifiers are among the better algorithms for Simbox fraud detection because of their high accuracy and low false-positive ratios.
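The abstract reports accuracy and false-positive ratio without stating how they were computed. The following is a minimal sketch of one common way to derive both from binary predictions, not the thesis's own code. It assumes labels of 1 for fraud and 0 for legitimate traffic, and it assumes the false-positive ratio is FP / (FP + TN); the thesis may define it differently. All function names and the toy data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = fraud, 0 = legitimate)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1          # fraud correctly flagged
        elif predicted == 1 and actual == 0:
            fp += 1          # legitimate subscriber wrongly flagged
        elif predicted == 0 and actual == 0:
            tn += 1          # legitimate subscriber correctly passed
        else:
            fn += 1          # fraud missed
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_ratio(y_true, y_pred):
    """Assumed definition: share of legitimate cases that were flagged as fraud."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Toy data, invented for illustration only (not from the thesis).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))
print(false_positive_ratio(y_true, y_pred))
```

Reporting the two numbers together matters in a fraud setting because legitimate traffic usually far outnumbers fraudulent traffic, so a classifier can score high accuracy while still flagging many legitimate subscribers.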
Recommended Citation
AlBougha, Mhd Redwan, "Comparing Data Mining Classification Algorithms in Detection of Simbox Fraud" (2016). Culminating Projects in Information Assurance. 17.
https://repository.stcloudstate.edu/msia_etds/17
Comments/Acknowledgements
I would like to express my gratitude and appreciation to my advisor, Dr. Jim Chen, who supported me throughout the work on my thesis with his patience, motivation, and immense knowledge.
Besides my advisor, I would also like to thank the rest of my thesis committee: Dr. Susantha Herath, who offered support and understanding throughout my Master’s program, and Dr. Balasubramanian Kasi, who gave his valuable time to review my thesis.
I would also like to express my very profound gratitude to my parents, my sisters, Bilal and Wael, and all of my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of writing this thesis. This accomplishment would not have been possible without them.