The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award


Culminating Project Type


Degree Name

Information Assurance: M.S.


Information Assurance and Information Systems


Herberger School of Business

First Advisor

Jim Chen

Second Advisor

Susantha Herath

Third Advisor

Balasubramanian Kasi

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Data mining, simbox fraud, Telecommunication, machine learning, international call bypass fraud


Fraud detection in the telecommunication industry has been a major challenge. Various fraud management systems are used in the industry to detect and prevent increasingly sophisticated fraud activities. However, such systems are rule-based and require continuous monitoring by subject matter experts. Once fraudsters change their behavior, the rules must be modified. Sometimes the modification involves building a whole new set of rules from scratch, a toilsome task that may be repeated many times.

In recent years, data mining techniques have gained popularity for fraud detection in the telecommunication industry. Unlike rule-based Simbox detection, data mining algorithms can detect fraud cases that do not exactly match a predefined fraud pattern; this ability comes from the fuzziness and statistical nature built into the algorithms. To better understand the performance of data mining algorithms in fraud detection, this paper compares four major algorithms: Boosted Trees Classifier, Support Vector Machines, Logistic Classifier, and Neural Networks.
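To make the contrast between exact-match rules and statistical models concrete, here is a minimal sketch under stated assumptions. The per-SIM features (outgoing calls per day, fraction of incoming calls, SMS per day), the training records, the threshold rule, and the scaling are all hypothetical and are not taken from this thesis; the model is plain logistic regression trained by batch gradient descent, not the author's implementation. It shows a hand-written rule missing a Simbox SIM that slightly changed its behavior, while the learned model still scores it as fraud.

```python
import math

# Hypothetical per-SIM features, NOT data from the thesis:
# (outgoing calls per day, fraction of calls that are incoming, SMS per day).
# Label 1 = Simbox SIM, 0 = ordinary subscriber.
TRAIN = [
    ((300, 0.0, 0), 1),
    ((250, 0.0, 0), 1),
    ((400, 0.0, 0), 1),
    ((12, 0.45, 5), 0),
    ((20, 0.50, 8), 0),
    ((8, 0.40, 3), 0),
]


def rule_flags(calls, incoming_share, sms):
    """Hypothetical expert rule: hard thresholds plus exact-match conditions."""
    return calls > 200 and incoming_share == 0 and sms == 0


def scale(calls, incoming_share, sms):
    """Bring features to comparable ranges (hypothetical scaling)."""
    return (calls / 100, incoming_share, sms / 10)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_logistic(data, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss of a logistic model."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for raw, y in data:
            x = scale(*raw)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(3):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def model_score(w, b, calls, incoming_share, sms):
    """Estimated probability that a SIM is a Simbox SIM."""
    x = scale(calls, incoming_share, sms)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = train_logistic(TRAIN)

# A hypothetical Simbox SIM whose operator now accepts a few calls and sends some SMS.
evasive = (220, 0.03, 2)
print("rule flags evasive SIM:", rule_flags(*evasive))
print("model fraud probability:", round(model_score(w, b, *evasive), 3))
```

The rule fails because `incoming_share == 0` and `sms == 0` demand an exact match, so an expert would have to rewrite it; the logistic model only needs the new SIM to lie closer to the fraudulent examples than to the legitimate ones.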

Results show that the Boosted Trees and Logistic classifiers performed best among the four algorithms, each with a false-positive ratio of less than 1%. Support Vector Machines performed nearly as well as Boosted Trees and the Logistic Classifier, but with a higher false-positive ratio of 8%. Neural Networks had an accuracy rate of 60% and a false-positive ratio of 40%. The conclusion is that the Boosted Trees and Support Vector Machine classifiers are among the better algorithms for Simbox fraud detection because of their high accuracy and low false-positive ratios.
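The results above are reported as accuracy and false-positive ratio. The sketch below shows how both can be computed from a classifier's predictions, assuming the common definitions accuracy = (TP + TN) / (TP + FP + TN + FN) and false-positive ratio = FP / (FP + TN); the thesis may define the ratio differently, and the labels in the example are hypothetical, not results from this work.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = fraud, 0 = legitimate."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1  # legitimate SIM wrongly flagged as Simbox
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # Simbox SIM that was missed
    return tp, fp, tn, fn


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy and false-positive ratio (FP / (FP + TN)), assumed definitions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    negatives = fp + tn
    fpr = fp / negatives if negatives else 0.0
    return accuracy, fpr


# Hypothetical labels: 4 Simbox SIMs followed by 6 legitimate SIMs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))   # (3, 1, 5, 1)
print(accuracy_and_fpr(y_true, y_pred))   # accuracy 0.8, FPR 1/6
```

Reporting the false-positive ratio next to accuracy matters in this setting because every false positive is a legitimate subscriber whose SIM could be wrongly blocked.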


I would like to express my gratitude and appreciation to my advisor, Dr. Jim Chen, who supported me throughout the work on my thesis with his patience, motivation, and immense knowledge.

Besides my advisor, I would also like to thank the rest of my thesis committee: Dr. Susantha Herath, who offered me support and understanding throughout my Master’s program, and Dr. Balasubramanian Kasi, for taking his valuable time to review my thesis.

I would also like to express my very profound gratitude to my parents, my sisters, Bilal and Wael, and all of my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of writing this thesis. This accomplishment would not have been possible without them.