Date of Award
5-2025
Culminating Project Type
Thesis
Styleguide
APA
Degree Name
Information Assurance: M.S.
Department
Information Assurance and Information Systems
College
Herberger School of Business
First Advisor
Abdullah Abu Hussein
Second Advisor
Lynn Collen
Third Advisor
Jim Q. Chen
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Business Email Compromise (BEC), Email Security, Supervised Machine Learning, Gradient Boosting Algorithms, Cyber Threat Detection, Fraud Prevention
Abstract
Email communication is pivotal to business operations, supporting financial growth, partnership, and collaboration. However, this reliance on email has introduced multilayered cyber threats, including phishing, malware, spoofing, data breaches, and encryption issues. This research presents a machine learning-based model to improve real-time detection of Business Email Compromise (BEC). The study began with a review of the BEC detection literature, analyzing 78 scholarly articles published from 2018 to 2023 to identify research gaps. The review identified critical areas for development, particularly the real-time detection of new and evolving BEC tactics. Three state-of-the-art Gradient Boosting algorithms (CatBoost, LGBM, and XGBoost) were employed to differentiate BEC from non-BEC emails, based on 27 features extracted from the Enron email dataset. The initial performance metrics were promising, with LGBM, XGBoost, and CatBoost achieving accuracies of 93%, 93%, and 94%, respectively. CatBoost, which achieved the highest accuracy and F1 score, was further fine-tuned for enhanced performance. In an innovative data augmentation approach, Google Gemini, a sophisticated Large Language Model, generated new BEC email samples. These samples were combined with the original Enron email dataset, and a CatBoost classifier trained on the combined data achieved an increased accuracy of 95%. This study highlights some of the limitations of current BEC detection approaches and illustrates the effective fusion of advanced machine learning algorithms and feature engineering in building robust cybersecurity defenses. Integrating supervised learning and large language models offers a new direction for email security enhancements and paves the way for interdisciplinary research.
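The abstract mentions that the classifiers operate on features extracted from raw emails, but it does not list the 27 features used. As a rough illustration of what header- and body-based feature extraction can look like, here is a minimal sketch using only Python's standard library. The function name `extract_features`, the `URGENCY_TERMS` list, and the four features shown are hypothetical examples chosen for illustration; they are not the thesis's feature set.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical keyword list for illustration; not taken from the thesis.
URGENCY_TERMS = ("urgent", "immediately", "wire transfer", "confidential")


def extract_features(raw_email: str) -> dict:
    """Turn one raw RFC 5322 email into a small numeric feature dict."""
    msg = message_from_string(raw_email)

    # parseaddr returns (display_name, address); only the address is needed.
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()

    subject = msg.get("Subject", "")
    payload = msg.get_payload()
    # Multipart messages return a list; this sketch handles plain text only.
    body = payload if isinstance(payload, str) else ""
    text = f"{subject} {body}".lower()

    return {
        # 1 if a Reply-To header exists and points at a different domain.
        "reply_to_mismatch": int(bool(reply_addr) and reply_domain != from_domain),
        # Total occurrences of the illustrative urgency terms.
        "urgency_term_count": sum(text.count(term) for term in URGENCY_TERMS),
        "body_length": len(body),
        "has_url": int("http://" in text or "https://" in text),
    }


# Made-up sample message, not drawn from the Enron dataset.
sample = (
    "From: CEO <ceo@example.com>\n"
    "Reply-To: ceo@example-mail.net\n"
    "Subject: Urgent wire transfer\n"
    "\n"
    "Please process this immediately and keep it confidential.\n"
)
features = extract_features(sample)
print(features)
```

A dictionary like this, computed per email, would form one row of the feature matrix that a gradient boosting classifier is trained on.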
Recommended Citation
Pantuvo, Salome Jerry, "A Novel Business Email Compromise Detection Model Using Supervised Machine Learning" (2025). Culminating Projects in Information Assurance. 149.
https://repository.stcloudstate.edu/msia_etds/149


Comments/Acknowledgements
I am deeply grateful to God for the health, strength, insight, and inspiration that fueled the completion of my master's thesis. My profound appreciation extends to Dr. Abdullah Abu Hussein, my thesis Chair, whose invaluable mentorship significantly contributed to my personal and academic growth. His expert guidance, constructive feedback, and encouragement enhanced my Python and machine learning skills and steered me towards a new professional path.
I also owe a debt of gratitude to Dr. Lynn Collen, my Academic Advisor, for her steadfast support and guidance throughout my studies. Her patience and expertise were instrumental in overcoming the academic challenges I encountered.
I also thank Dr. Jim Chen, whose positive feedback was vital for completing this research. These mentors' collective influence and encouragement have been pivotal in my career and personal development, inspiring ongoing growth and learning.
Words cannot express my gratitude to my husband, Dr. Jerry S. Pantuvo, whose sacrifice, understanding, encouragement, and love have been the cornerstone of my journey. My sons, Jerry and Michael, and my parents, siblings, and friends have been a constant source of inspiration and strength.
This thesis reflects the culmination of efforts supported by a network of family, mentors, and colleagues, to whom I am eternally thankful.