The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

5-2025

Culminating Project Type

Thesis

Styleguide

APA

Degree Name

Information Assurance: M.S.

Department

Information Assurance and Information Systems

College

Herberger School of Business

First Advisor

Abdullah Abu Hussein

Second Advisor

Lynn Collen

Third Advisor

Jim Q. Chen

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Business Email Compromise (BEC), Email Security, Supervised Machine Learning, Gradient Boosting Algorithms, Cyber Threat Detection, Fraud Prevention

Abstract

Email communication is pivotal to business operations, enhancing financial growth, partnership, and collaboration. However, this reliance on email has introduced multilayered cyber threats, including phishing, malware, spoofing, data breaches, and encryption issues. This research presents a machine learning-based model to improve real-time detection of Business Email Compromise (BEC). The study began with a review of the BEC detection literature, analyzing 78 scholarly articles published from 2018 to 2023 to identify research gaps. The review identified critical areas for development, particularly the real-time detection of new and evolving BEC tactics. Three state-of-the-art Gradient Boosting algorithms (CatBoost, LGBM, and XGBoost) were employed to differentiate BEC from non-BEC emails, based on 27 features extracted from the Enron email dataset. The initial performance metrics were promising, with LGBM, XGBoost, and CatBoost achieving accuracies of 93%, 93%, and 94%, respectively. CatBoost, which showed the highest accuracy and F1 score, was fine-tuned further. In an innovative data augmentation approach, Google Gemini, a Large Language Model, generated new BEC email samples. These samples were combined with the original Enron email dataset, and a CatBoost classifier trained on the combined data achieved an increased accuracy of 95%. This study highlights some of the limitations of current BEC detection approaches. It also illustrates the effective fusion of advanced machine learning algorithms and feature engineering in building robust cybersecurity defenses. Integrating supervised learning and large language models offers a new direction for email security enhancements and paves the way for interdisciplinary research.
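
The abstract compares models on accuracy and F1 score. As a minimal illustration of how the two metrics differ, the sketch below computes both on a small made-up set of labels. The labels, the function names, and the convention that 1 marks a BEC email are assumptions for illustration only; none of them come from the thesis.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the BEC class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all emails that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels: 1 = BEC, 0 = non-BEC (illustrative, not thesis data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

print(f"accuracy={accuracy(y_true, y_pred):.2f} f1={f1_score(y_true, y_pred):.2f}")
# prints "accuracy=0.80 f1=0.67"
```

On this toy set the classifier is right on 8 of 10 emails, yet its F1 score is lower because one BEC email is missed and one legitimate email is flagged. When the positive class is the minority, reporting F1 alongside accuracy gives a fuller picture of detection quality.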

Comments/Acknowledgements

I am deeply grateful to God for the health, strength, insight, and inspiration that fueled the completion of my master's thesis. My profound appreciation extends to Dr. Abdullah Abu Hussein, my thesis chair, whose invaluable mentorship significantly contributed to my personal and academic growth. His expert guidance, constructive feedback, and encouragement enhanced my Python and machine learning skills and steered me towards a new professional path.

I also owe a debt of gratitude to Dr. Lynn Collen, my Academic Advisor, for her steadfast support and guidance throughout my studies. Her patience and expertise were instrumental in overcoming the academic challenges I encountered.

Finally, I thank Dr. Jim Chen, whose positive feedback was vital to completing this research. The collective influence and encouragement of these mentors have been pivotal in my career and personal development, inspiring ongoing growth and learning.

Words cannot express my gratitude to my husband, Dr. Jerry S. Pantuvo, whose sacrifice, understanding, encouragement, and love have been the cornerstone of my journey. My sons, Jerry and Michael, and my parents, siblings, and friends have been a constant source of inspiration and strength.

This thesis reflects the culmination of efforts supported by a network of family, mentors, and colleagues, to whom I am eternally thankful.

Available for download on Wednesday, May 13, 2026
