Date of Award
5-2026
Culminating Project Type
Thesis
Styleguide
apa
Degree Name
Computer Science: M.S.
Department
Computer Science and Information Technology
College
School of Science and Engineering
First Advisor
Jalal Khalil
Second Advisor
Singh Maninder
Third Advisor
Anda Andrew A
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Keywords and Subject Headings
Keywords: synthetic travel diaries, activity-based modeling, large language models, Markov chains, kernel density estimation, behavioral fidelity, Minneapolis–St. Paul
Abstract
Accurate representation of daily activity–travel behavior is essential for transportation planning, infrastructure modeling, and policy analysis. Traditional travel surveys provide rich contextual data but are costly, infrequent, and subject to declining response rates, while passive mobility data often lack behavioral context. This thesis investigates whether a structured generative framework combining probabilistic behavioral modeling with parameter-efficient large language model (LLM) fine-tuning can produce synthetic daily travel diaries that preserve key structural properties observed in a metropolitan travel survey. The study focuses on the Minneapolis–St. Paul–Bloomington metropolitan region using geographically filtered 2010 Travel Behavior Inventory (TBI) data and demographic distributions from the 2010 American Community Survey (ACS). A second-order Markov transition model with hierarchical backoff was implemented to preserve contextual activity sequencing, and kernel density estimation (KDE) was used to model arrival and departure time distributions. This structured generator produced 30,000 synthetic training diaries, which were used to fine-tune a Llama-2-13B model via Low-Rank Adaptation (LoRA) under 4-bit quantization. The fine-tuned model then generated 25,000 synthetic diaries, of which 19,395 structurally valid records were evaluated against 16,902 observed person-day records. Evaluation was conducted at three analytical levels: (1) pattern-level fidelity, (2) transition-level structure, and (3) activity-chain and temporal realism. Results indicate strong preservation of aggregate activity marginals (Jensen–Shannon divergence = 0.0142) and high similarity in first-order transition structure (normalized Frobenius distance = 0.0654). Moderate divergence was observed in second-order contextual transitions (normalized Frobenius distance = 0.1200), reflecting the sparsity of observed activity triplets and the effects of hierarchical backoff.
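The structured generator summarized above can be illustrated with a minimal Python sketch. It assumes several details that the abstract does not specify:

- a count-threshold backoff rule (second-order context, then first-order, then the overall activity marginal);
- a Gaussian KDE sampled by jittering observed departure times with a normal-reference bandwidth;
- toy activity labels and times.

None of these specifics come from the thesis, and neither do the names `BackoffMarkov`, `kde_sample`, `generate_diary`, or `min_count`, which are hypothetical. The sketch also omits the ACS demographic conditioning and the LLM fine-tuning stage.

```python
import random
import statistics
from collections import Counter, defaultdict

# Hypothetical activity labels; the thesis's actual activity coding may differ.
HOME, WORK, SHOP = "Home", "Work", "Shop"


class BackoffMarkov:
    """Second-order activity-transition model with count-based backoff.

    Assumption: a context is used only if it was observed at least
    `min_count` times; otherwise it backs off (prev2, prev1) -> prev1 -> marginal.
    """

    def __init__(self, chains, min_count=5):
        self.min_count = min_count
        self.second = defaultdict(Counter)  # (prev2, prev1) -> next-activity counts
        self.first = defaultdict(Counter)   # prev1 -> next-activity counts
        self.unigram = Counter()            # overall next-activity counts
        for chain in chains:
            for i in range(1, len(chain)):
                nxt = chain[i]
                self.first[chain[i - 1]][nxt] += 1
                self.unigram[nxt] += 1
                if i >= 2:
                    self.second[(chain[i - 2], chain[i - 1])][nxt] += 1

    def next_distribution(self, prev2, prev1):
        """Return the most specific count table with enough observations."""
        for counts in (self.second.get((prev2, prev1)), self.first.get(prev1)):
            if counts and sum(counts.values()) >= self.min_count:
                return counts
        return self.unigram

    def sample_next(self, prev2, prev1, rng):
        """Draw the next activity in proportion to the selected counts."""
        counts = self.next_distribution(prev2, prev1)
        labels = list(counts)
        return rng.choices(labels, weights=[counts[a] for a in labels], k=1)[0]


def kde_sample(observed_minutes, rng, bandwidth=None):
    """Draw one value from a Gaussian KDE of the observed times.

    Sampling a Gaussian KDE = pick an observation uniformly at random, then
    add N(0, h^2) noise. The default h = 1.06 * sd * n^(-1/5) is the
    normal-reference rule (an assumption, not the thesis's stated choice).
    """
    if bandwidth is None:
        n = len(observed_minutes)
        bandwidth = 1.06 * statistics.stdev(observed_minutes) * n ** (-1 / 5)
    return rng.choice(observed_minutes) + rng.gauss(0.0, bandwidth)


def generate_diary(model, departures, rng, max_activities=5):
    """Sample a home-based activity chain plus one departure time per trip."""
    chain, prev2 = [HOME], None
    while len(chain) <= max_activities:
        nxt = model.sample_next(prev2, chain[-1], rng)
        prev2 = chain[-1]
        chain.append(nxt)
        if nxt == HOME:
            break
    if chain[-1] != HOME:  # simplification: force the day to end at home
        chain.append(HOME)
    # Simplification: independent KDE draws, clipped to [0, 1439] min and sorted.
    times = sorted(
        min(max(round(kde_sample(departures, rng)), 0), 1439)
        for _ in range(len(chain) - 1)
    )
    return chain, times


# Toy observed data (invented for illustration, not TBI records).
observed_chains = [
    [HOME, WORK, HOME],
    [HOME, WORK, SHOP, HOME],
    [HOME, SHOP, HOME],
    [HOME, WORK, HOME],
    [HOME, WORK, SHOP, HOME],
]
observed_departures = [420, 450, 480, 600, 660, 1020, 1050, 1080]  # minutes after midnight

model = BackoffMarkov(observed_chains, min_count=2)
rng = random.Random(42)
for _ in range(3):
    print(generate_diary(model, observed_departures, rng))
```

Drawing an observation and adding kernel noise samples exactly from the fitted Gaussian KDE, so no density evaluation is needed. Sorting independent time draws is a deliberate simplification. A fuller generator would likely condition each time on the preceding activity and its duration.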
Sequence-level diversity was reduced: the model generated fewer unique activity chains than appear in the observed dataset, although dominant behavioral patterns were partially preserved. Substantial divergence emerged in temporal magnitude, as synthetic daily travel durations were significantly overestimated relative to observed data. Overall, the findings demonstrate that structured behavioral priors combined with parameter-efficient LLM fine-tuning can effectively preserve global activity composition and dominant transition patterns within a metropolitan context. However, higher-order contextual fidelity, long-tail behavioral diversity, and temporal magnitude calibration remain areas requiring further refinement. This work contributes a multi-level evaluation framework for synthetic travel diary generation and establishes a region-specific foundation for future research in structured mobility data synthesis.
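The pattern-level and transition-level metrics named in the abstract can be sketched in the same way. This sketch makes two assumptions:

- The Jensen–Shannon divergence uses base-2 logarithms, which bounds it in [0, 1].
- The Frobenius distance is divided by sqrt(2k) for k rows. That is the largest value the difference of two row-stochastic matrices can reach, since each row difference has Euclidean norm at most sqrt(2).

The abstract does not state which log base or normalization the thesis used, so values from this sketch are not directly comparable to the reported 0.0142, 0.0654, and 0.1200. The function names, labels, and chains are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical label set; the thesis's activity categories may differ.
LABELS = ["Home", "Work", "Shop"]


def activity_marginal(chains, labels):
    """Share of each activity label across all positions of all chains."""
    counts = Counter(a for chain in chains for a in chain)
    total = sum(counts[a] for a in labels)
    return [counts[a] / total for a in labels]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logs, so 0 <= JSD <= 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0; m_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def transition_matrix(chains, labels):
    """Row-stochastic first-order matrix; rows with no data stay all-zero."""
    idx = {a: i for i, a in enumerate(labels)}
    k = len(labels)
    counts = [[0.0] * k for _ in range(k)]
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            counts[idx[a]][idx[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total else row)
    return matrix


def normalized_frobenius(P, Q):
    """||P - Q||_F / sqrt(2k): assumed normalization mapping into [0, 1]."""
    sq = sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq))
    return math.sqrt(sq / (2 * len(P)))


# Toy chains (invented for illustration, not TBI data or model output).
observed = [["Home", "Work", "Home"], ["Home", "Shop", "Home"]]
synthetic = [["Home", "Work", "Home"], ["Home", "Work", "Shop", "Home"]]

jsd = jensen_shannon(activity_marginal(observed, LABELS),
                     activity_marginal(synthetic, LABELS))
frob = normalized_frobenius(transition_matrix(observed, LABELS),
                            transition_matrix(synthetic, LABELS))
print(f"JSD = {jsd:.4f}, normalized Frobenius = {frob:.4f}")
```

The same `normalized_frobenius` function also covers the second-order comparison. It would be applied to matrices whose rows are indexed by (previous, current) activity pairs instead of single activities.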
Recommended Citation
Ablornyi, Lord S., "Structured Synthetic Travel Diary Generation Using Markov and LLM Fine-Tuning" (2026). Culminating Projects in Computer Science and Information Technology. 70.
https://repository.stcloudstate.edu/csit_etds/70


Comments/Acknowledgements
The completion of this thesis represents not only an academic milestone but also a deeply personal journey of growth, persistence, and faith. I would first like to express my sincere appreciation to my thesis committee for their time, expertise, and thoughtful guidance throughout this research process. I am especially grateful to my committee chair, Dr. Jalal Khalil, for steady leadership, constructive feedback, and encouragement at every stage of this work. Dr. Khalil's commitment and insight significantly strengthened the quality and rigor of this study. I am also thankful to the faculty and mentors of the Department of Computer Science and Information Technology for cultivating an environment that challenges students to think critically, innovate responsibly, and pursue excellence. The technical foundation and intellectual curiosity I developed during my graduate studies have been instrumental in shaping this research. On a deeply personal level, I dedicate this achievement to my mother and my wife. Their unwavering support, sacrifices, and constant belief in my potential have carried me through every challenge along this journey, and their strength and encouragement have been a source of resilience during the most demanding moments of this program. Above all, I give thanks to God for the wisdom, perseverance, and opportunity to pursue this path. Through every obstacle and breakthrough, faith has provided clarity, purpose, and the strength to continue forward. This thesis stands as a reflection of the collective support, mentorship, and love that made its completion possible.