The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

5-2026

Culminating Project Type

Thesis

Styleguide

APA

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Jalal Khalil

Second Advisor

Singh Maninder

Third Advisor

Anda Andrew A

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

Keywords: synthetic travel diaries, activity-based modeling, large language models, Markov chains, kernel density estimation, behavioral fidelity, Minneapolis–St. Paul

Abstract

Accurate representation of daily activity–travel behavior is essential for transportation planning, infrastructure modeling, and policy analysis. Traditional travel surveys provide rich contextual data but are costly, infrequent, and subject to declining response rates, while passive mobility data often lack behavioral context. This thesis investigates whether a structured generative framework combining probabilistic behavioral modeling with parameter-efficient large language model (LLM) fine-tuning can produce synthetic daily travel diaries that preserve key structural properties observed in a metropolitan travel survey. The study focuses on the Minneapolis–St. Paul–Bloomington metropolitan region using geographically filtered 2010 Travel Behavior Inventory (TBI) data and demographic distributions from the 2010 American Community Survey (ACS). A second-order Markov transition model with hierarchical backoff was implemented to preserve contextual activity sequencing, and kernel density estimation (KDE) was used to model arrival and departure time distributions. This structured generator produced 30,000 synthetic training diaries, which were used to fine-tune a Llama-2-13B model via Low-Rank Adaptation (LoRA) under 4-bit quantization. The fine-tuned model then generated 25,000 synthetic diaries, of which 19,395 structurally valid records were evaluated against 16,902 observed Person-Day records. Evaluation was conducted at three analytical levels: (1) pattern-level fidelity, (2) transition-level structure, and (3) activity-chain and temporal realism. Results indicate strong preservation of aggregate activity marginals (Jensen–Shannon divergence = 0.0142) and high similarity in first-order transition structure (normalized Frobenius distance = 0.0654). Moderate divergence was observed in second-order contextual transitions (normalized Frobenius distance = 0.1200), reflecting sparsity in higher-order triplets and hierarchical backoff effects.
Sequence-level diversity was reduced, with fewer unique activity chains generated relative to the observed dataset, although dominant behavioral patterns were partially preserved. Substantial divergence emerged in temporal magnitude, as synthetic daily travel durations were significantly overestimated relative to observed data. Overall, the findings demonstrate that structured behavioral priors combined with parameter-efficient LLM fine-tuning can effectively preserve global activity composition and dominant transition patterns within a metropolitan context. However, higher-order contextual fidelity, long-tail behavioral diversity, and temporal magnitude calibration remain areas requiring further refinement. This work contributes a multi-level evaluation framework for synthetic travel diary generation and establishes a region-specific foundation for future research in structured mobility data synthesis.
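
The two fidelity metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names, the base-2 logarithm, and the particular normalization of the Frobenius distance (division by the sum of the two matrix norms) are assumptions made here for illustration, since the abstract does not specify them. The toy activity chains in the usage note are invented and are not drawn from the TBI data.

```python
import numpy as np


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are normalized to sum to 1. The log base is an assumption;
    with base 2 the result lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def transition_matrix(chains, states):
    """Row-normalized first-order transition matrix from activity chains."""
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def normalized_frobenius(a, b):
    """Frobenius distance scaled by the sum of the two norms (assumed normalization)."""
    denom = np.linalg.norm(a, ord="fro") + np.linalg.norm(b, ord="fro")
    if denom == 0:
        return 0.0
    return float(np.linalg.norm(a - b, ord="fro") / denom)
```

As a usage example, `transition_matrix([["Home", "Work", "Home"]], ["Home", "Work"])` yields a matrix in which each activity moves to the other with probability 1, and comparing an observed and a synthetic matrix with `normalized_frobenius` returns 0 when they are identical. Other normalizations (for example, dividing by the matrix dimension) would give different magnitudes, so values from this sketch are not directly comparable to those reported above.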

Comments/Acknowledgements

The completion of this thesis represents not only an academic milestone but also a deeply personal journey of growth, persistence, and faith. I would first like to express my sincere appreciation to my thesis committee for their time, expertise, and thoughtful guidance throughout this research process. I am especially grateful to my committee chair, Dr. Jalal Khalil, for steady leadership, constructive feedback, and encouragement at every stage of this work. The commitment and insight provided by Dr. Jalal Khalil significantly strengthened the quality and rigor of this study. I am also thankful to the faculty and mentors within the Computer Science Department for cultivating an environment that challenges students to think critically, innovate responsibly, and pursue excellence. The technical foundation and intellectual curiosity developed during my graduate studies have been instrumental in shaping this research. On a deeply personal level, I dedicate this achievement to my mother and wife. Their unwavering support, sacrifices, and constant belief in my potential have carried me through every challenge along this journey. Their strength and encouragement have been a source of resilience during the most demanding moments of this program. Above all, I give thanks to God for the wisdom, perseverance, and opportunity to pursue this path. Through every obstacle and breakthrough, faith has provided clarity, purpose, and the strength to continue forward. This thesis stands as a reflection of the collective support, mentorship, and love that made its completion possible.
