Date of Award


Culminating Project Type


Degree Name

Applied Statistics: M.S.


Department of Mathematics and Statistics


College of Science and Engineering

First Advisor

Hui Xu

Second Advisor

David Robinson

Third Advisor

Richard Sundheim

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.


Model selection is a challenging problem in high-dimensional statistical analysis, and many approaches have been proposed in recent years. In this thesis, we compare the performance of three penalized logistic regression approaches (Ridge, Lasso, and Elastic Net) and three information criteria (AIC, BIC, and EBIC) for a binary response variable in the high-dimensional setting through an extensive simulation study. The models are built and selected on the training datasets, and their performance is evaluated by AUC on the validation datasets. We also present the comparison results on two real datasets (Arcene Data and University Retention Data). The performance differences among these approaches are discussed at the end.
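As a rough illustration of the quantities named in the abstract, the sketch below computes AIC, BIC, and EBIC from a fitted model's log-likelihood, and AUC from validation scores. It is not the thesis's code. The function names, the choice to count only selected predictors in `k`, the EBIC form with a model-space penalty of `2 * gamma * log C(p, k)`, and the default `gamma=0.5` are all assumptions made for this example.

```python
import math


def information_criteria(log_lik, k, n, p, gamma=0.5):
    """Return AIC, BIC, and EBIC for one candidate model.

    log_lik: maximized log-likelihood on the training data
    k:       number of selected predictors (assumption: intercept not counted)
    n:       number of training observations
    p:       number of candidate predictors
    gamma:   EBIC tuning value in [0, 1] (assumed; gamma=0 reduces to BIC)
    """
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * math.log(n)
    # EBIC adds a penalty for the number of size-k models among p predictors.
    ebic = bic + 2.0 * gamma * math.log(math.comb(p, k))
    return {"AIC": aic, "BIC": bic, "EBIC": ebic}


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(information_criteria(log_lik=-50.0, k=3, n=100, p=1000))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Lower values of each criterion indicate a preferred model on the training data, while an AUC near 0.5 on the validation data indicates ranking no better than chance. The pairwise AUC loop is quadratic in the sample size, which is acceptable for a sketch but not for large validation sets.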


