The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award


Culminating Project Type

Starred Paper



Degree Name

Computer Science: M.S.


Computer Science and Information Technology


School of Science and Engineering

First Advisor

Dr. Maninder Singh

Second Advisor

Dr. Andrew Anda

Third Advisor

Dr. Aleksandar Tomovic

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Keywords and Subject Headings

Extractive Text Summarization, Text Summarization, Natural Language Processing, Machine Learning, Unsupervised Machine Learning


We routinely encounter too much information in the form of social media posts, blogs, news articles, research papers, and other formats. This volume is infeasible to process in full, and even selecting a more manageable subset is impractical. Text summarization is the process of condensing a large amount of text into a shorter form that still conveys the important ideas of the original document, and it is an active subfield of natural language processing. Extractive text summarization identifies important sections of a document and concatenates them to form a shorter document that summarizes the contents of the original. We discuss, implement, and compare several unsupervised machine learning algorithms, including latent semantic analysis, latent Dirichlet allocation, and k-means clustering. The ROUGE-N metric was used to evaluate the summaries generated by these algorithms. Summaries generated using tf-idf as the feature extraction scheme together with latent semantic analysis had the highest ROUGE-N scores. This automated assessment was validated using an empirical analysis survey.
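As an illustration of the ROUGE-N evaluation mentioned in the abstract, the following is a minimal sketch of how n-gram overlap between a generated summary and a reference summary can be scored. It is not the implementation used in the paper: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 for two strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    Real evaluations often add stemming or stopword handling.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

Recall is the quantity usually reported as ROUGE-N: the share of reference n-grams that also appear in the generated summary. Precision and F1 are included here only for completeness.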


I would like to sincerely thank my advisor, Dr. Maninder Singh, Department of Computer Science and Information Technology (CSIT), for his suggestions and continuous mentorship throughout the research and its implementation. I would also like to express my gratitude to my committee member Dr. Andrew Anda, Department of CSIT, for his dedicated help in improving several aspects of the documentation of this research.

I would also like to thank my committee member Dr. Aleksandar Tomovic, Department of CSIT, for his constructive feedback and support throughout the research. I would also like to thank Mr. Clifford Moran, Department of CSIT, for his continuous and timely help in setting up classes, registration, and scheduling research presentations.

I would also like to thank all the members of the CSIT faculty at St. Cloud State University, whose knowledge and expertise helped shape my academic career. I am very grateful to all my friends who helped me during the empirical analysis stage of this research. I would like to thank my family for their constant support.


