Date of Award
Culminating Project Type
Computer Science: M.S.
Computer Science and Information Technology
School of Science and Engineering
Dr. Maninder Singh
Dr. Andrew Anda
Dr. Aleksandar Tomovic
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Keywords and Subject Headings
Extractive Text Summarization, Text Summarization, Natural Language Processing, Machine Learning, Unsupervised Machine Learning
We routinely encounter more information than we can process, in the form of social media posts, blogs, news articles, research papers, and other formats. The volume is too large to read in full, or even to screen for a more manageable subset. Text summarization is the process of condensing a large amount of text into a shorter form that still conveys the important ideas of the original document, and it is an active subfield of natural language processing. Extractive text summarization identifies important sections of a document and concatenates them to form a shorter document that summarizes the original. We discuss, implement, and compare several unsupervised machine learning algorithms, including latent semantic analysis, latent Dirichlet allocation, and k-means clustering. The ROUGE-N metric was used to evaluate the summaries generated by these algorithms. Summaries generated using tf-idf as the feature extraction scheme together with latent semantic analysis had the highest ROUGE-N scores. This automated assessment was validated using an empirical analysis survey.
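As a rough illustration of the evaluation metric named above, ROUGE-N can be sketched as clipped n-gram overlap between a generated summary and a reference summary. This is a minimal sketch, not the implementation used in the project: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) based on clipped n-gram overlap.

    Assumption: text is lowercased and split on whitespace; a real
    evaluation would likely use a proper tokenizer and possibly stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is not credited more often than it appears
    # in the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


# Hypothetical example sentences, not taken from the project's data.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n(candidate, reference, n=1)  # unigram overlap (ROUGE-1)
r2 = rouge_n(candidate, reference, n=2)  # bigram overlap (ROUGE-2)
print("ROUGE-1 recall:", round(r1[0], 3))
print("ROUGE-2 recall:", round(r2[0], 3))
```

In this toy case five of the six reference unigrams and three of the five reference bigrams appear in the candidate, so recall drops as N grows. Higher-order N rewards summaries that preserve word order as well as word choice.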
Acharya, Swapnil, "Extractive Text Summarization Using Machine Learning" (2022). Culminating Projects in Computer Science and Information Technology. 39.