The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award


Culminating Project Type


Degree Name

Computer Science: M.S.


Computer Science and Information Technology


School of Science and Engineering

First Advisor

Bryant Julstrom

Second Advisor

Andrew Anda

Third Advisor

Brian Reese

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Keywords and Subject Headings

natural language processing, computational linguistics, verb-particle constructions, corpus generator, colloquial syntax, lyrics


Song lyric corpora, collections of text for use in linguistic analysis, have proven to be publicly unavailable despite many previous studies on the subject. We designed and created two programs that generate a song lyric corpus. These programs can be shared or distributed, and they avoid potential copyright issues while also allowing future researchers to generate corpora of popular song lyrics that contain songs as current as the previous week. We used these programs to create a corpus of approximately 800 modern US popular songs, the Song Lyric Corpus.

Verb-particle constructions, a type of multiword expression, often cause difficulties for natural language processing tasks because of their colloquial nature and variable syntax. We hypothesized that there would be a higher density of verb-particle constructions in song lyrics than in formal speech because song lyrics contain more colloquial speech.

We selected two corpora in which to compare verb-particle construction occurrences. The Song Lyric Corpus generated during this research and the Editorials section of the Brown Corpus were tokenized and part-of-speech tagged using the Natural Language Toolkit (NLTK) for Python (Bird, Loper, & Klein, 2009). We then manually examined the resulting tagged corpora to identify verb-particle constructions.
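As an illustration of how a tagged corpus might be pre-screened before manual examination, the following is a minimal sketch and not the procedure used in this thesis. It assumes the tagged text is a list of (word, tag) pairs with Penn Treebank tags, which is the shape returned by NLTK's pos_tag function, where RP marks a particle and tags beginning with VB mark verbs. The function names, the look-back window of three tokens, and the sample sentence are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank tag)


def find_vpc_candidates(tagged: List[TaggedToken], window: int = 3) -> List[Tuple[str, str]]:
    """Return (verb, particle) pairs that are candidates for manual review.

    A candidate is a token tagged RP together with the nearest verb
    (any tag starting with "VB") found within `window` tokens to its left.
    This is a heuristic filter, not a definitive identification.
    """
    candidates = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "RP":
            continue
        # Scan leftward from the particle, at most `window` tokens.
        for j in range(i - 1, max(i - 1 - window, -1), -1):
            prev_word, prev_tag = tagged[j]
            if prev_tag.startswith("VB"):
                candidates.append((prev_word.lower(), word.lower()))
                break
    return candidates


def density_per_thousand(n_constructions: int, n_tokens: int) -> float:
    """Constructions per 1,000 tokens; 0.0 for an empty corpus."""
    if n_tokens == 0:
        return 0.0
    return 1000.0 * n_constructions / n_tokens


# Hand-tagged illustrative sentence (not taken from either corpus).
sample = [
    ("She", "PRP"),
    ("picked", "VBD"),
    ("the", "DT"),
    ("phone", "NN"),
    ("up", "RP"),
]
candidates = find_vpc_candidates(sample)
```

Because automatic taggers can confuse particles with prepositions and adverbs, candidates found this way would still need the kind of manual inspection described above. Normalizing counts per 1,000 tokens is one possible way to compare corpora of different sizes.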

The Song Lyric Corpus had more than five times as many verb-particle constructions as the more formal Brown Corpus, indicating that song lyrics are a promising future medium in which to study verb-particle constructions.


I would like to thank my committee members, Professors Bryant Julstrom, Andrew Anda, and Brian Reese. The Computational Linguistics Reading Group at the University of Minnesota offered helpful advice and kept me on task. This thesis would not have been possible without the help and constant encouragement of fellow graduate student Aleksandar Tomovic throughout the past two years. My deepest gratitude goes to all.


