Date of Award
12-2014
Culminating Project Type
Thesis
Degree Name
Computer Science: M.S.
Department
Computer Science and Information Technology
College
School of Science and Engineering
First Advisor
Bryant Julstrom
Second Advisor
Andrew Anda
Third Advisor
Brian Reese
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Keywords and Subject Headings
natural language processing, computational linguistics, verb-particle constructions, corpus generator, colloquial syntax, lyrics
Abstract
Song lyric corpora (collections of text for use in linguistic analysis) have proven to be publicly unavailable despite many previous studies on the subject. We designed and created two programs that generate a song lyric corpus. These programs can be shared or distributed, and they avoid potential copyright issues while also allowing future researchers to generate corpora of popular song lyrics that contain songs as current as the previous week. We used these programs to create a corpus of approximately 800 modern US popular songs, the Song Lyric Corpus.
Verb-particle constructions, a type of multiword expression, often cause difficulties for natural language processing tasks because of their colloquial nature and variable syntax. We hypothesized that there would be a higher density of verb-particle constructions in song lyrics than in formal speech because song lyrics contain more colloquial speech.
We selected two corpora for a comparison of verb-particle construction occurrences: the Song Lyric Corpus generated during this research and the Editorials section of the Brown Corpus. Both were tokenized and part-of-speech tagged using Python's Natural Language Toolkit (NLTK; Bird, Loper, & Klein, 2009). We then manually examined the resulting tagged corpora to identify verb-particle constructions.
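To illustrate what the tagged output makes possible, the following is a minimal sketch of how part-of-speech tags could be used to shortlist candidate verb-particle constructions before manual review. It is not the procedure used in this thesis, where identification was done by hand. The function name candidate_vpcs, the max_gap window, and the hand-tagged sample sentence are all hypothetical; the sketch assumes Penn Treebank style tags, in which verbs begin with VB and particles are tagged RP.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)


def candidate_vpcs(tagged: List[TaggedToken], max_gap: int = 3) -> List[Tuple[str, str]]:
    """Return (verb, particle) pairs where a particle (RP) follows a verb (VB*)
    within max_gap tokens. These are candidates only; a human still has to
    confirm each one. The window size is an illustrative assumption."""
    candidates = []
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        # Look ahead a few tokens so split forms like "turned it off" are caught.
        for j in range(i + 1, min(i + 1 + max_gap, len(tagged))):
            next_word, next_tag = tagged[j]
            if next_tag == "RP":
                candidates.append((word.lower(), next_word.lower()))
                break
            if next_tag.startswith("VB"):
                break  # a new verb starts; stop searching for this one
    return candidates


# Hand-tagged sample (an assumption for illustration). In practice the pairs
# would come from a tagger, e.g. nltk.pos_tag(nltk.word_tokenize(text)).
sample = [
    ("She", "PRP"), ("picked", "VBD"), ("up", "RP"), ("the", "DT"),
    ("phone", "NN"), ("and", "CC"), ("turned", "VBD"), ("it", "PRP"),
    ("off", "RP"),
]

print(candidate_vpcs(sample))
```

A filter of this kind only narrows the search. Taggers often confuse particles with prepositions and adverbs, especially in colloquial text, so manual examination of the tagged corpora remains necessary.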
The Song Lyric Corpus had more than five times as many verb-particle constructions as the more formal Brown Corpus, indicating that song lyrics are a promising future medium in which to study verb-particle constructions.
Recommended Citation
Thrall, Mary, "Creating a Song Lyric Corpus Generator and Identifying Verb-Particle Constructions In Informal Language" (2014). Culminating Projects in Computer Science and Information Technology. 46.
https://repository.stcloudstate.edu/csit_etds/46
Comments/Acknowledgements
I would like to thank my committee members, Professors Bryant Julstrom, Andrew Anda, and Brian Reese. The Computational Linguistics Reading Group at the University of Minnesota offered helpful advice and kept me on task. This thesis would not have been possible without the help and constant encouragement of fellow graduate student Aleksandar Tomovic throughout the past two years. My deepest gratitude goes to all.