The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

Date of Award

5-2006

Culminating Project Type

Thesis

Degree Name

Computer Science: M.S.

Department

Computer Science and Information Technology

College

School of Science and Engineering

First Advisor

Ramnath Sarnath

Second Advisor

Bryant Julstrom

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Keywords and Subject Headings

LCS, GA, Genetic Algorithm, Longest Common Subsequence, Computer Science, Technology

Abstract

The Longest Common Subsequence problem (LCS) is a known NP-complete problem: find the longest subsequence (a series of characters occurring in the same order, although not necessarily consecutively) that all strings in a given set share. An LCS is not necessarily unique for a given set of strings, but its length is. The computationally difficult version of this problem occurs when the number of strings and the LCS length are not fixed. The problem has a number of applications, ranging from content search to file difference listings. There is no single solution that fits all situations, and the deterministic solutions available are written for a fixed number of strings and/or fixed string lengths. There is still a considerable amount of active research in this area.
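To make the definition concrete, the following minimal Python sketch checks whether a candidate string is a common subsequence of a set of strings. The function names `is_subsequence` and `is_common_subsequence` are illustrative and do not come from the thesis.

```python
def is_subsequence(candidate, text):
    """Return True if candidate's characters occur in text in the same order,
    not necessarily consecutively."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in candidate)


def is_common_subsequence(candidate, strings):
    """Return True if candidate is a subsequence of every string in the set."""
    return all(is_subsequence(candidate, s) for s in strings)


if __name__ == "__main__":
    strings = ["ABCBDAB", "BDCABA"]
    # Two different common subsequences of the same length, which shows
    # that an LCS need not be unique even though its length is.
    print(is_common_subsequence("BCBA", strings))
    print(is_common_subsequence("BDAB", strings))
```

For the pair "ABCBDAB" and "BDCABA", both "BCBA" and "BDAB" are common subsequences of length 4, which is the LCS length for that pair.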

For this project, a genetic algorithm (GA) was developed to find the LCS. Its performance was compared to that of the dynamic programming algorithm (DPA) using 84 test instances. Each test instance consisted of three strings of length 100, 200, 400, 800, 1600, 3200, or 6400 with a known LCS of length 10%, 50%, 90%, or 100% of the string length, for a total of 28 instances. All 28 of these instances were created for three types of strings: a binary alphabet (|Σ| = 2), a "DNA" alphabet (|Σ| = 4), and an English alphabet (|Σ| = 26). The DPA always finds the LCS. The GA was set up to run until the best solution was the length of the known LCS. The algorithms were compared based on CPU time to find the LCS. Since the GA is not deterministic, it was run 30 times for each test instance. The best and mean times were measured, along with the standard deviation of the run times.
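The abstract does not give the DPA's implementation. As a point of reference, the textbook dynamic programming recurrence for the LCS length of three strings can be sketched as follows; `lcs3_length` is a hypothetical name, and this is not necessarily how the thesis implements it.

```python
def lcs3_length(a, b, c):
    """Length of the longest common subsequence of three strings,
    using the standard three-dimensional dynamic programming table."""
    la, lb, lc = len(a), len(b), len(c)
    # table[i][j][k] = LCS length of a[:i], b[:j], c[:k]
    table = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    # All three strings end in the same character: extend.
                    table[i][j][k] = table[i - 1][j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last character of one string.
                    table[i][j][k] = max(
                        table[i - 1][j][k],
                        table[i][j - 1][k],
                        table[i][j][k - 1],
                    )
    return table[la][lb][lc]


if __name__ == "__main__":
    print(lcs3_length("ABCBDAB", "BDCABA", "BCADB"))  # prints 4
```

In this sketch the table has (n + 1)^3 entries for three strings of length n, so time and memory both grow cubically with the string length; for n = 800 that is roughly half a billion cells.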

The GA performed nearly as well as the DPA on shorter instances (length up to 400). For strings of length 800, the time required by the DPA increased dramatically and was considerably longer than the GA time. The DPA failed to run for strings longer than 800. The GA time increased much more slowly than the DPA time as the string length grew, and the times for the longest strings were still reasonable. The GA time was not affected by alphabet size. Except for the test instances where the LCS was 100% of the string length, the GA time increased as the length of the LCS increased. When the LCS is 100% of the string length, there are no poor starting solutions, and the algorithm only needs to grow the length of the solution. The GA was also run on a test instance containing four strings of different sizes.
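The abstract does not describe the GA's encoding, operators, or parameters. Purely as an illustration of the general approach, the sketch below assumes a bit-mask encoding over the shortest string, tournament selection, one-point crossover, bit-flip mutation, and a fixed number of generations (the thesis instead stops when the known LCS length is reached). All names and parameter values here (`ga_lcs_sketch`, `pop_size`, `mutation_rate`, and so on) are hypothetical.

```python
import random


def is_subsequence(candidate, text):
    remaining = iter(text)
    return all(ch in remaining for ch in candidate)


def decode(mask, template):
    """Keep the template characters whose mask bit is 1."""
    return "".join(ch for bit, ch in zip(mask, template) if bit)


def fitness(mask, template, strings):
    """Candidate length if it is common to all strings, else -1 (assumed penalty)."""
    candidate = decode(mask, template)
    if all(is_subsequence(candidate, s) for s in strings):
        return len(candidate)
    return -1


def ga_lcs_sketch(strings, pop_size=40, generations=200,
                  mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    template = min(strings, key=len)  # assumed: encode over the shortest string
    n = len(template)

    def score(mask):
        return fitness(mask, template, strings)

    # Start from sparse masks so that many initial candidates are feasible.
    population = [[1 if rng.random() < 0.1 else 0 for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            # Binary tournament selection for each parent.
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    # Return a common subsequence; empty if no feasible mask was found.
    return decode(best, template) if score(best) >= 0 else ""


if __name__ == "__main__":
    print(ga_lcs_sketch(["ABCBDAB", "BDCABA", "BCADB"]))
```

A sketch like this returns a common subsequence but offers no guarantee that it is a longest one; how quickly it improves depends on the encoding, operators, and parameters chosen.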

In addition, the GA returns the LCS string along with its length. The DPA needs a second pass through its storage structure to extract the LCS string. The GA can handle any number of strings and any string length. The strings do not need to be the same length, since the program bases its solution size on the shortest string in the set.
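The "second pass" that a dynamic programming approach needs in order to recover the LCS string can be illustrated with the standard two-string version: the table is filled first, then walked backwards from the final cell. This is a generic sketch for two strings with a hypothetical function name, not the thesis's three-string DPA.

```python
def lcs_with_traceback(a, b):
    """Return (length, one LCS string) for two strings."""
    rows, cols = len(a), len(b)
    # First pass: fill the table of LCS lengths for all prefix pairs.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Second pass: walk back from the last cell to collect the characters.
    chars = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return table[rows][cols], "".join(reversed(chars))


if __name__ == "__main__":
    length, subsequence = lcs_with_traceback("ABCBDAB", "BDCABA")
    print(length, subsequence)
```

Because an LCS is not necessarily unique, the particular string returned depends on how ties are broken during the walk back; only the length is fixed.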

Comments/Acknowledgements

My deepest gratitude goes to Tom Weitzel and Alex Milowski for pushing me back into school. They coached me throughout the entire process as well. I am also grateful to my husband, Keith E. Hinkemeyer, and my children, David and Thomas, for their support; in particular, for the opportunity to pursue this degree full-time. I would like to acknowledge the encouragement and support I received from Dr. Bryant Julstrom and Dr. Jayantha Herath at St. Cloud State University. I also cannot forget Meghan, Kyra, Panda, Rye, Riika, Crazy, and Denali, who put up with a lack of attention over the past couple of years, and dogs love to get attention.
