The program will take a corpus (around 2 million words of text) and create an index of the text in a MySQL database. A web user will be able to interact with the database by entering a search word or phrase (as plain text or as a regular expression). The program will retrieve either every instance of the search word/phrase from the corpus or every nth instance, up to a maximum number that I can set. The results will be displayed as a concordance with a specified number of words to the left and to the right.
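As a rough illustration of the retrieval and concordance display described above, here is a minimal Python sketch. It works on an in-memory list of words rather than a MySQL index, it matches single words only (not phrases), and the names tokenize, concordance, width, every_nth and max_hits are my own placeholders, not part of any existing code.

```python
import re
from typing import List, NamedTuple


class ConcordanceLine(NamedTuple):
    left: List[str]    # words before the hit, in reading order
    keyword: str       # the matched word
    right: List[str]   # words after the hit, in reading order


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens: List[str], pattern: str, width: int = 4,
                every_nth: int = 1, max_hits: int = 100) -> List[ConcordanceLine]:
    """Return concordance lines for words that fully match the regex `pattern`.

    Assumption: "every nth instance" keeps the 1st hit, the (n+1)th hit,
    the (2n+1)th hit and so on, stopping once max_hits lines are collected.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    lines: List[ConcordanceLine] = []
    match_count = 0
    for i, token in enumerate(tokens):
        if not regex.fullmatch(token):
            continue
        match_count += 1
        if (match_count - 1) % every_nth != 0:
            continue  # skip hits that fall between the sampled ones
        left = tokens[max(0, i - width):i]
        right = tokens[i + 1:i + 1 + width]
        lines.append(ConcordanceLine(left, token, right))
        if len(lines) >= max_hits:
            break
    return lines


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat and the dog sat on the rug")
    for line in concordance(sample, "sat", width=2):
        print(" ".join(line.left), "[" + line.keyword + "]", " ".join(line.right))
```

In the real program the list of words would be replaced by lookups against the database index, and phrase searches would need matching across several consecutive words.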
The results can be manipulated further by sorting the lines. The user chooses a primary sort and a secondary sort, each based on the alphabetical order of one of the following: the search word/phrase, word-first-left, word-second-left, word-first-right or word-second-right.
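The two-level sort could look something like the following Python sketch. It assumes each concordance line is held as a (left words, keyword, right words) tuple, and it uses the labels "keyword", "L1", "L2", "R1" and "R2" as my own shorthand for the search word, word-first-left, word-second-left, word-first-right and word-second-right.

```python
from typing import List, Tuple

# One concordance line: (words to the left, keyword, words to the right)
Line = Tuple[List[str], str, List[str]]


def word_at(line: Line, position: str) -> str:
    """Return the word at 'keyword', 'L1', 'L2', 'R1' or 'R2'.

    An empty string is returned when the context is too short, so such
    lines sort first.
    """
    left, keyword, right = line
    if position == "keyword":
        return keyword
    side, offset = position[0], int(position[1:])
    if side == "L":
        # L1 is the last word of the left context, L2 the one before it
        return left[-offset] if offset <= len(left) else ""
    if side == "R":
        return right[offset - 1] if offset <= len(right) else ""
    raise ValueError("unknown position: " + position)


def sort_lines(lines: List[Line], primary: str, secondary: str) -> List[Line]:
    """Sort alphabetically by the primary position, then by the secondary."""
    return sorted(lines, key=lambda ln: (word_at(ln, primary), word_at(ln, secondary)))


if __name__ == "__main__":
    demo = [
        (["the", "dog"], "sat", ["on", "the"]),
        (["a", "cat"], "sat", ["by", "the"]),
        (["the", "cat"], "sat", ["on", "a"]),
    ]
    for left, kw, right in sort_lines(demo, "L1", "R1"):
        print(" ".join(left), "[" + kw + "]", " ".join(right))
```

Sorting on a tuple key keeps the primary and secondary sort in a single pass. The same ordering could also be pushed into the database query with an ORDER BY clause if the context words are stored in columns.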
The user can also access a word frequency table that displays, in descending order, the frequency of the words in each of the positions word-second-left, word-first-left, word-first-right and word-second-right.
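A possible way to build the frequency table is sketched below in Python. It again assumes concordance lines stored as (left words, keyword, right words) tuples, and the position labels "L2", "L1", "R1" and "R2" are my own shorthand. Ties are broken alphabetically, which is an assumption and not a stated requirement.

```python
from collections import Counter
from typing import Dict, List, Tuple

Line = Tuple[List[str], str, List[str]]

POSITIONS = ("L2", "L1", "R1", "R2")


def frequency_table(lines: List[Line]) -> Dict[str, List[Tuple[str, int]]]:
    """Count the words at each position and list them by descending frequency."""
    counters = {pos: Counter() for pos in POSITIONS}
    for left, _keyword, right in lines:
        context = {
            "L2": left[-2] if len(left) >= 2 else None,
            "L1": left[-1] if len(left) >= 1 else None,
            "R1": right[0] if len(right) >= 1 else None,
            "R2": right[1] if len(right) >= 2 else None,
        }
        for pos, word in context.items():
            if word is not None:  # skip positions beyond the available context
                counters[pos][word] += 1
    # Highest count first; equal counts fall back to alphabetical order
    return {
        pos: sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        for pos, counter in counters.items()
    }


if __name__ == "__main__":
    demo = [
        (["the", "dog"], "sat", ["on", "the"]),
        (["a", "cat"], "sat", ["by", "the"]),
        (["the", "cat"], "sat", ["on", "a"]),
    ]
    for pos, rows in frequency_table(demo).items():
        print(pos, rows)
```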
The critical part of the project is the creation of a searchable index of the corpus.
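To give an idea of what such an index might look like, here is a small Python sketch of a positional index: one row per word of the corpus, storing the word and its position, with a database index on the word column. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs without a server; the real project would use MySQL. The table name token, its columns and the function names are my own assumptions.

```python
import re
import sqlite3
from typing import List


def build_index(conn: sqlite3.Connection, text: str) -> int:
    """Store every word of the text with its position; return the word count."""
    conn.execute(
        "CREATE TABLE token (position INTEGER PRIMARY KEY, word TEXT NOT NULL)"
    )
    # The index on `word` is what makes looking up a search word fast
    conn.execute("CREATE INDEX idx_token_word ON token (word)")
    words = re.findall(r"\w+", text.lower())
    conn.executemany(
        "INSERT INTO token (position, word) VALUES (?, ?)", enumerate(words)
    )
    conn.commit()
    return len(words)


def positions_of(conn: sqlite3.Connection, word: str) -> List[int]:
    """Return every position at which the word occurs, in corpus order."""
    rows = conn.execute(
        "SELECT position FROM token WHERE word = ? ORDER BY position",
        (word.lower(),),
    )
    return [row[0] for row in rows]


def context(conn: sqlite3.Connection, position: int, width: int) -> List[str]:
    """Return the words from `width` before to `width` after a position."""
    rows = conn.execute(
        "SELECT word FROM token WHERE position BETWEEN ? AND ? ORDER BY position",
        (position - width, position + width),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, "The cat sat on the mat")
    for pos in positions_of(conn, "the"):
        print(pos, context(conn, pos, 1))
```

With around 2 million words this table would have around 2 million rows. Regular expression searches would need a different lookup (for example against a separate table of distinct words), which this sketch does not cover.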