We need to build a keyword extraction system that implements 4 different approaches to extracting keywords (KW).
The main approach extracts keywords from a text document based on word senses, and it has the following stages:
1- Normalization: convert all letters to lowercase and remove any special symbols.
2- Stop-word removal: remove all the words that carry little meaning on their own
(ex: is, are, the)
This can be done with a stop-word list.
3- Stemming: strip the affixes attached to the original words
(ex: removing--> remove)
This can be done with the Porter algorithm.
4- PoS tagging: tag each word with its part of speech
(ex: fast--> fast -adj.)
5- Word Sense Disambiguation (WSD):
The sense of each word needs to be resolved; the WordNet dictionary can be used for this.
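The first stages above can be sketched roughly as follows. This is only a minimal illustration: the stop-word list is a tiny sample, `crude_stem` is a naive suffix stripper standing in for the Porter algorithm (it is not an implementation of it), and PoS tagging is left out because it needs a tagger library. All function names are made up for the example.

```python
import re

# Tiny illustrative stop-word list; a real run would load a full list.
STOP_WORDS = {"is", "are", "the", "a", "an", "of", "and", "to", "in"}


def normalize(text):
    """Lowercase the text and replace every character that is not a
    lowercase letter, digit, or whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Naive suffix stripper: a placeholder for the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the word after stripping.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run normalization, stop-word removal, and stemming in order."""
    tokens = normalize(text).split()
    tokens = remove_stop_words(tokens)
    return [crude_stem(t) for t in tokens]


print(preprocess("The cats are Removing the boxes!"))
```

In the real system the placeholder stemmer would be replaced by a proper Porter stemmer from an NLP library.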
We need to cluster the keywords based on their word senses. I'm flexible on this step: any method that does the job is fine.
We can use any statistical method, such as RAKE scoring, to select the final KW from the clusters.
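One possible way to do the sense-based clustering, shown only as a sketch: pick a sense for each keyword with a simplified Lesk overlap (gloss words shared with the context), then group keywords whose chosen senses fall in the same domain. The `SENSES` table is a hypothetical hand-made stand-in for WordNet, and grouping by the suffix of the sense id is an assumption made for this toy example, not something WordNet provides.

```python
from collections import defaultdict

# Hypothetical mini sense inventory standing in for WordNet:
# word -> {sense_id: gloss}
SENSES = {
    "bank": {
        "bank.finance": "institution that accepts money deposits and gives loans",
        "bank.river": "sloping land beside a river or lake",
    },
    "loan": {"loan.finance": "money lent by a bank to be repaid with interest"},
    "shore": {"shore.river": "land along the edge of a river lake or sea"},
}


def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in SENSES.get(word, {}).items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def cluster_by_sense(keywords, context):
    """Group keywords by the domain part of their chosen sense id."""
    clusters = defaultdict(list)
    for kw in keywords:
        sense = simplified_lesk(kw, context)
        if sense is not None:  # skip words missing from the inventory
            clusters[sense.split(".")[1]].append(kw)
    return dict(clusters)


context = "the bank approved the loan and accepts deposits of money".split()
print(cluster_by_sense(["bank", "loan"], context))
```

With real WordNet data the grouping key could instead come from a similarity measure between synsets; that choice is left open here.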
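For reference, RAKE-style scoring can be sketched as below: candidate phrases are split at stop words and punctuation, each word gets a score of degree divided by frequency, and a phrase scores the sum of its word scores. The stop-word list is a small sample, and how the scores get applied to the sense clusters is left open; this only shows the scoring on raw text.

```python
import re
from collections import defaultdict

# Small sample stop-word list used as phrase delimiters.
STOP_WORDS = {"is", "are", "the", "a", "an", "of", "and", "to", "in",
              "for", "on", "with"}


def candidate_phrases(text):
    """Split text into phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[^a-z0-9\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each candidate phrase as the sum of deg(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}


scores = rake_scores(
    "Keyword extraction is useful. Extraction of keyword phrases is hard."
)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(phrase, score)
```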
The other 3 approaches:
1. N-gram generation combined with keyword selection by TF-IDF scoring.
2. N-gram generation combined with filtering by PoS pattern.
3. N-gram generation combined with keyword selection by RAKE scoring.
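As an illustration of the first of these baselines, here is a small sketch of n-gram generation followed by TF-IDF ranking. It assumes documents are already tokenized, uses unigrams and bigrams, and uses the plain `tf * log(N / df)` weighting; the function names and the `top_k` and `n_max` parameters are chosen for the example only.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_keywords(docs, doc_index, top_k=3, n_max=2):
    """Rank the n-grams of one document by TF-IDF against the collection."""
    gram_lists = [ngrams(doc, n_max) for doc in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for grams in gram_lists:
        df.update(set(grams))
    tf = Counter(gram_lists[doc_index])
    total = len(gram_lists[doc_index])
    scores = {
        g: (count / total) * math.log(len(docs) / df[g])
        for g, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


docs = [
    ["graphics", "card", "driver"],
    ["hockey", "game", "score"],
    ["graphics", "game"],
]
print(tfidf_keywords(docs, 0))
```

The PoS-pattern and RAKE variants would reuse the same `ngrams` step and swap only the selection stage.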
After that we need to evaluate the 4 approaches against each other, and the first approach must show the best result.
The test data will be the 20 Newsgroups data set; not the whole set, 500 files will be enough.
I also need help with the documentation and paperwork, and the work needs to be done in 2 weeks.
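One common way to compare the approaches is precision, recall, and F1 per document against a reference keyword list. The sketch below assumes such reference keywords are available for the test files, which is an assumption of this example; the function name is illustrative.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted keywords with reference keywords for one document."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up keywords.
p, r, f = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
print(round(p, 3), round(r, 3), round(f, 3))
```

Averaging these scores over the 500 files would give one number per approach to compare.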