I have a set of 100,000 news articles in plain text files and I want to extract all articles that relate to one specific topic.
I will provide you with 300 manually chosen articles that are about the topic, and I want you to write a small script that extracts all the articles likely to be about it. The script should use some sort of statistical method based on factor analysis and document classification models.
I would love it if the script were in Python, but any other scripting language is fine. You will need to submit the code.
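To make the request more concrete, here is a minimal sketch of one possible baseline, not the required solution. It assumes the 300 seed articles and the 100,000 candidates are already loaded as lists of strings, and it swaps in a simple TF-IDF centroid (Rocchio-style) similarity score in place of a full factor analysis plus trained classifier. All function names are hypothetical, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_idf(documents):
    """Compute smoothed inverse document frequencies over a corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(tokenize(text)))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    """Return an L2-normalised sparse TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Sum sparse vectors and re-normalise to unit length."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank_by_topic(seed_texts, corpus_texts):
    """Score every corpus article against the seed-article centroid.

    Returns (score, index) pairs sorted from most to least similar.
    """
    idf = build_idf(list(corpus_texts) + list(seed_texts))
    topic = centroid([tfidf_vector(t, idf) for t in seed_texts])
    scores = [
        (cosine(tfidf_vector(text, idf), topic), i)
        for i, text in enumerate(corpus_texts)
    ]
    return sorted(scores, reverse=True)
```

Articles whose score exceeds some cutoff would be extracted as likely on-topic. The cutoff is an assumption that would have to be tuned, for example by holding out part of the 300 seed articles and checking how many of them are recovered. A fuller solution could add a dimensionality-reduction step and a trained classifier on top of the same TF-IDF features.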
If you are interested, PM me and I will tell you what the topic is.