I am looking for someone who can scrape text from a list of websites that I will provide in an Excel file and apply some filters to it.
I will give you about 1000 websites.
The script I need must classify the websites into different industries using keywords that I will specify for each industry.
The main objective of the script is to identify which industry the website belongs to.
a) I will give you a list of 10 real estate websites. The script will scan the text of these websites.
b) Keywords (the script must be able to apply both "must include" and "must not include" keywords)
b1) Real Estate keyword filter = real estate, property, building, apartment, flats.
b2) Cleantech keyword filter = eco, clean energy, environmentally friendly, not soap, not washing ("not" marks a must-not-include keyword).
c) Apply filters
c1) Filter 1) Apply Industry = Real Estate if the Real Estate keywords are found on the website.
c2) Filter 2) Apply Industry = Cleantech if the Cleantech keywords are found on the website.
d) Output must be an Excel file containing the original website URL and the industry that the script identified by applying the filters.
Must be built in PHP and MySQL.
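
To illustrate the filter logic I have in mind, here is a minimal sketch. It is written in Python only to show the idea; the delivered script must still be PHP. The names `FILTERS`, `contains_keyword` and `classify` are hypothetical. It also assumes three things I have not stated above: that a filter matches when at least one "must include" keyword is found and no "must not include" keyword is found, that keywords are matched as whole words, and that a website may match more than one industry.

```python
import re

# Hypothetical filter table. The keywords come from b1) and b2) above; the
# include/exclude split assumes "not soap" means "must not include soap".
FILTERS = {
    "Real Estate": {
        "include": ["real estate", "property", "building", "apartment", "flats"],
        "exclude": [],
    },
    "Cleantech": {
        "include": ["eco", "clean energy", "environmentally friendly"],
        "exclude": ["soap", "washing"],
    },
}


def contains_keyword(text, keyword):
    """Return True if keyword occurs in text as a whole word or phrase."""
    # Assumption: whole-word, case-insensitive matching, so "eco" does not
    # match inside "economy".
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify(text, filters=FILTERS):
    """Return the list of industries whose filter matches the page text."""
    matches = []
    for industry, rule in filters.items():
        has_include = any(contains_keyword(text, k) for k in rule["include"])
        has_exclude = any(contains_keyword(text, k) for k in rule["exclude"])
        # Assumption: one include keyword is enough, and any exclude
        # keyword rejects the industry.
        if has_include and not has_exclude:
            matches.append(industry)
    return matches
```

For example, `classify("Luxury apartment for sale")` would return `["Real Estate"]`, while a page that mentions both "eco" and "soap" would not be tagged as Cleantech. If a stricter rule is needed (for example, every keyword must be present), only the `any(...)` check would have to change.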