We need someone to write a script that extracts data from several search and shopping engines. For a set of keywords, the script will scrape the paid listings on the top search engines to gather specific information, and the same process will be used for the shopping engines. The collected information will be stored in a database. The project is outlined briefly below.
A brief description of the requirements follows:
We need a program that can scrape the following information for paid or sponsored listings from 5 different search engines and 3 shopping engines: title, description, URL, and placement (1st, 2nd, etc.). The program will read the keywords to search from a database or an Excel file. The program needs to multithread the queries for efficiency. It will run through the keyword list until all the keywords have been exhausted. Keywords may be processed in batches, e.g. 5,000 keywords per day; the program needs to handle at least 5,000 keywords per day. After finishing the entire list, it will start over from the beginning.
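A minimal Python sketch of the batching and multithreading described above, assuming the keywords have already been loaded into a list. The engine names and the `fetch_paid_listings` function are hypothetical placeholders; the actual fetching and parsing for each engine is not shown. Batches wrap around to the start of the list, so the program keeps cycling once the list is exhausted.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical engine names; the posting does not name the 5 search and 3 shopping engines.
ENGINES = ["search_engine_1", "shopping_engine_1"]


def fetch_paid_listings(engine, keyword):
    """Hypothetical placeholder: fetch and parse the paid listings for one query.

    A real version would return a list of dicts with the keys
    title, description, url and placement.
    """
    return []


def daily_batches(keywords, batch_size=5000):
    """Yield batches of batch_size keywords, wrapping back to the start of the list."""
    cycle = itertools.cycle(keywords)
    while True:
        yield list(itertools.islice(cycle, batch_size))


def run_batch(batch, engines=ENGINES, max_workers=8):
    """Run every (engine, keyword) query in the batch on a thread pool.

    Returns (engine, keyword, listings) tuples in the same order as the queries.
    """
    jobs = [(engine, keyword) for keyword in batch for engine in engines]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda job: (job[0], job[1], fetch_paid_listings(*job)), jobs)
        return list(results)
```

For example, with 12,000 keywords and a batch size of 5,000, the third day's batch would contain the last 2,000 keywords followed by the first 3,000.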
The results of the data collection (scraping) will be added to a database, where they will then need to be processed.
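One possible way to store the scraped results, sketched with Python's built-in `sqlite3` module. The table name `paid_listing`, its columns and the `store_results` helper are hypothetical; the real layout should follow the attached Excel output. Each row records one listing for one engine and keyword, plus the date it was scraped.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS paid_listing (
            id INTEGER PRIMARY KEY,
            engine TEXT NOT NULL,
            keyword TEXT NOT NULL,
            placement INTEGER NOT NULL,   -- 1 for 1st, 2 for 2nd, ...
            title TEXT,
            description TEXT,
            url TEXT,
            scraped_on TEXT NOT NULL DEFAULT CURRENT_DATE
        )
        """
    )
    return conn


def store_results(conn, engine, keyword, listings):
    """Insert the listings scraped for one engine/keyword pair; return the row count."""
    rows = [
        (engine, keyword, item["placement"], item["title"], item["description"], item["url"])
        for item in listings
    ]
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT INTO paid_listing (engine, keyword, placement, title, description, url) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```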
I have attached an Excel file that shows what the output will include.
A single URL might appear for thousands of keywords, each with its associated title, description, and placement. We will need some reports created to properly analyze this data.
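As an example of the kind of report this could involve, the sketch below summarizes, for each advertiser URL, how many distinct keywords it appears for and its average placement. It assumes a hypothetical SQLite table `paid_listing` with at least the columns `keyword`, `placement` and `url`; the actual reports would depend on the analysis needed.

```python
import sqlite3


def url_report(conn: sqlite3.Connection):
    """Per URL: number of distinct keywords and average placement, most keywords first."""
    return conn.execute(
        """
        SELECT url,
               COUNT(DISTINCT keyword) AS keywords,
               AVG(placement)          AS avg_placement
        FROM paid_listing
        GROUP BY url
        ORDER BY keywords DESC, url
        """
    ).fetchall()
```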
We are an online advertiser with thousands of keywords (keywords change and new ones are added daily). We want to see what other advertisers in our space are doing so that we can stay competitive.
This project needs to be completed ASAP.
I have already built a content-harvester program in Perl, and it has been very successful. I don't think this will take much time, and I can meet the expected schedule.