We're looking for a customizable distributed crawler that can be deployed on services such as Amazon EC2 or Rackspace to crawl targeted sites such as Yelp or CitySearch. The system should also have an automated mechanism to extract selected parts of the HTML and save them in XML format. The crawler and the extractor need to be fast and efficient, and should preferably offer a GUI-based configuration tool (a text-file configuration might be OK as well).
Do send us sample code and/or sample output from your code!
Also, our back end runs on LAMP, so your project needs to run in our environment (e.g. no .NET stuff, please!)
Looking forward to getting this project done as soon as possible!
15 freelancers are bidding an average of $1,193 for this job.
Hi there, I have done this kind of job before and have built crawlers for Yelp, CitySearch, Zagat, TripAdvisor, Epinions, YellowPagesCity, etc. I can do this job efficiently. Thanks for your interest.