I am looking for a developer with strong PHP and MySQL skills to develop a web crawler and data extraction spider.
The spider needs to perform these functions:
1) Find all sites in a particular country that match specific subject search terms. This could be achieved by querying Google. The results form the list of sites to crawl regularly.
2) Crawl the sites from step 1 and search each one for a specific type of item.
3) If the item type is found on a web site, extract the data from its pages as cleanly as possible from between the <body> and </body> tags. This step should remove as much HTML, CSS and other markup as possible, so that the data is relevant and as free from distracting tags as possible.
4) Write the extracted content to a text field in the MySQL (version 4.1+) database.
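As a rough illustration of step 3 only, the clean-up could look something like the sketch below. The function name extract_body_text is hypothetical, and the regular-expression approach is an assumption made for brevity. A proper HTML parser would probably cope better with badly formed pages.

```php
<?php
// Hypothetical helper: reduce an HTML page to the plain text inside <body>.
function extract_body_text($html)
{
    // Keep only what sits between <body ...> and </body>.
    // If no body element is found, work on the whole input.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
        $html = $m[1];
    }

    // Drop script and style blocks together with their contents.
    $html = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', ' ', $html);

    // Drop HTML comments.
    $html = preg_replace('/<!--.*?-->/s', ' ', $html);

    // Replace every remaining tag with a space so adjacent words stay apart.
    $text = preg_replace('/<[^>]+>/', ' ', $html);

    // Decode entities such as &amp; back into plain characters.
    $text = html_entity_decode($text);

    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

The returned string is what step 4 would write to the text field. Deciding which pages actually contain the item type (step 2) is a separate problem and is not covered by this sketch.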
All code must be compatible with both PHP 4+ and PHP 5+ and must run on a powerful Linux server.
There will be other projects for the developer with the right skill sets, experience and sheer ability to deliver high quality systems at a reasonable price.