We require a web scraper written in PHP or Python. Its database will interface with our existing web software. The scraper must machine-read product pages and prices from online retail websites in New Zealand, specifically computer parts retailers. Their prices will be listed in our software, which directs potential customers to them.
The scraper requirements are as follows:
- The scraper will initially be required to scrape the attached 120 stores.
- Stores are to be kept in a stores table (MySQL structure supplied).
- Products that are read by the scraper are to be kept in a store_products table (MySQL structure supplied).
- Stores will be scraped daily, updating existing products and adding any new ones found. Each store's [url removed, login to view] rules must be followed.
- Product items that are found for a store should be unique (no duplicates).
- Scraping all 120 stores and updating the store_products table should take 9 hours or less.
- The ability to exclude products by URL or keyword is required, via a MySQL table of rules.
- It must be possible to add extra stores to the scraper in the future. Requirements are expected to scale to 300+ stores.
- Daily output once the scraper has scraped all stores, with stats on each store's scraping results for the day (to be saved as a log file on the server).
- As we do not require any administration pages to be built, delivered code must be very well structured and commented for future modifications.
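To illustrate how the uniqueness, update-or-add, and exclusion-rule requirements above might fit together, here is a minimal Python sketch. It is not the supplied schema: the column names (store_id, url, name, price), the rule format, and the function names are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the sketch runs without a server. In MySQL the equivalent statement would be INSERT ... ON DUPLICATE KEY UPDATE against a unique key.

```python
import sqlite3


def is_excluded(url, name, rules):
    """Return True if any exclusion rule matches this product.

    rules is a list of (kind, pattern) pairs (hypothetical format):
    'url' rules match a substring of the URL, 'keyword' rules match
    the product name case-insensitively.
    """
    for kind, pattern in rules:
        if kind == "url" and pattern in url:
            return True
        if kind == "keyword" and pattern.lower() in name.lower():
            return True
    return False


def upsert_product(conn, store_id, url, name, price):
    """Insert a product, or update it if (store_id, url) already exists."""
    conn.execute(
        """INSERT INTO store_products (store_id, url, name, price)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(store_id, url) DO UPDATE SET
               name = excluded.name,
               price = excluded.price""",
        (store_id, url, name, price),
    )


def save_scraped(conn, store_id, items, rules):
    """Store scraped items for one store and return simple daily stats."""
    stats = {"saved": 0, "excluded": 0}
    for url, name, price in items:
        if is_excluded(url, name, rules):
            stats["excluded"] += 1
            continue
        upsert_product(conn, store_id, url, name, price)
        stats["saved"] += 1
    conn.commit()
    return stats


# Stand-in table; the real store_products structure is supplied separately.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE store_products (
           store_id INTEGER NOT NULL,
           url      TEXT    NOT NULL,
           name     TEXT    NOT NULL,
           price    REAL    NOT NULL,
           UNIQUE (store_id, url))"""
)

rules = [("url", "/clearance/"), ("keyword", "refurbished")]
day1 = [
    ("https://shop.example/p/1", "SSD 1TB", 129.00),
    ("https://shop.example/p/2", "Refurbished GPU", 399.00),
    ("https://shop.example/clearance/3", "Old fan", 5.00),
]
day2 = [("https://shop.example/p/1", "SSD 1TB", 119.00)]  # same product, new price

stats1 = save_scraped(conn, 1, day1, rules)
stats2 = save_scraped(conn, 1, day2, rules)
rows = conn.execute("SELECT url, price FROM store_products").fetchall()
print(stats1, stats2, rows)
```

Keying uniqueness on (store_id, url) is only one possible choice. A store that serves the same product under several URLs would need a different key, such as a SKU, if one can be scraped. The returned stats dictionaries could feed the daily log file.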
Communication via Skype or MSN, plus email. Budget is negotiable.