I need someone to help me crawl a manufacturer’s website and extract product model and part numbers and their related descriptions into Excel/CSV format.
(I'm reposting this project because of the overwhelming number of automated responses last time, which just waste everyone's time. Please quote the word "mastiff" in your reply to let me know that you have read the brief yourself. Thank you.)
The web pages / links are not static but rather JavaScript/AJAX based, which is where the off-the-shelf packages I've tried to use are falling down.
I have two particular websites in mind initially but, if successful, there will be more.
The data will be in a similar format each time, although:
i) the data tables come in a couple of different styles, with differing numbers of columns
ii) the number of links that need to be followed to get from the start point to each data table varies
iii) some part/model numbers in the data tables are themselves links, leading to further data tables
Data to be captured:
Link text at each link clicked on the way to each data table
Data table sub-headings
Part / Product / Model Number
The nature of the sites makes it difficult to estimate the total number of parts, but I would guess 100k+
It is important that all links are crawled and scraped and none are missed.
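As a rough illustration of the kind of output described above, here is a minimal Python sketch. It is only an assumption about how the crawl might be organised, not a required approach. The `SITE` dictionary is a made-up, in-memory stand-in for pages that have already been rendered (the real sites would need a tool that executes the page scripts first), and all page names, link texts and part numbers in it are hypothetical.

```python
import csv
import io
from collections import deque

# Hypothetical stand-in for rendered pages. Each page has:
#   "links":  link text -> target page id
#   "tables": sub-heading -> list of (part number, linked page id or None)
SITE = {
    "start": {"links": {"Pumps": "pumps"}, "tables": {}},
    "pumps": {"links": {"Series A": "series_a"}, "tables": {}},
    "series_a": {
        "links": {},
        "tables": {"Standard": [("A-100", None), ("A-200", "a200_spares")]},
    },
    "a200_spares": {"links": {}, "tables": {"Spares": [("A-200-S1", None)]}},
}


def crawl(site, start):
    """Breadth-first walk recording the link-text trail for every table row."""
    rows = []
    seen = {start}                 # pages already queued, so none is visited twice
    queue = deque([(start, [])])   # (page id, link texts clicked to get here)
    while queue:
        page_id, trail = queue.popleft()
        page = site[page_id]
        for heading, parts in page["tables"].items():
            for part_number, target in parts:
                rows.append((" > ".join(trail), heading, part_number))
                # A part number that is itself a link leads to a further table.
                if target is not None and target not in seen:
                    seen.add(target)
                    queue.append((target, trail + [part_number]))
        for text, target in page["links"].items():
            if target not in seen:
                seen.add(target)
                queue.append((target, trail + [text]))
    return rows


def to_csv(rows):
    """Serialise the captured rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["link_trail", "sub_heading", "part_number"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(crawl(SITE, "start")))
```

The `seen` set keeps each page from being queued twice while every reachable page is still visited once. One consequence of this design is that a table reachable by two different routes is recorded only under the first trail that reaches it.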
Please contact me for further details / links / screenshots.