We need to screen-scrape some business information from a business directory. It's divided into 20 categories and 270 subcategories. The categories are only two levels deep, which means there aren't any sub-subcategories. In total there are around 10,000 entries.
We need the data inserted into a MySQL database.
The info that should be collected is:
Ownership (Private or Public)
All the data is in a very simple format which looks like this:
<td class="entry"><p class="address">Axa Ltd</p><p>Box 43</p><p>IP123BL Suffolk</p><br /><strong>Private</strong></td><td class="phone"><p>031436700</p></td><td class="text"><p>Axa sells steel in the central London area.</p><p style="margin-top: 6px;"><a href="[url removed, login to view]" target="_blank">www.axasteel.co.uk</a></p></td>
I cannot give out the URL for the site because it's not open to the public, but it's in standard directory format, with a folder structure of categories/subcategories, and the code looks like the example above.
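To make the expected output concrete, here is a rough sketch of how one row like the example could be parsed and prepared for MySQL, using only Python's standard `html.parser`. Several things in it are assumptions and not part of the site or the spec: the field names, the `businesses` table and its columns, the helper names `parse_entries` and `save`, and the guess that the first `<p>` in the `entry` cell is the business name and the remaining `<p>` lines are the address. The `%s` placeholders assume a DB-API driver that uses that parameter style, as the common MySQL drivers do.

```python
from html.parser import HTMLParser

# Sample row copied from the posting above.
SAMPLE = (
    '<td class="entry"><p class="address">Axa Ltd</p><p>Box 43</p>'
    '<p>IP123BL Suffolk</p><br /><strong>Private</strong></td>'
    '<td class="phone"><p>031436700</p></td>'
    '<td class="text"><p>Axa sells steel in the central London area.</p>'
    '<p style="margin-top: 6px;"><a href="[url removed, login to view]" '
    'target="_blank">www.axasteel.co.uk</a></p></td>'
)

# Hypothetical table and column names; adjust to the real schema.
INSERT_SQL = (
    "INSERT INTO businesses "
    "(name, address, ownership, phone, description, website) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


class EntryParser(HTMLParser):
    """Collects text per <td class=...> cell; a new record starts at td.entry."""

    def __init__(self):
        super().__init__()
        self.cell = None     # class of the <td> currently open
        self.tag = None      # innermost open tag among p / strong / a
        self.raw = []        # one dict of raw pieces per directory entry

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cell = dict(attrs).get("class")
            if self.cell == "entry":
                self.raw.append({"lines": [], "ownership": None,
                                 "phone": None, "text": [], "website": None})
        elif tag in ("p", "strong", "a"):
            self.tag = tag

    def handle_endtag(self, tag):
        if tag == "td":
            self.cell = None
        elif tag in ("p", "strong", "a"):
            self.tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.raw or self.cell is None:
            return
        rec = self.raw[-1]
        if self.cell == "entry" and self.tag == "strong":
            rec["ownership"] = text            # "Private" or "Public"
        elif self.cell == "entry" and self.tag == "p":
            rec["lines"].append(text)          # name first, then address lines
        elif self.cell == "phone":
            rec["phone"] = text                # kept as text: leading zero matters
        elif self.cell == "text" and self.tag == "a":
            rec["website"] = text              # link text, not the href
        elif self.cell == "text" and self.tag == "p":
            rec["text"].append(text)


def parse_entries(html):
    """Return one dict per entry found in the given HTML fragment."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    records = []
    for rec in parser.raw:
        lines = rec["lines"]
        records.append({
            "name": lines[0] if lines else None,   # assumption: first <p> is the name
            "address": ", ".join(lines[1:]),
            "ownership": rec["ownership"],
            "phone": rec["phone"],
            "description": " ".join(rec["text"]),
            "website": rec["website"],
        })
    return records


def save(cursor, records):
    """Insert records through any DB-API cursor using bound parameters."""
    rows = [(r["name"], r["address"], r["ownership"],
             r["phone"], r["description"], r["website"]) for r in records]
    cursor.executemany(INSERT_SQL, rows)


if __name__ == "__main__":
    for record in parse_entries(SAMPLE):
        print(record)
```

The category and subcategory are not part of the row itself, so they would presumably be taken from the folder path of each page and stored in extra columns.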