We are looking for a crawler that visits every page of a website and finds external links pointing to expired domains.
The user should define the list of sites to crawl via a text file. The crawler should discover pages by following internal links rather than depending on a sitemap. Only unique external domains should be logged, to prevent duplicate domain-availability lookups.
The user should also be able to define a list of URLs to skip when checking availability, e.g. [login to view URL] etc. These domains should be user-defined in a blacklist text file.
Results should be written to a CSV file listing the linking domain and the available (expired) domain.
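As a rough illustration of the workflow described above, here is a minimal Python sketch using only the standard library. All function names are hypothetical, and the availability check shown (a DNS lookup) is only a crude assumption-level proxy: a production version would need a WHOIS or registrar API to confirm a domain is actually expired and registrable.

```python
import socket
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Returns all <a href> targets on a page, resolved to absolute URLs."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def domain_of(url):
    """Normalises a URL to its bare registrable-looking domain."""
    return urlparse(url).netloc.lower().removeprefix("www.")

def domain_resolves(domain):
    """Crude availability proxy (assumption): no DNS record may mean expired."""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

def crawl_site(start_url, blacklist, fetch=None):
    """Breadth-first crawl of one site by following internal links
    (no sitemap). Returns {external_domain: first linking page}, so each
    external domain is logged only once."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")
    site_domain = domain_of(start_url)
    seen_pages, external = set(), {}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        if page in seen_pages:
            continue
        seen_pages.add(page)
        try:
            html = fetch(page)
        except Exception:
            continue  # unreachable page: skip, keep crawling
        for link in extract_links(page, html):
            dom = domain_of(link)
            if not dom or dom in blacklist:
                continue  # non-http link or blacklisted domain
            if dom == site_domain:
                queue.append(link)        # internal link: crawl it too
            elif dom not in external:
                external[dom] = page      # external domain: log once
    return external
```

The returned mapping can then be filtered with `domain_resolves` and written out with the standard `csv` module as `linking_domain,available_domain` rows. The `fetch` parameter exists so page retrieval can be swapped out (e.g. for testing, or for a politer client that honours robots.txt and rate limits, which any real deployment should do).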