Two PHP scripts are needed to scrape television series information from the two websites below:
[login to view URL]
Fields: title, year, IMDb website, description, categories (drama, comedy, etc.), actors, URL of image. (The list has 334 titles.)
[login to view URL]
You must traverse the tree and enter each series page to extract additional information.
Fields to scrape: title, Wikipedia page of base series (e.g. /30_rock), official website, IMDb website, actors. (The list has 217 series.)
Each script must output to a separate CSV file.
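To illustrate the CSV output, here is a minimal sketch using PHP's built-in fputcsv. The function name writeCsv, the column keys, and the choice of "|" to join multi-valued fields such as actors or categories are assumptions made for illustration, not part of the brief.

```php
<?php
// Hypothetical helper: write scraped rows to a CSV file with a header line.
// $columns fixes the column order; $rows is a list of associative arrays.
function writeCsv(string $path, array $columns, array $rows): void
{
    $fh = fopen($path, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    // Header row. The empty escape string disables PHP's non-standard
    // backslash escaping, so quotes are doubled as in ordinary CSV.
    fputcsv($fh, $columns, ',', '"', '');

    foreach ($rows as $row) {
        $line = [];
        foreach ($columns as $column) {
            $value = $row[$column] ?? '';
            // Assumption: list-valued fields (actors, categories) are
            // flattened into one cell, separated by "|".
            $line[] = is_array($value) ? implode('|', $value) : (string) $value;
        }
        fputcsv($fh, $line, ',', '"', '');
    }

    fclose($fh);
}
```

Each scraper would call this once with its own file name, which keeps the two outputs separate.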
A third PHP script must merge the duplicate data: if two entries have the same IMDb URL, the Wikipedia entry must be completed with the data from the IMDb scrape.
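The merge rule could be sketched roughly as follows. This is one possible reading of the requirement, not a definitive implementation: it assumes both CSV files have already been loaded into arrays of associative rows, that the IMDb link is stored under a key named imdb_url, and that "completed" means filling only the fields that are missing or empty in the Wikipedia row. The function names and the URL normalisation step are hypothetical.

```php
<?php
// Hypothetical helper: normalise an IMDb URL so that trivial differences
// (scheme, "www.", trailing slash, letter case) do not prevent a match.
function normalizeImdbUrl(string $url): string
{
    $url = strtolower(trim($url));
    $url = preg_replace('#^https?://(www\.)?#', '', $url);
    return rtrim($url, '/');
}

// Complete each Wikipedia row with data from the IMDb row that has the
// same IMDb URL. Existing non-empty Wikipedia values are left untouched.
function mergeByImdbUrl(array $wikiRows, array $imdbRows): array
{
    // Index the IMDb rows by normalised URL for direct lookup.
    $imdbByUrl = [];
    foreach ($imdbRows as $row) {
        $key = normalizeImdbUrl($row['imdb_url'] ?? '');
        if ($key !== '') {
            $imdbByUrl[$key] = $row;
        }
    }

    $merged = [];
    foreach ($wikiRows as $row) {
        $key = normalizeImdbUrl($row['imdb_url'] ?? '');
        if ($key !== '' && isset($imdbByUrl[$key])) {
            foreach ($imdbByUrl[$key] as $field => $value) {
                // Fill only fields that are missing or empty.
                if (!isset($row[$field]) || $row[$field] === '') {
                    $row[$field] = $value;
                }
            }
        }
        $merged[] = $row;
    }

    return $merged;
}
```

Wikipedia rows without a matching IMDb URL pass through unchanged. Whether unmatched IMDb rows should also appear in the merged output is not stated in the brief, so this sketch leaves them out.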
Use PHP and cURL with appropriate user agents. No user interface: each script must be run separately from the command line.
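As a rough sketch of the fetching step, the function below uses PHP's cURL extension with an explicit User-Agent header. The function name fetchPage, the timeout values, and the user agent string in the usage comment are illustrative assumptions. The target URLs are not given here, so none is filled in.

```php
<?php
// Hypothetical helper: download one page with cURL and return its body.
// Throws if the transfer fails (DNS error, timeout, and so on).
function fetchPage(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent, // identify the client to the server
        CURLOPT_CONNECTTIMEOUT => 10,         // seconds, assumed value
        CURLOPT_TIMEOUT        => 30,         // seconds, assumed value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("Request failed for $url: " . curl_error($ch));
    }

    return $body;
}

// Example use inside a command-line script (the agent string is a placeholder):
//   $html = fetchPage($argv[1], 'Mozilla/5.0 (compatible; SeriesScraper/1.0)');
```

A script built around this would be started with something like `php scrape_imdb.php`, where the file name is only an example.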