I need a program, in whatever programming language you want, that:
1. fetches all the sold completed listings returned for specified categories/keywords on eBay, plus all the completed listings found in the feedback page(s) of the respective sellers.
Note: it must be able to fetch all the found listings (even thousands), not only the first page...
2. groups identical/similar pages (among those fetched in point 1)
Note 1: to detect identical/similar pages, the program must offer the following three open source similarity-detection algorithms (or others better than these): "Sif Fingerprint", "Substring Fingerprint", "Levenshtein distance", each with a user-configurable similarity threshold. You must IMPLEMENT ALL THREE ALGORITHMS: I will select the one I prefer in the GUI.
I suggest copying the first two of these algorithms (Sif and Substring Fingerprint) from this good open source Java program that uses them: [url removed, login to view]
Note 2: duplicate/similar detection must run only on a filtered part of each page (e.g. the title and description of the eBay item): the program must let me specify this filter as one or more regexps.
3. for each group so formed, displays the number of items sold, the total earned in dollars and the average dollars earned per day.
Here is a short example of how this final result should look: [url removed, login to view]
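
To make points 2 and 3 more concrete, here is a minimal Python sketch of the Levenshtein variant only. It is an illustration under my own assumptions, not a required design: the function names, the listing fields ("page", "price", "sold_on"), the greedy grouping against the first member of each group, and the "per day" average taken over the inclusive span between the first and last sale of a group are all hypothetical. Fetching from eBay and the two fingerprint algorithms are left out.

```python
import re
from datetime import date

def extract_filtered_text(page, patterns):
    """Join every regex match in the page (capture group 1 if the pattern has one)."""
    parts = []
    for pattern in patterns:
        for m in re.finditer(pattern, page, flags=re.DOTALL | re.IGNORECASE):
            parts.append((m.group(1) if m.re.groups else m.group(0)) or "")
    return " ".join(parts).strip().lower()

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Normalize the distance to [0, 1]; 1.0 means identical (assumed scale)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def group_listings(listings, patterns, threshold):
    """Greedy grouping: a listing joins the first group whose representative is similar enough."""
    groups, keys = [], []
    for listing in listings:
        key = extract_filtered_text(listing["page"], patterns)
        for i, representative in enumerate(keys):
            if similarity(key, representative) >= threshold:
                groups[i].append(listing)
                break
        else:
            keys.append(key)
            groups.append([listing])
    return groups

def summarize(group):
    """Items sold, total dollars, and dollars per day over the group's sale span."""
    total = sum(item["price"] for item in group)
    days = [item["sold_on"] for item in group]
    span_days = (max(days) - min(days)).days + 1  # inclusive span (assumption)
    return {"sold": len(group), "total": total, "per_day": total / span_days}

# Made-up sample data, only to show the intended flow.
listings = [
    {"page": "<title>Red Widget 10cm</title>", "price": 10.0, "sold_on": date(2024, 1, 1)},
    {"page": "<title>Red Widget 12cm</title>", "price": 20.0, "sold_on": date(2024, 1, 3)},
    {"page": "<title>Blue Gadget</title>", "price": 5.0, "sold_on": date(2024, 1, 2)},
]
groups = group_listings(listings, [r"<title>(.*?)</title>"], threshold=0.8)
for group in groups:
    print(summarize(group))
```

With thousands of listings this pairwise comparison gets slow, which is one reason the fingerprint-based algorithms are also requested.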