The application will read a list of web page URLs from a file and work through it row by row. For each page it will check whether the page contains a specified link, obtain an RSS feed for it via the Feedage website (this requires captcha solving, for which I have accounts with the two main services), process those feeds into a smaller number of RSS feeds via the [url removed, login to view] site, submit these to RSS aggregators, and put them into a ping site to ping them. The application stops after x Feedage feeds have been processed, x being a number input by the user at the start.
The application or utility will emulate a human worker, but it will be much quicker and without the boredom!
This project has three stages, which I have divided into three milestones for the sake of clarity.
Milestone 1: The user will supply an input file to the software.
The input file contains a list of web page URLs in column A. The software will read the input file and check whether a link (given for each row in column B) exists on that page. If the link is present, the software writes 'Found' in column C; if not, it writes 'Not Found' in column C.
The application will use a timer that pauses for a random number of seconds between rows.
The software will continue until there is no more data in column A.
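A minimal sketch of this milestone 1 loop, assuming the input is an .xlsx workbook handled with openpyxl and that a plain substring search of the fetched HTML counts as "the link exists on that page" (the exact file format and match rule are the coder's to confirm; the 2-8 second pause range is purely illustrative):

```python
import random
import time

import requests
from openpyxl import load_workbook

def link_is_present(page_url: str, link: str) -> bool:
    """Fetch the page and report whether the given link appears in its HTML."""
    try:
        return link in requests.get(page_url, timeout=30).text
    except requests.RequestException:
        return False

def check_links(path: str) -> None:
    """Milestone 1: walk column A, write 'Found'/'Not Found' in column C."""
    wb = load_workbook(path)
    ws = wb.active
    for row in ws.iter_rows():
        page_url = row[0].value                        # column A: page to check
        link = row[1].value if len(row) > 1 else None  # column B: link to look for
        if not page_url:                               # no more data in column A
            break
        found = link is not None and link_is_present(page_url, str(link))
        ws.cell(row=row[0].row, column=3, value="Found" if found else "Not Found")
        time.sleep(random.uniform(2, 8))               # random pause between rows
    wb.save(path)                                      # output the completed file
```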
The software will be capable of processing up to 4 files at the same time, and will output each completed file with its results.
The software will report in real time on the progress being made in each file.
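One way to meet the four-files-at-once requirement is a thread pool capped at four workers. This sketch reuses check_links from the previous snippet and prints coarse progress as each file finishes; finer per-row progress could just as well be printed from inside check_links:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_files(paths: list[str]) -> None:
    """Run check_links on up to four files at once, reporting as each finishes."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(check_links, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()                  # re-raise any worker error
                print(f"finished {path}")
            except Exception as exc:
                print(f"{path} failed: {exc}")
```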
Milestone 2: As above, except that instead of writing 'Found' or 'Not Found', for every web page on which the link is found the software will go to the Feedage website and obtain an RSS feed (entering a captcha is necessary for this; I have accounts with Decapture and Death by Captcha). The software will write the URL of this feed into column C.
The software will get x RSS feeds from Feedage, where x is a number input by the user when the file is supplied, before the process runs.
If there is nothing in column B then the utility should get the RSS feed for the web page in column A anyway.
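Feedage offers no public API that I am aware of, so the Feedage/captcha step below is a deliberately hypothetical placeholder: get_feedage_feed stands in for whatever browser automation plus captcha-service client (e.g. Death by Captcha's) the coder adopts. The sketch reuses link_is_present and the imports from the milestone 1 snippet and shows only the control flow: rows whose link is found (or whose column B is empty) go to Feedage, the feed URL lands in column C, and the run stops after x feeds:

```python
def get_feedage_feed(page_url: str) -> str:
    """Placeholder: drive Feedage (plus a captcha-service client) and
    return the URL of the RSS feed it generates for page_url."""
    raise NotImplementedError  # depends on the coder's chosen tooling

def fetch_feeds(path: str, x: int) -> list[str]:
    """Milestone 2: collect up to x Feedage feed URLs into column C."""
    wb = load_workbook(path)
    ws = wb.active
    feeds: list[str] = []
    for row in ws.iter_rows():
        if len(feeds) >= x:                      # stop after x Feedage feeds
            break
        page_url = row[0].value
        link = row[1].value if len(row) > 1 else None
        if not page_url:                         # no more data in column A
            break
        # An empty column B means: get the feed for the page anyway.
        if link is not None and not link_is_present(page_url, str(link)):
            continue
        feed_url = get_feedage_feed(page_url)
        ws.cell(row=row[0].row, column=3, value=feed_url)
        feeds.append(feed_url)
        time.sleep(random.uniform(2, 8))
    wb.save(path)
    return feeds
```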
Milestone 3: As above and, additionally, every 20 Feedage feeds will be put into [url removed, login to view] and turned into a Bulkping feed; this feed will be written in column D.
Each such feed is then submitted to the RSS aggregators using [url removed, login to view] (all inputs for that process will be supplied in the input file).
This process continues until the number of Feedage feeds equals x. Any Feedage feeds left over are also put into [url removed, login to view] and an additional Bulkping feed derived from them.
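The every-20 batching with a leftover batch is simple to express; here make_bulkping_feed is again a hypothetical stand-in for the actual [url removed, login to view] submission:

```python
def make_bulkping_feed(feeds: list[str]) -> str:
    """Placeholder: submit a batch of feeds to the combining service and
    return the URL of the single Bulkping feed it produces."""
    raise NotImplementedError

def batch_feeds(feeds: list[str], size: int = 20) -> list[str]:
    """Turn every `size` Feedage feeds into one Bulkping feed; any feeds
    left over at the end form a final, smaller batch of their own."""
    return [make_bulkping_feed(feeds[i:i + size])
            for i in range(0, len(feeds), size)]
```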
All web page URLs on which the link was found are collected into a list and put into [url removed, login to view] for mass pinging by that system. The Feedage feeds are also put into [url removed, login to view].
Finally, all the Bulkping RSS feeds, together with the website's own RSS feed (if supplied), are put into [url removed, login to view] to produce a 'master Bulkping RSS feed'. This is written in column E at the foot of the file.
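A sketch of this final assembly, building on the same hypothetical make_bulkping_feed helper; the only concrete parts are the list concatenation and writing the result in column E one row below the last used row:

```python
from openpyxl import load_workbook

def write_master_feed(path: str, bulkping_feeds: list[str],
                      site_feed: str | None = None) -> str:
    """Combine all Bulkping feeds (plus the site's own feed, if supplied)
    into one master feed and write it in column E at the foot of the file."""
    sources = bulkping_feeds + ([site_feed] if site_feed else [])
    master = make_bulkping_feed(sources)     # hypothetical helper again
    wb = load_workbook(path)
    ws = wb.active
    ws.cell(row=ws.max_row + 1, column=5, value=master)  # column E, at the foot
    wb.save(path)
    return master
```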
The Bulkping RSS feeds and the master Bulkping RSS feed are also put into [url removed, login to view] for pinging.
The process will stop.
[url removed, login to view] was considered for combining RSS feeds (instead of [url removed, login to view]) but has proved unstable. This may change, however. If the coder knows of a better such service, feel free to make suggestions.
On awarding the project I can give the winner additional illustrative help.