I would like to make some additions to the link mailer application you built for me. Specifically:
- Add some user management so that I can create accounts for users, which they can log in to and use to manage their own projects. This would give me, the superuser, the ability to add accounts (username/password) and to define, per account, the number of projects and email accounts it can add and a link limit per project. E.g. a new account can only set up 5 projects and 5 email accounts and have a maximum of 500 links per project (all of which I can set from my user admin).
- Add a function for the system to check for URLs in the index. The way I imagine this working:
- The system will timestamp when the status of a link changes from posted to crawled.
- A cron job will run daily and look for links with status "crawled" and a timestamp more than X hours old (X is configurable and set to 576 to start).
- For those links it will do a site:<url> search on Google to see if that URL is in the index. If it is, then it will update the status to "Indexed".
- If the page is not in the index then it will leave the status as is and record the time of the index check.
- For sites that are not found in the index the system will check again X hours later (same configurable number as above), but the number of checks should be limited to 3 (configurable) after which it will not check anymore.
- The stats page will then show an additional number for Indexed links, and this will be reflected on the chart also. Over time, of course, we want to see that the % of indexed links is (hopefully) very high.
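To make the index check steps above concrete, here is a rough sketch of the logic as I picture it. It is only an illustration: the `Link` fields, the function names and the in-memory list are placeholders I made up, not the application's actual schema, and `is_indexed` stands in for whatever lookup method we end up choosing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional

RECHECK_HOURS = 576  # the "X" hours from the brief; meant to be configurable
MAX_CHECKS = 3       # configurable cap on index checks per link


@dataclass
class Link:
    # Hypothetical fields, not the real database columns.
    url: str
    status: str  # "posted", "crawled" or "Indexed"
    crawled_at: Optional[datetime] = None       # set when status becomes "crawled"
    last_checked_at: Optional[datetime] = None  # time of the latest index check
    check_count: int = 0


def is_due(link: Link, now: datetime) -> bool:
    """True if a crawled link should be checked against the index now."""
    if link.status != "crawled" or link.crawled_at is None:
        return False
    if link.check_count >= MAX_CHECKS:
        return False  # give up after the configured number of checks
    # The first check is timed from the crawl, later checks from the last check.
    reference = link.last_checked_at or link.crawled_at
    return now - reference > timedelta(hours=RECHECK_HOURS)


def run_daily_check(links: Iterable[Link],
                    is_indexed: Callable[[str], bool],
                    now: datetime) -> None:
    """Body of the daily cron job: check due links and record the result."""
    for link in links:
        if not is_due(link, now):
            continue
        link.last_checked_at = now
        link.check_count += 1
        if is_indexed(link.url):
            link.status = "Indexed"


def indexed_percentage(links: Iterable[Link]) -> float:
    """Share of links with status "Indexed", for the stats page."""
    links = list(links)
    if not links:
        return 0.0
    return 100.0 * sum(l.status == "Indexed" for l in links) / len(links)
```

In this sketch a link that is not found keeps its "crawled" status and only gets its check time and counter updated, so it is picked up again X hours later until the limit is reached.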
I am not sure how best to run a high number of "site:<url>" queries on Google, and I need your help to figure this out. I like the idea of using the Google API if possible. I am prepared to sign up for Google Billing if we need to do more than 100 queries per day on the API. I think that should be enough.
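As a starting point for discussion, here is a minimal sketch of what one API lookup might look like. The link to the API is missing from this brief, so this assumes Google's Custom Search JSON API (endpoint `https://www.googleapis.com/customsearch/v1` with `key`, `cx` and `q` parameters); the function names are my own placeholders. Whether a "site:" query through that API reliably reflects the main Google index is something I would want you to confirm.

```python
from urllib.parse import urlencode

# Assumption: the Custom Search JSON API is the API meant in this brief.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_site_query_url(page_url: str, api_key: str, engine_id: str) -> str:
    """Build the request URL for a site:<url> lookup (hypothetical helper)."""
    params = {
        "key": api_key,           # API key from the Google account
        "cx": engine_id,          # ID of the search engine to query
        "q": f"site:{page_url}",  # the site:<url> query from the brief
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def response_says_indexed(payload: dict) -> bool:
    """Treat the URL as indexed if the decoded JSON reports at least one result."""
    total = payload.get("searchInformation", {}).get("totalResults", "0")
    return int(total) > 0
```

The actual HTTP request is left out on purpose so that it can go through whichever client, rate limiting or proxy setup you think is best.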
If we need to do some scraping to get this data then I do have a few private proxies which might be useful.
I think that covers it, though I imagine you will have some questions.
If you are not interested in this project please let me know so I can invite other programmers.