This script is similar to the one I requested in my previously posted project 91169.
The script should perform a search on Google, Yahoo and MSN for keywords which I specify, grab some information from the search results and save it in a text file or database. The script should run on the web server (shared web hosting service) on which my website is hosted. The server has Perl [url removed, login to view], PHP and a MySQL database installed.
There are free solutions for this available on the internet but I need some additional features, especially the ability to input a list of keywords and get the results in a file.
It will be used in an academic research project for a PhD. It might lead to additional projects for the bidder, either for my research or in a project for a company.
The script should function as follows:
- First I go to a simple web page (to be programmed) which contains a text input field. Into the field I enter a list of keywords, each with the site to look for (keyword1 and site1, keyword2 and site2, and so on).
- After I click the send button, the list should be saved in a text file (the input file) on the server and the actual script should be started.
- The script takes keyword1 (or keyphrase1) and site1 from the input file, goes to Google and submits a search with the keyword.
- It waits for the search result list, grabs the first 10 search results (only the URLs) and saves them in an output file.
- Inside the search results the script searches for site1 and writes the position of site1 into the output file, too. In case site1 is not on the first result page, the script goes to the next result pages and searches up to the x-th position (x to be defined) for site1. Note: site1 is a website, not a specific URL. Therefore, [url removed, login to view] satisfies the search for site1.
- The script performs the same action for keyword1 and site1 on Yahoo and MSN.
- The script continues with keyword2 and site2 from the input list and performs the same steps as described, adding the new information to the same output file. It continues doing so until all keywords are finished.
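To make the matching rule for site1 concrete, here is a rough sketch. It is written in Python only for illustration (the final script may well be in Perl or PHP), and the function names host_of and find_position are made up for this example. It assumes that "site1 is a website, not a specific URL" means that any result on the same host, or on a subdomain of it, counts as a hit.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    # Accept bare hosts such as "site1.example" by adding a scheme first.
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_position(result_urls, site):
    """Return the 1-based position of the first result on `site`, or None."""
    target = host_of(site)
    for position, url in enumerate(result_urls, start=1):
        host = host_of(url)
        # Assumption: a result matches if it is on the site itself
        # or on one of its subdomains.
        if host == target or host.endswith("." + target):
            return position
    return None
```

The list passed in would hold the results of all pages fetched so far, in order, so that the returned position can go beyond 10 up to the limit x.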
The output file will look as follows:
KEYWORD, URL, ENGINE, POSITION, RESULT1, RESULT2, ..., RESULT10
keyword1, url1, Google, 3, [url removed, login to view], [url removed, login to view], url1, [url removed, login to view], ...
keyword1, url1, Yahoo, 1, url1, [url removed, login to view], [url removed, login to view], [url removed, login to view], ...
keyword1, url1, MSN, 5, [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], ...
keyword2, url2, Google, 21, [url removed, login to view], [url removed, login to view], [url removed, login to view], [url removed, login to view], ...
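As an illustration of the row layout above, the following sketch builds one output line per keyword and engine. It is Python for illustration only, and the names HEADER, build_row and write_rows are hypothetical. It assumes standard CSV quoting (comma without a following space) and an empty POSITION field when the site was not found; both are open to change.

```python
import csv

# Column names taken from the header line of the output file.
HEADER = ["KEYWORD", "URL", "ENGINE", "POSITION"] + [
    "RESULT%d" % i for i in range(1, 11)
]


def build_row(keyword, site, engine, position, result_urls):
    """Return one output row: four fixed fields plus exactly ten result URLs."""
    top_ten = list(result_urls[:10])
    # Pad with empty fields if the engine returned fewer than ten results.
    top_ten += [""] * (10 - len(top_ten))
    # Assumption: an empty POSITION means the site was not found.
    return [keyword, site, engine, "" if position is None else position] + top_ten


def write_rows(stream, rows):
    """Write the header and all rows to an open text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    writer.writerows(rows)
```

For Japanese keywords the file would be opened with open(path, "w", encoding="utf-8", newline="") before being passed to write_rows.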
The list could contain up to a few hundred keywords, so the script may need up to a few hours to complete the job. It must be programmed to run unattended and reliably, and must be able to handle slowly responding search engines.
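One possible way to cope with slow or failing responses is to retry each request with a growing pause. The sketch below is Python for illustration only; fetch_with_retry and its parameters are hypothetical names, and the number of attempts and the delays are guesses that would need tuning.

```python
import time


def fetch_with_retry(fetch, url, attempts=4, first_delay=5.0, sleep=time.sleep):
    """Call fetch(url); on failure wait and retry, doubling the delay each time.

    `fetch` is any function that downloads a page and raises OSError on
    network problems (urllib.error.URLError and socket timeouts are
    OSError subclasses).
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

The fetch argument could be something like lambda u: urllib.request.urlopen(u, timeout=30).read(), so that a hanging connection is cut off after 30 seconds instead of blocking the whole run.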
The script will be used for the Japanese search engines. Therefore it must be able to handle Japanese characters, both in the input and in the output. I need to be able to change the URLs of the search engines if necessary. It would be good to have a parameter file for this, or to define the search engine URLs as variables in the script, so I can easily change them. The search engines would stay the same, but I might change the domain ending or the parameters which follow the search engine URL (search location, language options etc.).
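To show what I mean by easily changeable search engine URLs, here is a small sketch, in Python for illustration only. The ENGINES table and build_search_url are hypothetical, and the hosts and parameter names in it are placeholders, not the real search engine addresses. The keyword is encoded as UTF-8, which is an assumption about what the engines expect.

```python
from urllib.parse import urlencode

# Hypothetical settings: hosts and parameter names are placeholders that
# would be replaced by the real values for each engine.
ENGINES = {
    "Google": {"base": "https://search.example/google", "query_param": "q",
               "extra": {"hl": "ja"}},
    "Yahoo": {"base": "https://search.example/yahoo", "query_param": "p",
              "extra": {}},
    "MSN": {"base": "https://search.example/msn", "query_param": "q",
            "extra": {}},
}


def build_search_url(engine, keyword, settings=ENGINES):
    """Build the search URL for one engine, percent-encoding the keyword."""
    cfg = settings[engine]
    params = {cfg["query_param"]: keyword}
    # Extra parameters carry options such as language or search location.
    params.update(cfg["extra"])
    return cfg["base"] + "?" + urlencode(params, encoding="utf-8")
```

Changing a domain ending or a language option would then only mean editing one entry in the table, which could equally be kept in a separate parameter file.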
Optional features (not necessary but useful, would pay extra):
- Schedule the script to run regularly, e.g. once a day, using cron jobs. The output files would have to be named incrementally, including a date and time stamp. Another solution would be to save the output in a database on the server.
- It would be a bonus if the solution could handle multiple instances of the script, either at the same time or one after another. This way I could submit more than one input file at a time and wait for the results.
- The output file would normally be downloaded manually from the shared server, but it would be a bonus if the script sent the output file to a predefined email address after finishing the job.
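For the scheduled runs, the output file name could be derived from the start time of each run. A minimal sketch, in Python for illustration only; the function name, the prefix "results" and the .csv ending are assumptions.

```python
from datetime import datetime


def output_filename(now=None, prefix="results"):
    """Return a file name with a sortable date and time stamp."""
    # Use the current time unless a specific time is passed in.
    now = now or datetime.now()
    return "%s_%s.csv" % (prefix, now.strftime("%Y%m%d_%H%M%S"))
```

Because the stamp goes from year down to second, the files sort chronologically by name. If several instances could start within the same second, something extra such as the process ID would have to be added to keep the names unique.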
Please describe briefly which solution you can provide (programming language etc.) and at what price you would implement the optional features. Perl seems appropriate to me, but if you have other suggestions, please feel free to describe them. The solution should not take more than three weeks (not including the optional features); faster completion is welcome.
Please contact me if you have any more questions or ideas.