I think this would be ideal for someone with Ubot skills or similar. This is my first posting here, so apologies if I miss anything or am unclear!
This is a task I perform repetitively, so I want it automated, and I need a bit of flexibility. It is for extracting data from the sites in SERPs listings.
Here is a video of what I am doing, which hopefully shows what I am trying to achieve. As you can see, the details are not always on a contact page, so you may need to scrape the whole site. As you will also see, it's not always possible to view the page source, so that output is a bonus rather than a must (as mentioned below).
Video [url removed, login to view]
This is a keyword search, using whatever search engine (see note at bottom) and keywords I want to use, followed by extraction of data from the sites in the SERPs listings.
1. Be able to specify a single search engine (MUST)
2. Be able to specify keywords for search (MUST)
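The two inputs above could be turned into a search URL roughly like this. It is only a sketch: the `SEARCH_ENGINES` table and the `build_search_url` helper are hypothetical names, and the Bing entry is an assumption added to show how a second engine would slot in (only Google is named in this posting).

```python
from urllib.parse import urlencode

# Hypothetical lookup table: engine name -> search endpoint.
# Add or remove entries to control which engines can be selected.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",  # assumption, not named in the posting
}


def build_search_url(engine: str, keywords: str) -> str:
    """Return the results-page URL for one engine and one keyword phrase."""
    base = SEARCH_ENGINES[engine.lower()]
    # urlencode escapes spaces and punctuation in the keywords.
    return f"{base}?{urlencode({'q': keywords})}"


if __name__ == "__main__":
    print(build_search_url("google", "plumber london"))
```

Keeping the engines in one table means switching engine is a one-word change in the input, which is the flexibility asked for.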
OUTPUT from analysis of results, i.e. each site has to be analyzed for the following (Google Places sites should ideally be analyzed as well!):
1. The site must have a phone number or email address (MUST have one or the other), i.e. if it is not on the home page, typically go to the contact page and get the information there, or trawl all pages. Webmaster email addresses are no good! OR, if it is a site with no email or phone but there is a form to complete, then set a flag for this (if that's too complicated, please state that). What I am trying to avoid is ADSENSE SITES, so any site that is output MUST have a phone number, an email address or a contact form.
2. Display how the site is displayed in SERPs (the search engine listing), i.e. Title and Description (ideal, i.e. not a must).
3. Display the 'source' of the home page. (I use Firefox as my browser; I'm not sure if you need a plugin for this, but there are loads available. Once you have clicked through to the site, scroll down away from the header, then right-click and the browser gives you a page source option, which shows the HTML etc. This is available in IE as well. It is important to scroll away from the header!) (ideal, i.e. not a must)
4. If you can achieve 3 above (page source), then search for "wordpress" within the page source and, in the output data, flag the site as a WordPress site if you found an occurrence of "wordpress". That would be GREAT!
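The checks in items 1 and 4 could be sketched as below, given the HTML of a page that has already been downloaded. This is a rough illustration, not a finished scraper: `analyze_page`, the two regular expressions and the `WEBMASTER_PREFIXES` list are all assumptions. The phone pattern is deliberately loose and will need tuning per country, and any `<form>` tag (including a search box) counts as a form here.

```python
import re
from html.parser import HTMLParser

# Loose patterns (assumptions): tune them for the sites you actually target.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Hypothetical list of address prefixes to discard as "webmaster" emails.
WEBMASTER_PREFIXES = ("webmaster@", "postmaster@")


class _FormFinder(HTMLParser):
    """Records whether the page contains at least one <form> tag."""

    def __init__(self):
        super().__init__()
        self.has_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True


def analyze_page(html: str) -> dict:
    """Extract contact details and a WordPress flag from one page's HTML."""
    emails = [e for e in EMAIL_RE.findall(html)
              if not e.lower().startswith(WEBMASTER_PREFIXES)]
    phones = PHONE_RE.findall(html)
    finder = _FormFinder()
    finder.feed(html)
    has_contact = bool(emails or phones)
    return {
        "email": emails[0] if emails else "",
        "phone": phones[0].strip() if phones else "",
        # Flag the form only when there is no phone or email to report.
        "contact_form": finder.has_form and not has_contact,
        # Item 4: any occurrence of "wordpress" in the source sets the flag.
        "wordpress": "wordpress" in html.lower(),
    }
```

Running `analyze_page` on the home page first, then on the contact page and other pages only when nothing is found, matches the order described in item 1.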
So the IDEAL results will look like this:
website, phone number (if found), email (if found), contact form (if only a contact form), page source, wordpress flag
IF there is no phone number, email or contact form, I don't want to know! As stated above, page source and wordpress flag would be a great bonus.
Output can be Excel or comma-delimited text (Excel is best), or if you have a different option, let me know.
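For the comma-delimited option, the output step might look something like this sketch, which also drops any site with no phone, email or contact form. The column names in `FIELDS` and the `write_results` helper are assumptions chosen to mirror the ideal results line above; the resulting file opens directly in Excel.

```python
import csv
import io

# Assumed column names, mirroring the ideal results line in this posting.
FIELDS = ["website", "phone", "email", "contact_form", "page_source", "wordpress"]


def write_results(rows, out) -> int:
    """Write qualifying rows as CSV to `out`; return how many were kept."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in rows:
        # Skip sites with no phone, no email and no contact form.
        if row.get("phone") or row.get("email") or row.get("contact_form"):
            writer.writerow(row)  # missing columns are written as empty cells
            kept += 1
    return kept


if __name__ == "__main__":
    demo = [
        {"website": "a.example", "phone": "020 7946 0958", "wordpress": True},
        {"website": "b.example"},  # no contact details, so it is dropped
    ]
    buffer = io.StringIO()
    print(write_results(demo, buffer), "row(s) kept")
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends. Full page sources can be very long, so saving them to separate files and putting only the file name in the `page_source` column may work better.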
IF USING DIFFERENT SEARCH ENGINES IS A PROBLEM, LET'S JUST FOCUS ON [url removed, login to view], GOOGLE, [url removed, login to view], but I need that flexibility at least. Again, please state what you can do.
Folks, I will have more work (HTML) soon: simple updates to clients' sites. I would be interested in your hourly rates. Let me know if you are interested in that as well.