I need code (in Java, .NET or PHP) that will choose 10,000 random URLs from the CommonCrawl dataset ([url removed, login to view]). For each URL, you need to extract:
1) page title of that page (from <title> tag)
2) the most frequent anchor text used to link to that URL, excluding one-word anchor texts and URL anchor texts (anchor texts containing http:// or www)
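The two extraction rules above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names (PageFields, extractTitle, mostFrequentAnchor) are hypothetical, the anchor texts are assumed to have been collected already (for example from an external API), anchor texts are compared case-insensitively, and https:// is treated like http://. None of those choices come from the posting itself. The title is pulled out with a simple regular expression, which may miss unusual markup; a real HTML parser would likely be more robust.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageFields {
    // Matches the first <title>...</title> pair, ignoring case and line breaks.
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the whitespace-normalized contents of the first title tag, or "" if none is found. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim().replaceAll("\\s+", " ") : "";
    }

    /** True if the anchor text has more than one word and does not look like a URL. */
    static boolean isEligible(String anchor) {
        String t = anchor.trim().toLowerCase(Locale.ROOT);
        // "http://" and "www" come from the posting; "https://" is an added assumption.
        if (t.contains("http://") || t.contains("https://") || t.contains("www")) {
            return false;
        }
        return t.split("\\s+").length > 1;
    }

    /**
     * Returns the most frequent eligible anchor text (lower-cased), or empty if
     * nothing qualifies. Ties are broken arbitrarily.
     */
    public static Optional<String> mostFrequentAnchor(List<String> anchors) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : anchors) {
            if (a != null && isEligible(a)) {
                counts.merge(a.trim().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }
}
```

For example, given the anchor texts "click here", "Click Here", "home" and "www.example.com site", only the first two are counted, and they are merged into one entry because of the case-insensitive comparison.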
The results should be exported to an Excel or CSV file. The file will have these columns:
URL, TITLE, ANCHOR TEXT.
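A minimal sketch of the CSV variant of this export, assuming each result is held as a three-element String array in the column order listed above. The class name CsvExport and its methods are hypothetical. Quoting follows the common CSV convention of wrapping a field in double quotes when it contains a comma, quote or line break, and doubling any embedded quotes; page titles often contain commas, so some escaping of this kind is probably needed.

```java
import java.util.List;

public class CsvExport {
    /** Quotes a field if it contains a comma, a double quote or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Builds the whole CSV document: a header line, then one line per result row. */
    public static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("URL,TITLE,ANCHOR TEXT\r\n");
        for (String[] row : rows) {
            // Each row is assumed to hold exactly {url, title, anchorText}.
            sb.append(escape(row[0])).append(',')
              .append(escape(row[1])).append(',')
              .append(escape(row[2])).append("\r\n");
        }
        return sb.toString();
    }
}
```

The returned string can then be saved with, for instance, java.nio.file.Files.writeString.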
For step 2) you will probably need to use an external API like [url removed, login to view], [url removed, login to view] or similar. I will pay the cost of one month of API access.
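As for the random selection of 10,000 URLs in the first requirement, one possible approach (not something the posting prescribes) is reservoir sampling, which picks a uniform random sample in a single pass without loading the full URL list into memory. The sketch below assumes the URLs are available as an Iterator of strings, for example read line by line from an index file; the class name UrlSampler is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class UrlSampler {
    /**
     * Returns up to k URLs chosen uniformly at random from the iterator
     * (reservoir sampling, "Algorithm R"). If fewer than k URLs exist,
     * all of them are returned.
     */
    public static List<String> sample(Iterator<String> urls, int k, Random rng) {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (urls.hasNext()) {
            String url = urls.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k URLs.
                reservoir.add(url);
            } else {
                // Pick a slot in [0, seen); the new URL is kept with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, url);
                }
            }
        }
        return reservoir;
    }
}
```

A call such as UrlSampler.sample(urlIterator, 10000, new Random()) would then yield the 10,000 URLs to process, where urlIterator stands for whatever source of URLs is used.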
Hi, I am an expert crawler maker, so this project won't be any problem for me. I will use [url removed, login to view] for the API, and I will use .NET for coding. Thanks
10 freelancers are bidding on average $162 for this job
Dear Client, I can help with your project. We already have experience working on similar projects. Please see below to get an idea of my similar experience: Amazon/Ebay Bots: [url removed, login to view]
Hi, this can be done. You need 10,000 random pages from the public dataset, and it doesn't matter which pages? Also, how many anchor texts do you need? Only the most popular one, or more? As far as I can see, the 5 most popular are available for free on [url removed, login to view]
Hi, Please feel free to discuss the project with me. Thanks, Murtaza
Hi, I am Saravanan. I have 7 years’ hands-on experience in Web/Desktop Application Development, Automation/Scraping and Testing using Java technologies. I went through your requirements, and I am interested to work on th
Hi! I am interested in your project. I work on similar projects (web spiders), so I strongly believe that my abilities fit your requirements. I look forward to working with you!
I've done similar development work, so I would have results soon. I'm used to meeting deadlines, and I have long worked as a developer for companies.
Good day! For your specific project I would be perfect. Have a look at my portfolio! I have extensive experience in Java programming (over 6 years) and have worked with databases multiple times - especially in the