First of all: this should be programmed in ANSI C that compiles with GCC, and it should be cross-platform.
We need a function that takes a web URL and downloads the page's HTML contents (it should not download any pictures or other external files). It should then come up with a title, description and keywords based on the meta tags. If there are no meta tags, the title, keywords and description should be figured out the way Google or Yahoo do it: it should ignore common words like 'a', 'the', and many others. It should also drop words that have been repeated too many times (more than 7, I think).

It should also attempt to figure out when the page was last modified (if it can't, it should compare it with an internal date in the database) and store it in the database only if it is newer. The URL, title, description and keywords should be saved in a database called "sites.dat" using a database function we have had developed for us.
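The fallback keyword rule (ignore common words, drop words repeated more than 7 times) could look roughly like the sketch below. It assumes the visible text has already been separated from the markup, and the names `build_keywords`, `count_words`, the abbreviated stop list and the size limits are all hypothetical; a production stop list would be far longer.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORD   32   /* hypothetical: longer words are truncated */
#define MAX_WORDS  512  /* hypothetical: distinct words tracked per page */
#define MAX_REPEAT 7    /* words seen more than this are dropped */

struct word_count {
    char word[MAX_WORD];
    int count;
};

/* A deliberately short stop list; a real one would be much longer. */
static const char *const stop_words[] = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "was", "with", NULL
};

static int is_stop_word(const char *w)
{
    int i;
    for (i = 0; stop_words[i] != NULL; i++)
        if (strcmp(w, stop_words[i]) == 0)
            return 1;
    return 0;
}

/* Splits text into lowercase alphanumeric words, skips stop words and
   counts the rest. Returns the number of distinct words stored. */
static int count_words(const char *text, struct word_count *table, int max)
{
    int n = 0;
    char buf[MAX_WORD];

    while (*text != '\0') {
        int len = 0, i;

        while (*text != '\0' && !isalnum((unsigned char)*text))
            text++;
        while (isalnum((unsigned char)*text)) {
            if (len < MAX_WORD - 1)
                buf[len++] = (char)tolower((unsigned char)*text);
            text++;
        }
        if (len == 0)
            continue;
        buf[len] = '\0';
        if (is_stop_word(buf))
            continue;
        for (i = 0; i < n && strcmp(table[i].word, buf) != 0; i++)
            ;
        if (i < n) {
            table[i].count++;
        } else if (n < max) {
            strcpy(table[n].word, buf);
            table[n].count = 1;
            n++;
        }
    }
    return n;
}

/* Writes up to max_kw comma-separated keywords, most frequent first,
   leaving out any word repeated more than MAX_REPEAT times. */
static void build_keywords(const char *text, char *out, size_t out_size, int max_kw)
{
    struct word_count table[MAX_WORDS];
    int n, used = 0;
    size_t pos = 0;

    n = count_words(text, table, MAX_WORDS);
    out[0] = '\0';
    while (used < max_kw) {
        int i, best = -1;
        size_t len;

        for (i = 0; i < n; i++) {
            if (table[i].count < 1 || table[i].count > MAX_REPEAT)
                continue;
            if (best < 0 || table[i].count > table[best].count)
                best = i;
        }
        if (best < 0)
            break;
        len = strlen(table[best].word);
        if (pos + len + 3 > out_size)   /* room for ", ", word and '\0' */
            break;
        if (used > 0) {
            strcpy(out + pos, ", ");
            pos += 2;
        }
        strcpy(out + pos, table[best].word);
        pos += len;
        table[best].count = 0;          /* mark as already used */
        used++;
    }
}
```

Ties keep first-seen order, so earlier words on the page win. A description could be chosen with the same counts, for example the first sentence containing the top keywords.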
Whenever it receives a 301 response (or any other redirect), it should follow the link and then update the URL that was passed in.
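Following a redirect comes down to reading the status code and the `Location` header from the raw response. The sketch below assumes the response headers are available as one NUL-terminated string. The helper names `http_status`, `is_redirect` and `http_location` are hypothetical, and the set of redirect codes is the standard 3xx family that carries a `Location` header.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive: does s start with prefix? */
static int ci_prefix(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Numeric code from a status line such as "HTTP/1.1 301 Moved", or -1. */
static int http_status(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) == 3)
        return code;
    return -1;
}

static int is_redirect(int code)
{
    return code == 301 || code == 302 || code == 303 ||
           code == 307 || code == 308;
}

/* Copies the Location header value into out (header names are
   case-insensitive). Returns 1 if found, 0 otherwise. */
static int http_location(const char *response, char *out, size_t out_size)
{
    const char *line = response;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');

        if (*line == '\r' || *line == '\n')
            break;                          /* blank line: end of headers */
        if (ci_prefix(line, "location:")) {
            const char *v = line + 9;       /* skip "location:" */
            size_t len;

            while (*v == ' ' || *v == '\t')
                v++;
            len = (eol != NULL) ? (size_t)(eol - v) : strlen(v);
            while (len > 0 && isspace((unsigned char)v[len - 1]))
                len--;                      /* drop the trailing "\r" */
            if (len >= out_size)
                len = out_size - 1;
            memcpy(out, v, len);
            out[len] = '\0';
            return 1;
        }
        line = (eol != NULL) ? eol + 1 : NULL;
    }
    return 0;
}
```

The caller would loop: fetch, and if `is_redirect` is true, resolve the `Location` value against the current URL (it may be relative), replace the caller's URL and fetch again. A small hop limit (say 5, an arbitrary choice) guards against redirect loops.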
If there is a 404 or any other error that prevents the page from being downloaded, it should return all blank values.
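One way to honour "return all blank values" is to hold the result in a fixed record and empty every field whenever the final status (after any redirects) is not a success. The `struct page_info` layout, its field sizes and the function names below are hypothetical; the real record format depends on the database function mentioned above.

```c
#include <string.h>

#define FIELD_LEN 256   /* hypothetical field size */

/* Hypothetical record for one crawled page. */
struct page_info {
    char url[FIELD_LEN];
    char title[FIELD_LEN];
    char description[FIELD_LEN];
    char keywords[FIELD_LEN];
};

/* Empties every field so the caller receives blank values. */
static void page_info_blank(struct page_info *p)
{
    p->url[0] = p->title[0] = p->description[0] = p->keywords[0] = '\0';
}

/* Checks the final status code. Returns 1 if the body should be parsed,
   0 if the record was blanked (404, 5xx, an unresolved 3xx, or -1 for a
   network failure). */
static int page_info_check_status(struct page_info *p, int status)
{
    if (status >= 200 && status <= 299)
        return 1;
    page_info_blank(p);
    return 0;
}
```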
Any links that it finds should be stored in the file "links.dat" using a database function that we are having developed.
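Links could be collected by scanning for `href` attributes in a way that tolerates the usual mistakes: any letter case, double, single or missing quotes, and stray spaces around `=`. In this sketch, `extract_links` and `LINK_LEN` are hypothetical names. Results go into a caller-supplied array, and each entry would then be passed to the links.dat function, whose interface is not described in the posting.

```c
#include <ctype.h>
#include <string.h>

#define LINK_LEN 256    /* hypothetical: longer URLs are truncated */

/* Case-insensitive: does s start with prefix? */
static int ci_prefix(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Copies each href value found in html into links[]. Returns how many
   links were stored (at most max_links). */
static int extract_links(const char *html, char links[][LINK_LEN], int max_links)
{
    int count = 0;
    const char *p = html;

    while (*p != '\0' && count < max_links) {
        char *out = links[count];
        size_t len = 0;
        char quote = '\0';

        /* An attribute name must follow whitespace, as in "<a href=". */
        if (p == html || !isspace((unsigned char)p[-1]) || !ci_prefix(p, "href")) {
            p++;
            continue;
        }
        p += 4;
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '=')
            continue;                   /* "href" with no value: skip it */
        p++;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '"' || *p == '\'')
            quote = *p++;
        while (*p != '\0') {
            if (quote != '\0' ? *p == quote
                              : (isspace((unsigned char)*p) || *p == '>'))
                break;
            if (len < LINK_LEN - 1)
                out[len++] = *p;
            p++;
        }
        out[len] = '\0';
        if (quote != '\0' && *p == quote)
            p++;
        if (len > 0)
            count++;
    }
    return count;
}
```

Relative links such as `/about` would still need to be resolved against the page URL before they are stored.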
This function should obey all robots meta tags, as well as [url removed, login to view] files.
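Reading "ROBOT tags" as `<meta name="robots" content="...">`, the content value can be reduced to two flags. In this sketch, `robots_meta_flags` and the `ROBOTS_*` constants are hypothetical. It treats `none` as equivalent to `noindex, nofollow`, which is how Google documents that value.

```c
#include <ctype.h>
#include <string.h>

#define ROBOTS_NOINDEX  1   /* do not store this page in sites.dat */
#define ROBOTS_NOFOLLOW 2   /* do not store or follow its links */

/* Parses a robots meta content value such as "NOINDEX, nofollow".
   Tokens are comma- or space-separated and case-insensitive. */
static int robots_meta_flags(const char *content)
{
    int flags = 0;
    char tok[32];

    while (*content != '\0') {
        size_t len = 0;

        while (*content != '\0' &&
               (isspace((unsigned char)*content) || *content == ','))
            content++;
        while (*content != '\0' &&
               !isspace((unsigned char)*content) && *content != ',') {
            if (len < sizeof tok - 1)
                tok[len++] = (char)tolower((unsigned char)*content);
            content++;
        }
        tok[len] = '\0';
        if (strcmp(tok, "noindex") == 0)
            flags |= ROBOTS_NOINDEX;
        else if (strcmp(tok, "nofollow") == 0)
            flags |= ROBOTS_NOFOLLOW;
        else if (strcmp(tok, "none") == 0)
            flags |= ROBOTS_NOINDEX | ROBOTS_NOFOLLOW;
    }
    return flags;
}
```

A crawler-specific tag (for example `name="dCrawler"`) could be honoured the same way. That is an assumption, not something the posting specifies.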
When coding this, be aware that not all sites have perfect HTML; some tags will be wrong or full of errors. Expect this function to process badly formed HTML.
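As an example of tolerant parsing, the sketch below finds a `<meta>` tag by its `name` and returns its `content`. It accepts any letter case, either attribute order, and double, single or missing quotes. The function names are hypothetical. For simplicity, the sketch assumes that a `>` character ends the tag, so a quoted value containing `>` would be cut short.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive: does s start with prefix? */
static int ci_prefix(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Case-insensitive string equality. */
static int ci_equal(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Copies the value of attribute attr found between tag ('<') and end ('>'). */
static int get_attr(const char *tag, const char *end, const char *attr,
                    char *out, size_t out_size)
{
    size_t alen = strlen(attr);
    const char *p;

    for (p = tag + 1; p + alen < end; p++) {
        const char *v = p + alen;
        size_t len = 0;
        char quote = '\0';

        if (!isspace((unsigned char)p[-1]) || !ci_prefix(p, attr))
            continue;
        while (v < end && isspace((unsigned char)*v))
            v++;
        if (v >= end || *v != '=')
            continue;
        v++;
        while (v < end && isspace((unsigned char)*v))
            v++;
        if (v < end && (*v == '"' || *v == '\''))
            quote = *v++;
        while (v < end && (quote != '\0' ? *v != quote : !isspace((unsigned char)*v))) {
            if (len + 1 < out_size)
                out[len++] = *v;
            v++;
        }
        out[len] = '\0';
        return 1;
    }
    return 0;
}

/* Finds <meta name="NAME" content="..."> anywhere in html. */
static int meta_content(const char *html, const char *name, char *out, size_t out_size)
{
    char found[64];
    const char *p = html;

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');

        if (end == NULL)
            break;                      /* unterminated tag at end of page */
        if (ci_prefix(p + 1, "meta") &&
            get_attr(p, end, "name", found, sizeof found) &&
            ci_equal(found, name) &&
            get_attr(p, end, "content", out, out_size))
            return 1;
        p++;
    }
    out[0] = '\0';
    return 0;
}
```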
In most cases, this should behave no differently than Googlebot, except that when downloading a page it should identify itself as 'dCrawler'.
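Identifying as 'dCrawler' means sending that string in the `User-Agent` header of each request. The sketch below builds a plain GET request in a caller-supplied buffer. The function name `build_get_request` and the exact set of headers are assumptions. Using HTTP/1.0 is a deliberate simplification, because servers will not reply with chunked transfer encoding to a 1.0 client.

```c
#include <stdio.h>
#include <string.h>

#define GET_FMT "GET %s HTTP/1.0\r\n" \
                "Host: %s\r\n" \
                "User-Agent: dCrawler\r\n" \
                "Accept: text/html\r\n" \
                "Connection: close\r\n\r\n"

/* Writes the request for host/path into out. Returns 1 on success,
   0 if out_size is too small (checked first, since C89 lacks snprintf). */
static int build_get_request(const char *host, const char *path,
                             char *out, size_t out_size)
{
    /* sizeof includes the '\0'; the two "%s" (4 chars) are replaced. */
    size_t need = sizeof(GET_FMT) - 4 + strlen(host) + strlen(path);

    if (need > out_size)
        return 0;
    sprintf(out, GET_FMT, path, host);
    return 1;
}
```

The `Accept: text/html` header only states a preference. The crawler should still check the response's `Content-Type` and skip images and other non-HTML files.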
11 freelancers are bidding an average of $251 for this job.
Hi, crawling is our first choice. We have developed many crawlers in PHP/MySQL, and we are very confident that we can also develop a crawler in C/C++ in a GNU/Linux environment. For a demo and discussion please se…
I have already worked on a similar project (downloading/smart parsing). I may have to tune my code, since it ran under Windows and in C++. Still, I allowed 10 days so that I have time to test the app completely & carefull…
Dear Sir/Madam, if you are looking for top quality and quick turnaround, we will be delighted to build the required downloader for you. We are an IT company specializing in web technologies and programming.
We are a web development, search engine optimisation and BPO company from India. Kindly go through our URL [login to view URL]. We are interested in your project. Thanks. Regards, Anshu
Hi there, Niftysoft Solution is a leading IT services company providing solutions across the globe. Niftysoft Solution is staffed by a large team of highly skilled professionals with a strong background in the IT field and having…
Dear sir, I will complete this program within 15 days to suit all your requirements. Thank you.
We are a group of software professionals from India with expertise in ASP, ASPX, HTML, XML, Java, C, C++, VB, Oracle, SQL Server, PHP and MySQL, with professionals ranging from 1 to 20 years of experience. We are sure to…
Dear Sir/Madam, we are a group of software engineers with expertise in web technology, Windows desktop application development, security and mobile technologies. Recently we developed a project in which we are pars…