Cancelled

Web Crawler Identifying Layout

I am looking for a solid web crawler that has one task, and one task only...

Identify different page layouts on a site.

Some sites, especially webshops, have category pages, subcategory pages, product pages, checkout pages...

This crawler should not identify the purpose of a page. It should be able to take a site with 500,000 pages and identify how many different page layouts there are.
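
One possible way to compare layouts without looking at a page's purpose is to reduce each page to its tag structure and hash that. The sketch below is only an illustration under that assumption; `SkeletonParser` and `layout_signature` are hypothetical names, and it uses nothing beyond Python's standard `html.parser` and `hashlib` modules.

```python
import hashlib
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SkeletonParser(HTMLParser):
    """Collects the set of root-to-element tag paths and ignores all text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def layout_signature(html: str) -> str:
    """Return a short hash that depends only on the page's tag structure."""
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    joined = "\n".join(sorted(parser.paths))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]
```

Because the paths are stored as a set, two product pages with a different number of list items or different text still get the same signature, while a page built from a different element tree does not.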

In the end, it should produce a list of every URL with a layout ID attached, as XML.

EXAMPLE XML

<website>
<ws_info>
<ws_url>http://domain.com/</ws_url>
<ws_pages>146.000</ws_pages>
<ws_cats>6</ws_cats>
<ws_scraped>01.07.2010 11:53:07</ws_scraped>
</ws_info>

<cpage>
<cpage_scraped>
<cpage_url>http://domain.com/some-page-url</cpage_url>
<ws_cat>3</ws_cat>
</cpage_scraped>

<cpage_scraped>
<cpage_url>http://domain.com/some-page-url</cpage_url>
<ws_cat>6</ws_cat>
</cpage_scraped>

</cpage>
</website>
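
A report in the shape of the example above could be produced with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not a specification: `build_report` and its parameters are hypothetical, and it assumes that `ws_pages` is the number of listed pages and `ws_cats` is the number of distinct layout IDs.

```python
import xml.etree.ElementTree as ET


def build_report(site_url, scraped_at, page_layouts):
    """Build the XML report; page_layouts is a list of (url, layout_id) pairs."""
    website = ET.Element("website")

    # Site-level summary, using the element names from the example.
    info = ET.SubElement(website, "ws_info")
    ET.SubElement(info, "ws_url").text = site_url
    ET.SubElement(info, "ws_pages").text = str(len(page_layouts))
    distinct_layouts = {layout_id for _, layout_id in page_layouts}
    ET.SubElement(info, "ws_cats").text = str(len(distinct_layouts))
    ET.SubElement(info, "ws_scraped").text = scraped_at

    # One <cpage_scraped> entry per crawled URL.
    cpage = ET.SubElement(website, "cpage")
    for url, layout_id in page_layouts:
        entry = ET.SubElement(cpage, "cpage_scraped")
        ET.SubElement(entry, "cpage_url").text = url
        ET.SubElement(entry, "ws_cat").text = str(layout_id)

    return ET.tostring(website, encoding="unicode")
```

Building the tree through the library rather than by string concatenation means every element is closed and special characters in URLs (such as `&`) are escaped automatically.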

Performance and speed matter, and so does how intelligently the scraper tells one page apart from another. Both are key ingredients of this scraper.

Some sites have very similar pages. It is therefore highly desirable that the scraper can identify an element as a menu, submenu, or navigation and ignore it...
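
Ignoring menus and navigation could be approximated with a simple heuristic: skip any subtree whose tag is a sectioning element such as `nav`, or whose `id`, `class`, or `role` attribute contains a navigation-like keyword. The sketch below assumes exactly that; `ContentSkeleton`, `content_paths`, and the keyword lists are hypothetical and would need tuning per site, since substring matching can misfire (for example on a class named `navy`).

```python
from html.parser import HTMLParser

NAV_TAGS = {"nav", "header", "footer", "aside"}          # assumed list
NAV_HINTS = ("menu", "nav", "breadcrumb", "sidebar")     # assumed keywords
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def looks_like_navigation(tag, attrs):
    """Heuristic: navigation tag, or a telltale id/class/role value."""
    if tag in NAV_TAGS:
        return True
    values = " ".join((value or "").lower()
                      for name, value in attrs
                      if name in ("id", "class", "role"))
    return any(hint in values for hint in NAV_HINTS)


class ContentSkeleton(HTMLParser):
    """Collects tag paths while skipping navigation-like subtrees."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.skip_from = None   # stack depth of the skipped subtree's root
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        skipping = self.skip_from is not None
        if not skipping and looks_like_navigation(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_from = len(self.stack)
                self.stack.append(tag)
            return
        if not skipping:
            self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass
        # Leave skip mode once the navigation root itself has been closed.
        if self.skip_from is not None and len(self.stack) <= self.skip_from:
            self.skip_from = None


def content_paths(html: str) -> set:
    parser = ContentSkeleton()
    parser.feed(html)
    parser.close()
    return parser.paths
```

With this filter, a product page with a large menu and the same product page without one yield the same set of paths, so the menu no longer influences the layout comparison.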

I don't want to scrape a site with 200,000 pages and have the scraper come up with 110,000 different categories of pages.
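
One way to keep the number of layouts from exploding is to group pages by similarity instead of exact match. The sketch below assumes each page has already been reduced to a set of structural features (for instance tag paths) and uses Jaccard similarity with a greedy assignment; `assign_layout_ids` and the `0.8` threshold are illustrative assumptions, not tested values.

```python
def jaccard(a: set, b: set) -> float:
    """Share of features two pages have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_layout_ids(pages, threshold=0.8):
    """Greedy grouping. pages: iterable of (url, feature_set) pairs.

    Returns {url: layout_id}, with layout IDs starting at 1.
    """
    representatives = []   # one feature set per layout; index + 1 is the ID
    result = {}
    for url, features in pages:
        best_id, best_score = None, threshold
        for index, representative in enumerate(representatives):
            score = jaccard(features, representative)
            if score >= best_score:
                best_id, best_score = index + 1, score
        if best_id is None:
            # Nothing is similar enough, so this page starts a new layout.
            representatives.append(features)
            best_id = len(representatives)
        result[url] = best_id
    return result
```

Lowering the threshold merges more pages into each layout, and raising it splits them apart. The cost is roughly pages times layouts, which stays manageable as long as the number of layouts is small compared with the number of pages.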

Skills: Algorithm, C# Programming, Java, Machine Learning, PHP


About the Employer:
(0 reviews) Brønshøj, Denmark

Project ID: #731731

7 freelancers are bidding an average of $427 for this job

aspnetexpert

please see pmb

$400 USD in 15 days
(5 Reviews)
3.4
SPDotNetDev

I have been working as a .NET developer for the last six years. I also have experience with SharePoint. I think I am well suited for this work. My core skill set includes C#, SQL Server, .NET Framework, SharePoint, and HTML.

$450 USD in 10 days
(0 Reviews)
0.0
ehtashamulhaq

Let me help you out with this task. I did a similar kind of task in a semester project for my BS(CS) degree.

$350 USD in 10 days
(0 Reviews)
0.0
jacklee2000

I have an MS in CS and 10 years of working experience in the web and search engine fields, and I am experienced in web crawler development.

$340 USD in 9 days
(0 Reviews)
0.0
SCAnalytics

We are a team of .NET experts. We can do this project for you.

$300 USD in 7 days
(1 Review)
0.0
UpiterSoft

Hello, I have experience in web page analysis and I can do this well. Please see PM.

$750 USD in 14 days
(0 Reviews)
0.0
codejam212

Hi, we are a group of people working from both India and the US with knowledge of PHP, C#, ASP.NET, data processing, SQL Server, MSSQL, DB2, Joomla, and Drupal. We have done several similar projects and we are really interested in...

$400 USD in 10 days
(0 Reviews)
0.0