In Progress

Need web scraping script written for Web-Harvest

The job is to supply a script written with Web-Harvest 2.0 beta 1 to extract data from a website as detailed below. I will be running the script myself so the actual script is required and not the resultant XML.

The starting URL is [url removed, login to view]

Inside <table class="foodhome"> each <h2> must be saved as a category

Below each <h2> is a <ul> which contains a list of <li> which must be saved as a subcategory

Each <li> must be followed in turn
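For bidders offering an alternative tool, the category / subcategory step above could be sketched in plain Python with only the standard library. This is a rough illustration, not a Web-Harvest script. The class name `FoodHomeParser`, the sample markup, and the assumption that each <li> wraps a single <a href> are made up for the example, since the real page is not shown here.

```python
from html.parser import HTMLParser


class FoodHomeParser(HTMLParser):
    """Collects (category, subcategory, href) from <table class="foodhome">.

    Assumes no tables are nested inside the foodhome table.
    """

    def __init__(self):
        super().__init__()
        self.in_table = False   # True while inside <table class="foodhome">
        self.capture = None     # "h2" or "li" while text is being collected
        self.text = []
        self.href = None
        self.category = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and "foodhome" in (attrs.get("class") or "").split():
            self.in_table = True
        elif self.in_table and tag in ("h2", "li"):
            self.capture, self.text, self.href = tag, [], None
        elif self.capture == "li" and tag == "a" and self.href is None:
            self.href = attrs.get("href")   # link to follow for this subcategory

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == self.capture:
            value = " ".join("".join(self.text).split())
            if tag == "h2":
                self.category = value       # applies to every <li> that follows
            else:
                self.rows.append((self.category, value, self.href))
            self.capture = None


# Made-up sample markup standing in for the real start page.
SAMPLE_HOME = """
<table class="foodhome"><tr><td>
<h2>Dairy</h2>
<ul><li><a href="/cheese">Cheese</a></li><li><a href="/milk">Milk</a></li></ul>
<h2>Bakery</h2>
<ul><li><a href="/bread">Bread</a></li></ul>
</td></tr></table>
"""

home = FoodHomeParser()
home.feed(SAMPLE_HOME)
for category, subcategory, href in home.rows:
    print(category, "|", subcategory, "|", href)
```

Each collected href would then be fetched in turn to reach the 'Show all brands' link.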

A link with the text 'Show all brands' must be followed

On the resultant page is a <table> with no classes or styles associated with it

Within that <table> is a series of <href> links, the text of which must be saved as a brand

Each <href> must be followed in turn

On the resultant page is a <table width="100%">

Within that <table width="100%"> are a series of <tr>

Each odd <tr> must be saved as a subbrand

Each even <tr> contains a list of <li> which must be saved as a food

Each <li> must be followed in turn
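The odd / even row pairing above could look something like the following in plain Python (standard library only). This is a sketch under assumptions: "odd" is read as the 1st, 3rd, 5th row counting from one, each <li> is assumed to wrap one link, and the names `RowParser`, `pair_rows` and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collects each <tr> of <table width="100%"> as text plus its <li> links."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.row = None     # row currently being read
        self.item = None    # [text_parts, href] for the current <li>
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("width") == "100%":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.row = {"text": [], "items": []}
        elif self.row is not None and tag == "li":
            self.item = [[], None]
        elif self.item is not None and tag == "a" and self.item[1] is None:
            self.item[1] = attrs.get("href")

    def handle_data(self, data):
        if self.item is not None:
            self.item[0].append(data)
        elif self.row is not None:
            self.row["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self.item is not None:
            name = " ".join("".join(self.item[0]).split())
            self.row["items"].append((name, self.item[1]))
            self.item = None
        elif tag == "tr" and self.row is not None:
            self.row["text"] = " ".join("".join(self.row["text"]).split())
            self.rows.append(self.row)
            self.row, self.item = None, None
        elif tag == "table":
            self.in_table = False


def pair_rows(rows):
    """Pair rows 1, 3, 5, ... (subbrand) with rows 2, 4, 6, ... (foods)."""
    return [(rows[i]["text"], rows[i + 1]["items"])
            for i in range(0, len(rows) - 1, 2)]


# Made-up sample markup standing in for a real brand page.
SAMPLE_BRAND = """
<table width="100%">
<tr><td><b>Original</b></td></tr>
<tr><td><ul><li><a href="/f/1">Cola 375ml</a></li><li><a href="/f/2">Cola 600ml</a></li></ul></td></tr>
<tr><td><b>Diet</b></td></tr>
<tr><td><ul><li><a href="/f/3">Diet Cola 375ml</a></li></ul></td></tr>
</table>
"""

brand_page = RowParser()
brand_page.feed(SAMPLE_BRAND)
pairs = pair_rows(brand_page.rows)
for subbrand, foods in pairs:
    print(subbrand, foods)
```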

On the resultant page is a <table class="foodlabel"> which contains the primary information I need

The second row contains the serving information

The various <options> must be saved as a serving (comma separated if possible)

The input id="amount" should then be set to 100

The selector should then be set to value=-1

If a value of -1 is not available, this row can be deleted / ignored and the script should move on to the next
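A possible way to handle the serving row in plain Python is sketched below. It joins the option labels into one comma separated string and only prepares the amount=100 / value=-1 settings when a -1 option exists. How the page applies those settings (form submit or JavaScript) is not known from this description, so the function just returns them. The field name `units`, the function name `serving_request` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ServingParser(HTMLParser):
    """Collects (value, label) for every <option> in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._value = None
        self._label = None   # list of text parts while inside an <option>

    def _flush(self):
        if self._label is not None:
            label = " ".join("".join(self._label).split())
            self.options.append((self._value, label))
            self._label = None

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._flush()    # tolerate <option> tags that are never closed
            self._value = dict(attrs).get("value")
            self._label = []

    def handle_data(self, data):
        if self._label is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._flush()


def serving_request(html, select_name="units"):
    """Return (servings, settings); settings is None when -1 is not offered.

    select_name is a hypothetical field name for the selector.
    """
    parser = ServingParser()
    parser.feed(html)
    servings = ", ".join(label for _, label in parser.options)
    if not any(value == "-1" for value, _ in parser.options):
        return servings, None
    return servings, {"amount": "100", select_name: "-1"}


# Made-up sample markup standing in for the serving row.
SAMPLE_SERVING = """
<select id="units">
<option value="1">1 slice (30 g)</option>
<option value="2">1 cup (120 g)</option>
<option value="-1">g</option>
</select>
"""

servings, settings = serving_request(SAMPLE_SERVING)
print(servings)
print(settings)
```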

The third row contains two data entries

- <SPAN id="calories">xxx</SPAN> must be saved as a calorie

- <SPAN id="kilojoules">xxx</SPAN> must be saved as a kilojoule

- Please note that sometimes another row is inserted between the second and third and should be ignored
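One way to avoid depending on the row position is to look the two values up by their id, so an extra inserted row makes no difference. A minimal Python sketch of that idea follows; the names `SpanByIdParser` and `energy_values` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class SpanByIdParser(HTMLParser):
    """Collects the text of <span> elements whose id is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.values = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <SPAN> arrives here as "span".
        span_id = dict(attrs).get("id")
        if tag == "span" and span_id in self.wanted:
            self._current = span_id
            self.values[span_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def energy_values(html):
    """Return {"calories": ..., "kilojoules": ...} for the ids that are present."""
    parser = SpanByIdParser(("calories", "kilojoules"))
    parser.feed(html)
    return {key: value.strip() for key, value in parser.values.items()}


# Made-up sample markup with an extra row before the calorie row.
SAMPLE_ENERGY = """
<tr><td>Serving row</td></tr>
<tr><td>Extra row that should be ignored</td></tr>
<tr><td>Calories <SPAN id="calories">250</SPAN></td>
    <td>Kilojoules <SPAN id="kilojoules">1046</SPAN></td></tr>
"""

energy = energy_values(SAMPLE_ENERGY)
print(energy)
```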

The row following the calorie / kilojoule row can be ignored

Each row after this point must be saved in the following way

- The 1st column in the row is to become a field name (e.g. Total Fat)

- The 2nd column in the row is to become the value stored in the associated field name

- The 3rd column can be ignored

The last row can be ignored
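The row handling above (skip the row after the calorie / kilojoule row, map column 1 to column 2, drop column 3 and the last row) might be sketched in Python as follows. It assumes the HTML passed in is just the foodlabel table; the names `LabelRows` and `nutrient_fields` and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class LabelRows(HTMLParser):
    """Collects table rows as lists of cell text and notes the calorie row."""

    def __init__(self):
        super().__init__()
        self.rows = []            # each row is a list of cell texts
        self.calorie_row = None   # index of the row holding id="calories"
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []
        elif tag == "span" and dict(attrs).get("id") == "calories" and self.rows:
            self.calorie_row = len(self.rows) - 1

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def nutrient_fields(html):
    """Return {field name: value} for the rows after the calorie row."""
    parser = LabelRows()
    parser.feed(html)
    if parser.calorie_row is None:
        return {}
    # +2 skips the calorie row and the row after it; -1 drops the last row.
    data_rows = parser.rows[parser.calorie_row + 2:-1]
    # Column 1 is the field name, column 2 the value; column 3 is ignored.
    return {row[0]: row[1] for row in data_rows if len(row) >= 2}


# Made-up sample markup standing in for a real food label.
SAMPLE_LABEL = """
<table class="foodlabel">
<tr><td>Nutrition Facts</td></tr>
<tr><td>Serving size</td></tr>
<tr><td><span id="calories">250</span> <span id="kilojoules">1046</span></td></tr>
<tr><td></td><td>Amount</td><td>% Daily Value</td></tr>
<tr><td>Total Fat</td><td>12 g</td><td>18%</td></tr>
<tr><td>Protein</td><td>5 g</td><td>10%</td></tr>
<tr><td>Footnote</td></tr>
</table>
"""

fields = nutrient_fields(SAMPLE_LABEL)
print(fields)
```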

That ends the data extraction for a single food item. The process should then move back and continue onto the next food item.

A single entry would therefore contain something like this

- Category

- Subcategory

- Brand

- Subbrand

- Food

- Servings

- Size

- Measure

- Calories

- Kilojoules

* Below this point the fields may vary depending on the food item. Unused fields should be left BLANK

- Total Fat

- Protein
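To illustrate the record layout above, including blank values for fields a food item does not have, here is a small Python sketch that writes records as CSV. CSV is an assumption (the brief only mentions XML as an output it does not want in place of the script), and all values, including the brand "Acme", are made up.

```python
import csv
import io

# Fixed fields, in the order listed in the brief.
FIXED = ["Category", "Subcategory", "Brand", "Subbrand", "Food",
         "Servings", "Size", "Measure", "Calories", "Kilojoules"]


def write_records(records, out):
    """Write records as CSV; fields missing from a record are left blank."""
    extra = []
    for record in records:
        for key in record:
            if key not in FIXED and key not in extra:
                extra.append(key)   # variable fields, in first-seen order
    writer = csv.DictWriter(out, fieldnames=FIXED + extra, restval="")
    writer.writeheader()
    writer.writerows(records)


# Made-up sample records; the second has no "Total Fat", the first no "Protein".
base = {"Category": "Dairy", "Subcategory": "Cheese", "Brand": "Acme",
        "Subbrand": "Original", "Servings": "1 slice (30 g), g",
        "Size": "100", "Measure": "g"}
records = [
    dict(base, Food="Cheddar", Calories="250", Kilojoules="1046",
         **{"Total Fat": "12 g"}),
    dict(base, Food="Cottage", Calories="98", Kilojoules="410",
         Protein="5 g"),
]

buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```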



- Feel free to bid if you can supply an alternative script for an alternative piece of software

- Please do not bid unless you are sure you can successfully perform the task as described above

- Time frame is not urgent but please stay in regular contact with brief progress reports


- I am happy to pay up to 33% on awarding the project

- A further 33% upon successfully demonstrating a working script by supplying a sample set of records

- The remainder upon providing the script and testing by me having been done to prove that it works

Skills: Data Entry, Data Processing, Web Scraping

About the Employer:
( 3 reviews ) Adelaide, Australia

Project ID: #707190