Basic extraction from Wikipedia (from a few specific lists to DB)

Closed. Posted: Mar 22, 2011. Paid on delivery.

===================

BACKGROUND

===================

I will provide you with a few lists from Wikipedia (list of ballet companies, list of operas, list of musicals, etc.). Your job is to write a script that extracts the details into two basic MySQL tables (the structure of both tables is given below).

As part of the deliverables of this project, I'm looking for (a) populated tables with data and (b) the scripts themselves which were used to extract the data.

**This is the first trial project for this kind of extraction. There is more extraction work ahead.**

===================

DATA STRUCTURE

===================

There will be two tables: "entities" table and "entity_names" table:

**entities** table:

- ID

- Wikipedia_Page

- Type

- Primary name ID (which will point to "ID" from "entity_names" table)

**entity_names** table:

- ID

- entity_ID (which will point to "ID" from "entities" table)

- Name

- Type (primary or secondary)

The reason we're using two tables is that a given entity could later have more than one name/alias (for example, "San Francisco Symphony" could also be called "SF Symphony"). For everything you extract in this project, set the "Type" field of the "entity_names" table to "primary".
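To make the two-table layout concrete, here is a minimal sketch of the structure described above. It is only an illustration under several assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it can run anywhere, the column types and the lower-case column names are my guesses, `add_entity` is a hypothetical helper, and the `example.org` URL is a placeholder. It also assumes that "primary" is stored in the Type column of entity_names, as the table definition describes.

```python
import sqlite3

# Assumed column names and types; the posting only names the fields.
SCHEMA = """
CREATE TABLE entities (
    id              INTEGER PRIMARY KEY,
    wikipedia_page  TEXT NOT NULL UNIQUE,
    type            TEXT NOT NULL,
    primary_name_id INTEGER REFERENCES entity_names(id)
);
CREATE TABLE entity_names (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    name      TEXT NOT NULL,
    type      TEXT NOT NULL CHECK (type IN ('primary', 'secondary'))
);
"""


def add_entity(conn, wikipedia_page, entity_type, name):
    """Insert one entity plus its primary name and link the two rows."""
    cur = conn.cursor()
    # The entity row goes in first without a primary name, because the
    # name row needs the entity's ID before it can be created.
    cur.execute(
        "INSERT INTO entities (wikipedia_page, type) VALUES (?, ?)",
        (wikipedia_page, entity_type),
    )
    entity_id = cur.lastrowid
    cur.execute(
        "INSERT INTO entity_names (entity_id, name, type) "
        "VALUES (?, ?, 'primary')",
        (entity_id, name),
    )
    name_id = cur.lastrowid
    # Close the loop: point the entity at its primary name.
    cur.execute(
        "UPDATE entities SET primary_name_id = ? WHERE id = ?",
        (name_id, entity_id),
    )
    conn.commit()
    return entity_id


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.executescript(SCHEMA)
eid = add_entity(
    conn,
    "https://example.org/wiki/San_Francisco_Symphony",  # placeholder URL
    "orchestra",
    "San Francisco Symphony",
)
# A later alias would simply be one more row of type 'secondary'.
conn.execute(
    "INSERT INTO entity_names (entity_id, name, type) "
    "VALUES (?, ?, 'secondary')",
    (eid, "SF Symphony"),
)
```

Because the two tables point at each other, the sketch inserts the entity, then the name, then updates the entity. On MySQL the same three statements should work with `%s` placeholders in place of `?`, depending on the driver used.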


===================

WHAT TO EXTRACT

===================

1) List of all ballet companies

Source: <[login to view URL]>

Fields to grab:

Name = "Company Name" from the table

Type = ballet_company

Wikipedia page = page for each ballet company (example: [login to view URL])

2) List of Operas

Source: <[login to view URL]>

Name = opera name from the list

Type: opera

Wikipedia page = page for each opera (example: [login to view URL])
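As a rough illustration of the "Name" and "Wikipedia page" fields in the two examples above, the sketch below pulls the first article link out of each list item or table row. It is not a finished extractor: `FirstLinkCollector` is a hypothetical class, the sample HTML and the `example.org` page URL are made up, and it assumes that each entry's name is the first link in its `<li>` or `<tr>` and that article links start with `/wiki/`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstLinkCollector(HTMLParser):
    """Collect the first article link found in each <li> or <tr>."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.rows = []          # (name, absolute_url) pairs
        self._in_item = False   # currently inside an <li> or <tr>
        self._taken = False     # this item already produced a link
        self._href = None       # href of the <a> being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("li", "tr"):
            self._in_item, self._taken = True, False
        elif tag == "a" and self._in_item and not self._taken:
            href = dict(attrs).get("href") or ""
            # Keep article links only: this skips red links and
            # namespaced pages such as "File:" or "Category:".
            # Caveat: it also skips article titles containing a colon.
            if href.startswith("/wiki/") and ":" not in href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                url = urljoin(self.page_url, self._href)
                self.rows.append((name, url))
                self._taken = True
            self._href = None
        elif tag in ("li", "tr"):
            self._in_item = False


# Made-up sample markup; real list pages will need per-page tuning.
sample = """
<ul>
  <li><a href="/wiki/Example_Opera">Example Opera</a> by
      <a href="/wiki/Some_Composer">Some Composer</a></li>
  <li><a href="/w/index.php?title=Missing&redlink=1">Missing</a></li>
  <li><a href="/wiki/File:Poster.jpg">poster</a>
      <a href="/wiki/Another_Opera">Another Opera</a></li>
</ul>
"""
parser = FirstLinkCollector("https://example.org/wiki/Some_list")
parser.feed(sample)
rows = parser.rows
```

Entries without their own page (the red link in the sample) are dropped here, since there would be nothing to store in the Wikipedia_Page field; whether that is the desired behavior is an open question for the project.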

(Below, I will only provide the type, as the other fields are self-explanatory based on the above two examples.)

3) List of Opera Companies

Source: [login to view URL]

Type: opera_company

4) List of Musicals:

Sources: <[login to view URL]:_A_to_L>

<[login to view URL]:_M_to_Z>

Type: musical

5) List of Orchestras:

Source: <[login to view URL]>

Type: orchestra

6) List of Improv Theater Companies

Source: <[login to view URL]>

Type: improv_theater_company

7) List of Comedians

Source: <[login to view URL]>

Type: comedian

Note: Please only extract those who are still alive (i.e. do not take someone like "Bud Abbott (1895-1974)")
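One possible way to apply the "still alive" rule is a lifespan check on the entry text. The sketch below is a heuristic only, assuming that deceased people are annotated with a birth and death year in parentheses, as in the "Bud Abbott (1895-1974)" example; `is_listed_as_deceased` and the sample entries other than Bud Abbott are hypothetical.

```python
import re

# Matches a lifespan such as "(1895-1974)", with a hyphen or an
# en dash (U+2013) between the two four-digit years.
LIFESPAN = re.compile(r"\(\s*\d{4}\s*[-\u2013]\s*\d{4}\s*\)")


def is_listed_as_deceased(entry_text):
    """Return True if the entry carries a birth-death year range."""
    return LIFESPAN.search(entry_text) is not None


entries = [
    "Bud Abbott (1895-1974)",
    "Example Living Comedian (born 1960)",  # made-up entry
    "Another Comedian",                      # made-up entry
]
living = [e for e in entries if not is_listed_as_deceased(e)]
```

An entry with no years at all passes this filter, so it cannot confirm that someone is alive; checking the person's own page would be needed for that.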

8) List of Stand-up Comedians

Source: [login to view URL]

Type: stand_up_comedian

Note: Please only extract those who are still alive

9) List of dance companies:

Source: <[login to view URL]>

Type: dance_company

10) List of pop punk bands

Source: [login to view URL]

Type: pop_punk_band
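The ten lists above could be captured as a small job table that a script loops over. This is just one way to organize it: `ListJob`, `JOBS`, and `jobs_needing_living_filter` are hypothetical names, and because the source URLs are hidden in this posting, the sketch keeps the placeholder text instead of real addresses.

```python
from dataclasses import dataclass

# The posting hides the real URLs, so the placeholder is kept as-is.
URL_ELIDED = "[login to view URL]"


@dataclass(frozen=True)
class ListJob:
    entity_type: str           # value for the "Type" field of entities
    sources: tuple             # one or more list pages to read
    living_only: bool = False  # apply the "still alive" note


JOBS = (
    ListJob("ballet_company", (URL_ELIDED,)),
    ListJob("opera", (URL_ELIDED,)),
    ListJob("opera_company", (URL_ELIDED,)),
    # Musicals are split across two pages (A to L and M to Z).
    ListJob("musical", (URL_ELIDED + ":_A_to_L", URL_ELIDED + ":_M_to_Z")),
    ListJob("orchestra", (URL_ELIDED,)),
    ListJob("improv_theater_company", (URL_ELIDED,)),
    ListJob("comedian", (URL_ELIDED,), living_only=True),
    ListJob("stand_up_comedian", (URL_ELIDED,), living_only=True),
    ListJob("dance_company", (URL_ELIDED,)),
    ListJob("pop_punk_band", (URL_ELIDED,)),
)


def jobs_needing_living_filter(jobs):
    """Return the entity types that carry the 'still alive' note."""
    return [job.entity_type for job in jobs if job.living_only]


living_types = jobs_needing_living_filter(JOBS)
```

Keeping the type strings and the living-only flag in one place would make it easier to add further lists in later extraction rounds.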

Skills: Java, JavaScript, MySQL, PHP, Script Install, Shell Script, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing, XML, XSLT

Project ID: #3191040

About the project

28 bids. Remote project. Active Apr 13, 2011

28 freelancers are bidding an average of $177 for this job

Bids (each with the message "See private message."):

- repmovsd: $382.5 USD in 5 days (144 reviews, rating 7.0)
- samirkumardas: $297.5 USD in 5 days (241 reviews, rating 7.0)
- sktn: $143.65 USD in 5 days (262 reviews, rating 7.1)
- pbradaric: $85 USD in 5 days (28 reviews, rating 6.1)
- mastirlaa: $85 USD in 5 days (76 reviews, rating 6.1)
- novepi: $212.5 USD in 5 days (42 reviews, rating 5.9)
- Bitquark: $170 USD in 5 days (44 reviews, rating 5.9)
- tomkusvw: $85 USD in 5 days (62 reviews, rating 5.7)
- webspiderinc: $85 USD in 5 days (53 reviews, rating 5.5)
- topleaseu: $212.5 USD in 5 days (24 reviews, rating 5.3)
- oasis21: $127.5 USD in 5 days (35 reviews, rating 4.9)
- szaszalexmcpd: $85 USD in 5 days (55 reviews, rating 4.4)
- lenzai: $340 USD in 5 days (16 reviews, rating 4.2)
- ragastens: $110.5 USD in 5 days (37 reviews, rating 4.4)
- cwaldbieser: $297.5 USD in 5 days (10 reviews, rating 4.3)
- powzak: $85 USD in 5 days (25 reviews, rating 4.1)
- MrRain: $85 USD in 5 days (13 reviews, rating 3.8)
- rased108: $85 USD in 5 days (29 reviews, rating 4.6)
- Archit88: $136 USD in 5 days (14 reviews, rating 3.3)
- ifailed: $85 USD in 5 days (8 reviews, rating 2.4)