There are two Wikipedia category pages:
1) [login to view URL]:All_NPOV_disputes
2) [login to view URL]:Good_articles/all
I need a Python script that will:
1) extract ALL the Wikipedia pages linked to from the 1st page (in the "Pages in Category "All NPOV Disputes"" section), and
2) extract 5000 RANDOM (default setting) Wikipedia pages linked to from the 2nd page ("good articles", taken from randomly chosen categories),
and convert them into two XML files, where
a) one file contains the actual articles (with an id running from 0000000 to 0006000), the URL, and the full text, as in the example upload articles-trained-byarticle, and
b) the other file contains the id, the URL, and the NPOV score, which is NPOV = true for articles imported from Category:All_NPOV_disputes and NPOV = false for articles imported from Wikipedia:Good_articles/all.
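The two output files described above could be written with the standard library's `xml.etree.ElementTree`. This is only a sketch: the element and attribute names (`articles`, `article`, `id`, `url`, `npov`) are assumptions, since the brief references an example upload rather than a fixed schema.

```python
import xml.etree.ElementTree as ET

def write_corpus(articles, articles_path, labels_path):
    """Write the two XML files: article texts and NPOV labels.

    `articles` is a list of dicts with keys "url", "text", "npov" (bool).
    Element/attribute names are assumptions, not a fixed schema.
    """
    articles_root = ET.Element("articles")
    labels_root = ET.Element("labels")
    for i, art in enumerate(articles):
        aid = f"{i:07d}"  # zero-padded ids: 0000000, 0000001, ...
        node = ET.SubElement(articles_root, "article", id=aid, url=art["url"])
        node.text = art["text"]
        ET.SubElement(labels_root, "article", id=aid, url=art["url"],
                      npov=str(art["npov"]).lower())
    ET.ElementTree(articles_root).write(articles_path, encoding="utf-8",
                                        xml_declaration=True)
    ET.ElementTree(labels_root).write(labels_path, encoding="utf-8",
                                      xml_declaration=True)
```

Keeping the same zero-padded id in both files is what links an article's text to its NPOV label.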
The script should have additional settings (initialized in the Jupyter notebook when calling the script) that:
1) specify the size range of the text to be imported (default: 0 to 10000 KB),
2) specify the types of articles to be imported (an array of accepted Wikipedia page categories, e.g. "Biographies"; default = all),
3) specify which source to use for NPOV = true and which source to use for NPOV = false (defaults as above), and
4) specify how many pages to import from each source (default: 5000, 5000).
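The four settings above could be grouped into a single configuration object with the stated defaults. A minimal sketch using a dataclass; all field names are assumptions mirroring the brief, not a required interface:

```python
from dataclasses import dataclass, field

@dataclass
class ImportSettings:
    # All field names and defaults are assumptions based on the brief.
    min_size_kb: float = 0            # setting 1: size range, lower bound
    max_size_kb: float = 10000        # setting 1: size range, upper bound
    categories: list = field(default_factory=list)  # setting 2: [] = all
    npov_true_source: str = "Category:All_NPOV_disputes"     # setting 3
    npov_false_source: str = "Wikipedia:Good_articles/all"   # setting 3
    pages_per_source: tuple = (5000, 5000)                   # setting 4

    def size_ok(self, text: str) -> bool:
        """Check whether an article's text falls inside the size range."""
        kb = len(text.encode("utf-8")) / 1024
        return self.min_size_kb <= kb <= self.max_size_kb
```

In a notebook the caller would construct `ImportSettings(...)` once and pass it to the import routine, overriding only the fields they care about.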
Note: the NPOV category page is paginated, so the script has to take this into account.
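Rather than scraping the paginated HTML listing, the pagination can be handled through the MediaWiki API's `categorymembers` list, which returns batches plus a `cmcontinue` token to fetch the next batch. A stdlib-only sketch; the `fetch` parameter is a hypothetical hook added here so the loop can be tested without network access:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def _api_get(params):
    """One MediaWiki API call (default fetcher; replaceable for tests)."""
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def category_members(category, limit=5000, fetch=_api_get):
    """Yield up to `limit` page titles from a category, following the
    API's `cmcontinue` token across paginated result batches."""
    params = {
        "action": "query", "format": "json", "list": "categorymembers",
        "cmtitle": category,  # full title, e.g. "Category:All_NPOV_disputes"
        "cmtype": "page", "cmlimit": "500",
    }
    yielded = 0
    while yielded < limit:
        data = fetch(dict(params))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
            yielded += 1
            if yielded >= limit:
                return
        cont = data.get("continue")
        if cont is None:
            return  # no more batches
        params.update(cont)  # carry cmcontinue into the next request
```

Because this is a generator, the caller can stop early (e.g. after the first 5000 titles) without fetching the remaining batches.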
The script should run in a Jupyter Notebook and come with clear instructions for installing all the dependencies through Anaconda or pip.
Deliverables:
1) The script as above, with all the settings
2) The processed dataset with the default settings above (that is, the two XML files with the extracted articles and NPOV scores)
19 freelancers are bidding an average of €191 for this job
Hello! I can make you a Python script that extracts the wiki pages into XML files according to your requirements. If you are interested, I can produce sample output files so you can be sure that I am able to do the job.
Greetings! I hope you are doing great. I am highly experienced in script-writing projects. Please contact me so I may assist you. Samples available upon request. Thank you, Revival
Hello, I am a Python developer with experience scraping data from Wikipedia with Beautiful Soup. I can do this in a week for €200; talk to me in chat for more details.