In Progress

Fix a web scraper program created in Python

We have a web scraping program, written in Python, that has stopped working properly. I need someone to go through the program files and fix the part that no longer works. We need to end up with a working program.

The purpose of the program is to:

1. Spider through a website and download all files that result from the spidering

2. Format each file downloaded to a specific format

The program appears to run normally and goes through the same steps as it always did, but no output files are created. There is a good chance that a change is needed in the 'search and replace' file: "[url removed, login to view]"

I have attached the program files, including source code.

The following is the program description:

1. Spider through a website and download all files that result from the spidering

2. Format each file downloaded to a specific format

Part one:

You will be given a batch of starting URLs that look like this:

[url removed, login to view]

You will follow each of these URLs, which leads to another page with links that look like this:

[url removed, login to view]

You will follow each of these URLs, which again leads to another page with links that look like this:

[url removed, login to view]

You will now follow each of these links, which leads to a page that links to specific documents. The links within these pages tend to look like this:

<table width="480"><tr>

<td><table width="120">

<tr><td>

<a class="tpl" href="/cgi/t/text/text-idx?c=ecfr&SID=f68f503ab8017206c54fb367aaaa7851&amp;rgn=div8&amp;view=text&amp;node=10:1.0.1.1.4.1.56.1&amp;idno=10">

&sect;5.100</a></td></tr>

</table></td>

<td><table width="354">

<tr><td>Purpose and effective date.</td></tr>

</table></td>

</tr></table>
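The link-following steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached program: the names `TplLinkParser`, `extract_links` and `crawl` are hypothetical, and it assumes that the links to follow on every level are `<a class="tpl">` anchors like the one in the sample, which the posting only confirms for the last level. The `levels` argument is one way to make the crawl depth selectable, and `fetch` is any function that returns the HTML for a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TplLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains "tpl"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "tpl" in classes and attr.get("href"):
            # The parser has already turned &amp; into & in the value.
            self.links.append(attr["href"])


def extract_links(html, base_url):
    """Return absolute URLs for all class="tpl" links found in html."""
    parser = TplLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_urls, levels, fetch):
    """Follow links `levels` times and return the URLs reached at the end.

    `fetch` is a caller-supplied function (assumption) mapping URL -> HTML.
    """
    frontier = list(start_urls)
    for _ in range(levels):
        next_frontier = []
        for url in frontier:
            for link in extract_links(fetch(url), url):
                if link not in next_frontier:  # skip duplicates, keep order
                    next_frontier.append(link)
        frontier = next_frontier
    return frontier
```

With a real site, `fetch` could wrap `urllib.request.urlopen`; passing it in keeps the crawl logic testable without network access, and titles with one less level only need a smaller `levels` value.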

Each of these links leads to a page that needs to be saved with a naming structure that looks like this:

[url removed, login to view]

Other examples of naming structures:

6cfrAppendix A to Part [url removed, login to view]

Part two of this project:

After you have downloaded each file, you will need to put each file into a specific html page structure.

1. You will first strip all of the information before <!-- startDynamic --> and after the <!-- endDynamic -->

2. You will now need to create a header for each record that looks like the files that are part of the samples.

3. Wherever the text references a graphic, you will need to replace the path string. For example:

Please replace:

<img src="/graphics/

With this string:

<img src="[url removed, login to view]

AND replace this string:

<a href="/graphics/pdfs/

With this string:

<a href="[url removed, login to view]

4. You will need to create a footer at the bottom of each section, after the <p class="cita"> element, that looks like this example:

<p class="cita">[54 FR 53314, Dec. 28, 1989]</p>

<br><p><center>Copyright 2013 Compliance Publishing Corporation (877) 500-6737</center>

</body>

</html>

5. You must be able to accommodate both regular regulations and the Appendix sections

6. Some of the titles have one less level. The program must let the user select how many levels deep the individual text is located.
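Steps 1, 3 and 4 of Part two might be sketched as below. This is only an illustration under stated assumptions: `format_page` is a hypothetical name, the header comes from the sample files (not shown here) and is therefore passed in, and the replacement URL prefixes are elided in this posting, so they are parameters rather than guessed values. The footer text is copied from the example in step 4.

```python
START_MARKER = "<!-- startDynamic -->"
END_MARKER = "<!-- endDynamic -->"

# Footer copied from the example in step 4 of the posting.
FOOTER = (
    "<br><p><center>Copyright 2013 Compliance Publishing Corporation "
    "(877) 500-6737</center>\n</body>\n</html>"
)


def format_page(raw_html, header, img_prefix, pdf_prefix, footer=FOOTER):
    """Keep only the dynamic region, rewrite graphic paths, wrap in header/footer.

    img_prefix and pdf_prefix stand in for the elided replacement URLs
    (assumption: the caller supplies the real values).
    """
    start = raw_html.find(START_MARKER)
    end = raw_html.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        # Fail loudly so a changed page layout is noticed instead of
        # silently producing no output.
        raise ValueError("startDynamic/endDynamic markers not found")

    body = raw_html[start + len(START_MARKER):end]

    # Step 3: plain string replacement of the two path prefixes.
    body = body.replace('<a href="/graphics/pdfs/', '<a href="' + pdf_prefix)
    body = body.replace('<img src="/graphics/', '<img src="' + img_prefix)

    # Steps 2 and 4: header before the content, footer after it.
    return header + body + footer
```

Raising an error when the markers are missing is a design choice for this sketch; if the site changes its markup, the failure then shows up immediately rather than as a run that finishes without writing any files.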

The project must be completed in 7 days.


Skills: Python, Web Scraping


About the Employer:
(76 reviews) Edina, United States

Project ID: #6552649

Awarded to:

dingji

Hi, I have examined your script. It appears that the script does not understand the format of some page titles. This could mean that the ecfr web site has recently changed their format. I am almost certain I can fix ...

$130 USD in 2 days
(0 reviews)
0.0

13 freelancers are bidding an average of $173 for this job

srinichal

I am an expert in scraping and crawling and am willing to discuss the project specifications further.

$157 USD in 3 days
(56 reviews)
6.6
SigmaVisual

Dear Client, I can help with your project. We already have experience working on similar projects. Please see below to get an idea of my similar experience: Amazon/Ebay Bots: [url removed, login to view] ...

$210 USD in 5 days
(45 reviews)
6.5
sayno2bugs

This is Nitin, with HUGE experience in scraping HUGE data in the least amount of time. I code in PHP, Python and Perl, and scrapers written by me are being used to scrape more than 30 million pages per day without being ...

$277 USD in 5 days
(16 reviews)
6.1
anuyadav1

A proposal has not yet been provided

$200 USD in 3 days
(32 reviews)
5.4
dabing1205

Hello, I am an expert in web scraping and am also interested in your project. Please contact me to discuss more details of your project. Thanks!

$177 USD in 7 days
(14 reviews)
4.5
amcorreia

A proposal has not yet been provided

$155 USD in 3 days
(1 review)
2.4
ponnanikaaran

A proposal has not yet been provided

$166 USD in 2 days
(1 review)
2.3
RiosR

Hi, I'm pretty sure I can fix it. I would look especially at the creatSec and creatAppendix functions. It seems that there are also some performance problems in the case of big Titles (like Title07, with 16657 down ...

$155 USD in 3 days
(1 review)
2.0
jduncanvw

I feel I am well suited for this particular job, mainly because I have been specializing in web scraping Python apps. It makes sense how these cease to function: the script seeks the desired data by navigating ...

$155 USD in 3 days
(2 reviews)
1.9
vatay

Hi, I have looked at the source code -- the file [url removed, login to view] should not have been packed in, but all the same -- the bug fixes and your requested changes will be ready in 7 days or less. With best regards, Vatay Világi ...

$222 USD in 7 days
(0 reviews)
0.0
MagnusYevon

Interesting. I can do this project. I have done some scripts like this one in the past. :) Thank you. I will be awaiting your response.

$155 USD in 3 days
(0 reviews)
0.0
bupeherve

A proposal has not yet been provided

$88 USD in 6 days
(0 reviews)
0.0