I am trying to develop a way to automate the daily retrieval of PDFs from a State Government website and extract (scrape) specific information from each document. The procedure is as follows:
1. Go to URL [url removed, login to view]
2. Enter Docket Code: ORDER
3. Enter Case Type: CD
4. Enter Date Range (to be done daily)
5. Hit ‘Search’
6. Open first document by clicking hyperlink under ‘ID’ Column
a. Identify RELIEF SOUGHT
b. If RELIEF SOUGHT = ‘POOLING’ continue to step 7
c. Else, return to results and open next document, then repeat a/b
7. If DISMISSED return to step 6
8. Else, identify fields highlighted in example documents
9. Export results to an Excel database; each column name is marked in red on the example documents
10. Return to search results and continue searching through documents with criteria from step 6
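The filtering in steps 6 to 8 could be sketched roughly as follows. This is only a minimal sketch, assuming the PDF text has already been extracted to a plain string by some other tool, that the field appears as a line such as "RELIEF SOUGHT: POOLING", and that a dismissed order contains the word DISMISSED. The function names are hypothetical, and the real layout may differ between documents.

```python
import re


def relief_sought(text):
    """Return the RELIEF SOUGHT value in upper case, or None if not found.

    Assumption: the value follows the label on the same line, optionally
    after a colon or dash.
    """
    match = re.search(r"RELIEF\s+SOUGHT\s*[:\-]?\s*([^\n]+)", text, re.IGNORECASE)
    return match.group(1).strip().upper() if match else None


def is_dismissed(text):
    """Assumption: a dismissed order mentions the word DISMISSED somewhere."""
    return re.search(r"\bDISMISSED\b", text, re.IGNORECASE) is not None


def should_extract(text):
    """Steps 6b and 7: keep only POOLING documents that are not dismissed."""
    return relief_sought(text) == "POOLING" and not is_dismissed(text)
```

A document that fails `should_extract` would simply be skipped, and the loop would move on to the next hyperlink in the ID column.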
Obviously, I only need the PDFs where the RELIEF SOUGHT is POOLING.
I am looking to organize all this data in a program like Excel for my use. I'd like the data to be organized by Order Date, Cause CD No. and then a column for each piece of information highlighted and identified in red in the example documents.
I have provided two examples to show that the document may vary somewhat in formatting and the presentation of data.
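The output layout described above (Order Date, then Cause CD No., then one column per highlighted field) might be produced along these lines. This is a sketch under stated assumptions: it writes CSV, which Excel can open, rather than a native workbook; dates are assumed to be in MM/DD/YYYY form; and `write_rows` and the `extra_columns` argument are hypothetical names, since the actual highlighted fields are only shown in the example documents.

```python
import csv
from datetime import datetime


def write_rows(rows, out, extra_columns):
    """Write extracted records to `out`, sorted by Order Date then Cause CD No.

    rows: list of dicts, each with at least "Order Date" and "Cause CD No.".
    out: an open text file (or any file-like object).
    extra_columns: names of the remaining fields (hypothetical here).
    """
    columns = ["Order Date", "Cause CD No."] + list(extra_columns)
    ordered = sorted(
        rows,
        key=lambda r: (
            # Assumption: dates look like 03/01/2015.
            datetime.strptime(r["Order Date"], "%m/%d/%Y"),
            r["Cause CD No."],
        ),
    )
    # restval leaves a blank cell when a document lacks a field;
    # extrasaction="ignore" drops keys that are not listed as columns.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ordered)
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends. Leaving missing fields blank keeps one row per document even when the formatting varies.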
Hello sir, I can deliver the required scraper with excellent quality.
13 freelancers are bidding an average of $367 for this job
Dear minz08, Greetings! Please refer to your PM for bid details. Thanks, Dhruvika
We do not offer packages; we offer only guaranteed results: 100% quality work within the time limit and ...Read more in PMB
Sir, I can do the project. Please refer to the PMB. I look forward to discussing this further. With thanks and regards
I would really love to work on this for you. I have gone through the whole thing, been to the site, looked at the attached examples, and understood what you are looking for.
Hi! I can do this very quickly. I have completed a similar project here, so I can reuse that work.
It looks like very few of the PDFs have all of the required fields, correct?