Closed

Improve webpage scraping solution

I have a Java program that scrapes information from a website. The solution works in three steps: 1) Java Selenium drives headless Chrome through Chrome WebDriver to trigger authentication and authenticated requests; 2) the requests from Chrome are routed through Java BrowserMobProxy, which captures three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string; and 3) these four elements are used in HTTPS requests sent directly from Java to the website (i.e. without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.
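Step 3 could look roughly like the sketch below, which uses the JDK's built-in `java.net.http` API (Java 11+). The class name `ReplayRequest`, the method `build`, and the example URL are hypothetical; the actual endpoint and the way the existing program issues its requests are not described here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class ReplayRequest {
    /**
     * Builds a GET request that carries the three captured headers
     * and the captured query string (assumed to be already URL-encoded).
     */
    public static HttpRequest build(String baseUrl, String queryString,
                                    String authorization, String csrfToken, String cookie) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "?" + queryString))
                .timeout(Duration.ofSeconds(30)) // illustrative timeout, not from the brief
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .GET()
                .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.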

The program extracts the information correctly but has a few problems:

It depends on an external non-Java component: Chrome WebDriver.

It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove.

It is not optimized: it refreshes the headers too often and sleeps too long relative to the rate limit at which the website (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than necessary.
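One way to avoid refreshing too often is to cache the captured values and reload them only once they are older than a chosen maximum age, or when the server rejects them. The sketch below is a minimal illustration under that assumption; `CachedCredentials`, the `loader` supplier, and the maximum age are hypothetical, and the real lifetime of the headers would have to be measured.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CachedCredentials<T> {
    private final Supplier<T> loader; // hypothetical: performs the actual login/refresh
    private final Duration maxAge;    // assumed lifetime of the captured values
    private T value;
    private Instant loadedAt;

    public CachedCredentials(Supplier<T> loader, Duration maxAge) {
        this.loader = loader;
        this.maxAge = maxAge;
    }

    /** Returns the cached value, reloading it only if missing or older than maxAge. */
    public synchronized T get(Instant now) {
        if (value == null || Duration.between(loadedAt, now).compareTo(maxAge) >= 0) {
            value = loader.get();
            loadedAt = now;
        }
        return value;
    }

    /** Forces a reload on the next get(), e.g. after a 401 or 403 response. */
    public synchronized void invalidate() {
        value = null;
    }
}
```

Passing the current time as a parameter keeps the class easy to test; production code would call `get(Instant.now())`.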

Deliverables

You will receive the program's current Java code and must solve the problems above. To do so, you will need to:

A. Find out how to authenticate and refresh the three headers and the query string without depending on Selenium, Chrome WebDriver, or BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need to know JavaScript and either how to execute it from within Java or how to convert the JavaScript code to Java (the preferred solution).

B. Identify the rate limit at which the website (behind Cloudflare) starts responding with 429 errors, and tune the header refresh frequency and the sleep periods to that limit. You will need to demonstrate the benefit of your changes by extracting the same information the program currently extracts and measuring how long it takes.
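A simple way to approach the limit empirically is to shrink the delay between requests while responses succeed and back off when a 429 arrives. The sketch below illustrates that idea with an additive-decrease, multiplicative-increase rule; `AdaptiveDelay` and all of its parameters are hypothetical, and the real limit would still need to be measured against the site. If the server sends a `Retry-After` header with its 429 responses, that value should probably take precedence over the computed delay.

```java
public class AdaptiveDelay {
    private final long minMs;
    private final long maxMs;
    private final long stepMs;
    private long delayMs;

    public AdaptiveDelay(long initialMs, long minMs, long maxMs, long stepMs) {
        this.delayMs = initialMs;
        this.minMs = minMs;
        this.maxMs = maxMs;
        this.stepMs = stepMs;
    }

    /**
     * Call after each response with its HTTP status code.
     * Returns the number of milliseconds to sleep before the next request.
     */
    public long next(int statusCode) {
        if (statusCode == 429) {
            // Rate limited: double the delay, capped at maxMs.
            delayMs = Math.min(maxMs, Math.max(delayMs, 1) * 2);
        } else if (statusCode >= 200 && statusCode < 300) {
            // Success: shorten the delay by one step, never below minMs.
            delayMs = Math.max(minMs, delayMs - stepMs);
        }
        // Any other status leaves the delay unchanged.
        return delayMs;
    }
}
```

Logging the delay at which the first 429 appears over several runs gives an estimate of the limit, and the total run time before and after tuning gives the comparison asked for above.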

Note: you will need to create your own login and password on the webpage. Registration has no additional requirements.

Skills: Java, Web Scraping, JavaScript, Python, Software Architecture

About the Employer:
(1 review) Băilești, Romania

Project ID: #26781086

13 freelancers are bidding an average of $165 for this job

schoudhary1553

Hi, Greetings! ✅checked your project details: Improve webpage scrapping solution ✅Completed Time: In project deadline We have worked on 600 + Projects. I have 6 + years of the experience in same kind of projects ...

$220 USD in 4 days
(128 Reviews)
7.2
narsim3128

NOTE : I HAVE EXPERTISE IN JAVA AND JAVASCRIPT. I CAN COMPLETE THIS TASK IN QUICK TIME. With respect to this project I would like to present myself as a candidate for your consideration. I have more than 12 years of I ...

$140 USD in 4 days
(37 Reviews)
5.7
WebXcellance

Hello there, We will work with you, Having 5 year experience in website design and developments. Please provide more detail about the project, also see our recent work. [login to view URL] [login to view URL] ...

$255 USD in 7 days
(28 Reviews)
5.5
Demenntor

Dear Employer, I have read the project details and confident to work on improving web page scraping. I have extensive knowledge on Java, javascript,python, software architecture etc . Kindly message me so that we can ...

$222 USD in 3 days
(30 Reviews)
5.2
alexmahmudova

hi there. As a senior developer with 6 years of experience in Web Scraping, I have rich expertise and deep knowledge in almost all types of scraping tools published. But I always prefer using my own scraping tool writt ...

[amount missing] USD in 1 day
(10 Reviews)
4.3
EkoLike

Hello man. How are you today? I am very interested in your project, so I have prepared a bid. I know you need significant help. I sincerely want to help you. Of course, I know that to help you, you need to have experie ...

[amount missing] USD in 1 day
(5 Reviews)
4.0
mfaisal902

Greetings Sir, I am Muhammad Faisal and i am a professional Java developer having almost 4 years of experience and we provide you quality work within your time and budget so, lets get started ? Thanks

$140 USD in 7 days
(19 Reviews)
3.6
TheRevoltingX

Hi, I am an expert developer who has done extensive web scrapping work only with Java. I can remove all dependencies on Chrome and remove any sort of sleeps. The way i'd do it is to only wait until the DOM is rendered ...

$250 USD in 7 days
(2 Reviews)
3.3
jonnathanceballo

Hello! I am happy to put my bid on your project. I have read your requirement and I noticed that I am appropriate to this project. As a skillful software developer, I have rich experience with C# .net, python web scrap ...

$150 USD in 7 days
(2 Reviews)
2.5
nikolay420

Hello,sir.. How are you?... I am very interested in your project. I have rich experiences in scraping website and processing data within selenium package. Have experiences in scrapers used chrome extension and proxy se ...

$140 USD in 7 days
(4 Reviews)
2.5
$156 USD in 3 days
(0 Reviews)
0.0
kavetiraviteja

i have optimized a lot of distributed crawling projects and Selenium based crawlers ... please connect me over chat and share me your code. i will first identify all the pin points and alternatives for your authentica ...

[amount missing] USD in 1 day
(0 Reviews)
0.0
$156 USD in 3 days
(0 Reviews)
0.0