Closed

Java expert - Improve webpage scraping solution

Request details

I developed a Java program to scrape information from a website. The architecture of the solution involves: 1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests; 2) routing the requests from headless Chrome through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string (without these, the server starts responding with 512 after some requests); and 3) using these four elements in HTTPS requests sent from Java directly to the webpage (i.e., without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.
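As a rough illustration of step 3, the direct request could look like the minimal sketch below. It assumes Java 11+ and the standard `java.net.http` API; the endpoint URL, the query parameter name `token`, and every value shown are placeholders, not the real site's.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Sketch of step 3: a direct HTTPS request that reuses the four captured values. */
class DirectRequestSketch {

    /**
     * Builds a GET request carrying the three captured headers and the captured
     * query-string value. All arguments are placeholders supplied by the caller.
     */
    static HttpRequest buildRequest(String baseUrl, String queryName, String queryValue,
                                    String authorization, String csrfToken, String cookie) {
        // Encode the query value so spaces and reserved characters survive transport.
        String encoded = URLEncoder.encode(queryValue, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "?" + queryName + "=" + encoded);
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and values, for illustration only.
        HttpRequest request = buildRequest("https://example.com/api/items", "token", "abc 123",
                "Bearer PLACEHOLDER", "csrf-PLACEHOLDER", "session=PLACEHOLDER");
        System.out.println(request.uri());
        System.out.println(request.headers().map());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

Keeping request construction separate from sending makes it easy to swap in fresh header values whenever they are refreshed.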

This program performs the basic task of extracting the information but has a few problems:

It depends on an external non-Java component: Chrome WebDriver

It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove

It is not optimized (too many refreshes and overly long sleep periods) relative to the limit at which the webpage (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than needed.
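For the third problem, one possible approach is a pause that tunes itself against the 429 responses instead of a fixed sleep. The sketch below is only an illustration under assumptions: the class name `AdaptiveDelay`, the 10% decrease and the doubling are arbitrary choices, and it assumes that a `Retry-After` header, if the server sends one at all, is given in seconds.

```java
import java.time.Duration;
import java.util.OptionalLong;

/** Sketch of a self-tuning pause between requests, driven by 429 responses. */
class AdaptiveDelay {
    private final long minMillis;
    private final long maxMillis;
    private long currentMillis;

    AdaptiveDelay(long startMillis, long minMillis, long maxMillis) {
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = clamp(startMillis);
    }

    /** After a successful response, shorten the pause by 10% to probe for the limit. */
    void onSuccess() {
        currentMillis = clamp(currentMillis * 9 / 10);
    }

    /**
     * After a 429, double the pause. If the server sent a Retry-After value in
     * seconds (an assumption), never wait less than that.
     */
    void onRateLimited(OptionalLong retryAfterSeconds) {
        long doubled = currentMillis * 2;
        long hinted = retryAfterSeconds.isPresent() ? retryAfterSeconds.getAsLong() * 1000 : 0;
        currentMillis = clamp(Math.max(doubled, hinted));
    }

    Duration current() {
        return Duration.ofMillis(currentMillis);
    }

    private long clamp(long value) {
        return Math.max(minMillis, Math.min(maxMillis, value));
    }

    public static void main(String[] args) {
        // Simulated status codes; a real run would use the server's responses.
        int[] statuses = {200, 200, 429, 200};
        AdaptiveDelay delay = new AdaptiveDelay(1000, 100, 60_000);
        for (int status : statuses) {
            if (status == 429) {
                delay.onRateLimited(OptionalLong.empty());
            } else {
                delay.onSuccess();
            }
            System.out.println(status + " -> next pause " + delay.current().toMillis() + " ms");
        }
    }
}
```

Logging the pause length at which 429 responses first appear would give an estimate of the limit, which can then be fixed as the minimum pause.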

Deliverables

You will get the program's current Java code and will need to solve the problems above. To do so, you will need to:

A. Find out how to authenticate and refresh the three headers and the query string without depending on Selenium, Chrome WebDriver, and BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need knowledge of JavaScript and of how to execute JavaScript from within Java or convert the JavaScript code to Java (the preferred solution).

B. You will need to identify the limit at which the webpage (behind Cloudflare) starts responding with 429 errors. You will need to tune the refresh frequency of the headers and the sleep periods to the limit identified. You will need to demonstrate the benefits of your changes by extracting the information currently extracted by the program and measuring how long it takes.
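For deliverable A, a pure-Java starting point might look like the sketch below: an `HttpClient` that keeps cookies on its own, plus a helper that reads a CSRF token from the returned HTML. This rests on a loud assumption: that the token is exposed in a `<meta name="csrf-token" content="...">` tag, a common convention in some web frameworks. The real page may embed it elsewhere or compute it in JavaScript, in which case that logic would still need to be ported. The class and method names are hypothetical.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.http.HttpClient;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch for deliverable A: keep cookies in Java and read a CSRF token from HTML. */
class SessionBootstrapSketch {

    // Assumption: the token is exposed as <meta name="csrf-token" content="...">.
    // The real page may place it elsewhere or derive it in JavaScript.
    private static final Pattern CSRF_META = Pattern.compile(
            "<meta\\s+name=[\"']csrf-token[\"']\\s+content=[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** An HttpClient that stores Set-Cookie values and replays them as the Cookie header. */
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Returns the token from the first matching meta tag, if any. */
    static Optional<String> extractCsrfToken(String html) {
        Matcher matcher = CSRF_META.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sampleHtml = "<head><meta name=\"csrf-token\" content=\"abc123\"></head>";
        System.out.println(extractCsrfToken(sampleHtml).orElse("not found")); // prints abc123
    }
}
```

With a cookie-aware client, the Cookie header no longer needs to be captured through a proxy; only the Authorization value and the query string would remain to be traced back to their source.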

Note: you will need to create your own login/password on the webpage. There are no additional requirements to register.

Skills: Java, JavaScript, Software Architecture, PHP, Web Scraping

About the Employer:
(1 review) Băilești, Romania

Project ID: #26819050

8 freelancers are bidding an average of $513 for this job

schoudhary1553

Hello, I can help you to Improve webpage scrapping solution I have gone through your job posting and become very much interested to work with you. I am an expert in this field. I have already completed several projec…

$500 USD in 7 days
(132 Reviews)
7.2
milandjokic

Hello. How are you? I have already read your description and i think i am qualified for this subject. I'm full-stack web developer. "He that has most time has none to lose" it's my creed. I'll solve your anyproblems in…

$400 USD in 7 days
(8 Reviews)
4.2
serhiilyskin

Hi, sir. I have carefully checked your requirements and I was glad that I've already done this kind of projects before. I'd love to share more detail with you over chat and I'm sure that you'll be interested in them. I…

$555 USD in 6 days
(1 Review)
2.0
jitenp0091

Thanks for project posting and I respect it I recently worked on the project like yours and can provide you demo work as well Do you want free demo ? ping me in freelancer message board Thanks and Regards, …

$500 USD in 7 days
(0 Reviews)
0.0
joepower641

I live in wordpress, I can easily help you in woocommerce on anything you are trying to do, and I can even do it on screenshare if you like! Please check out my work here: [login to view URL] Rate $40hr an…

$750 USD in 7 days
(0 Reviews)
0.0
vparishuddam

Hi, I have 11 years of experience in JAVA,J2EE Technologies and Selenium.I am Expert in Spring boot and micro services.i worked on REST service,SOAP service, JSP, HTML,CSS,BOOTSRAP,JavaScript angular js, AWS, Docker. I…

$550 USD in 7 days
(0 Reviews)
0.0
AkostarMS

⭐No Problem. I can start immediatly⭐ ✔✔✔Fastest, I am sure I won't let you down.✔✔✔ ✔Before starting, I wish that you and your family are doing well in these unprecedented times. ✔I have read through your entire descri…

$500 USD in 5 days
(1 Review)
0.0
feicsh

I have experience in scrapping various website pages including ajax pages without using selenium. Fine tune the frequency of refresh can minimize the error, but no guarantee of free from robot detection of your target…

$350 USD in 5 days
(0 Reviews)
0.0