I have a Java program that scrapes information from a website. The architecture of the solution involves: 1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests; 2) routing the requests from Chrome (headless) through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string; and 3) using these four elements in HTTPS requests sent from Java directly to the webpage (i.e. without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.
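Step 3 of the architecture above could look roughly like the sketch below, which uses the JDK's built-in `java.net.http` API (Java 11+). The class name `CapturedSession`, the method `buildRequest`, and the example URL are hypothetical and only illustrate how the three captured headers and the query string might be attached to a direct request; the real endpoint and header values come from the existing program.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical holder for the four elements captured during authentication.
public final class CapturedSession {
    private final String authorization; // value of the Authorization header
    private final String csrfToken;     // value of the X-CSRF-TOKEN header
    private final String cookie;        // value of the Cookie header
    private final String query;         // captured query string, without the leading '?'

    public CapturedSession(String authorization, String csrfToken,
                           String cookie, String query) {
        this.authorization = authorization;
        this.csrfToken = csrfToken;
        this.cookie = cookie;
        this.query = query;
    }

    // Builds a GET request that replays the captured elements.
    // baseUrl is an assumption: the endpoint path without any query string.
    public HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "?" + query))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .GET()
                .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. This only covers replaying already-captured values; how to obtain them without Selenium and BrowserMobProxy is the open part of the task.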
The program performs the basic task of extracting the information but has a few problems:
It depends on an external non-Java component: Chrome WebDriver
It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove
It is not optimized: it refreshes the headers too often and sleeps for too long relative to the threshold at which the website (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than necessary.
You will receive the program's current Java code and will need to solve the problems above. To do so, you will need to:
B. Identify the threshold at which the website (behind Cloudflare) starts responding with 429 errors. Tune the refresh frequency of the headers and the sleep periods to that threshold. Demonstrate the benefits of your changes by extracting the same information the program currently extracts and measuring how long it takes.
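One possible way to approach the 429 threshold is to let the delay between requests adapt to the responses: shorten it slowly while requests succeed and back off sharply when a 429 arrives. The sketch below is an assumption about how that could be done, not part of the existing program; the class `AdaptivePacer` and its parameters are hypothetical, and it assumes that a `Retry-After` header, if the site sends one at all, is expressed in seconds.

```java
// Hypothetical helper that adjusts the pause between requests based on status codes.
public final class AdaptivePacer {
    private final long minDelayMs;
    private final long maxDelayMs;
    private long delayMs;

    public AdaptivePacer(long initialDelayMs, long minDelayMs, long maxDelayMs) {
        this.delayMs = initialDelayMs;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Updates and returns the delay (in ms) to wait before the next request.
    // retryAfterSeconds may be null when the response carries no Retry-After header.
    public long onResponse(int statusCode, Long retryAfterSeconds) {
        if (statusCode == 429) {
            // Rate limited: double the delay, or honor Retry-After if it is longer.
            long backoff = delayMs * 2;
            if (retryAfterSeconds != null) {
                backoff = Math.max(backoff, retryAfterSeconds * 1000);
            }
            delayMs = Math.min(maxDelayMs, backoff);
        } else if (statusCode >= 200 && statusCode < 300) {
            // Success: shrink the delay by 10% to probe closer to the limit.
            delayMs = Math.max(minDelayMs, delayMs - delayMs / 10);
        }
        // Other status codes leave the delay unchanged.
        return delayMs;
    }
}
```

Logging the delay at which 429 responses begin, together with the total extraction time before and after tuning, would give the measurement the task asks for.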
Note: you will need to create your own login/password on the website. There are no additional requirements to register.
13 freelancers are bidding an average of $165 for this job
Hi there. As a senior developer with 6 years of experience in Web Scraping, I have rich expertise and deep knowledge in almost all types of scraping tools published. But I always prefer using my own scraping tool writt...
Hello man. How are you today? I am very interested in your project, so I have prepared a bid. I know you need significant help. I sincerely want to help you. Of course, I know that to help you, you need to have experie...
Greetings Sir, I am Muhammad Faisal and I am a professional Java developer with almost 4 years of experience. We provide quality work within your time and budget, so let's get started? Thanks
I have optimized a lot of distributed crawling projects and Selenium based crawlers ... please connect me over chat and share me your code. I will first identify all the pin points and alternatives for your authentica...