The goal of the project is to build a scalable e-commerce app with an integrated blog section.
Most items will be created by a web scraper, which should scrape data from more than a dozen different websites at first. Later on, it should be possible to scale the scraper up to a few thousand websites.
These websites are known and should be added to the scraper iteratively. Each website has a different structure, which is why the development and maintenance costs per site need to stay as low as possible. The aim is to scrape the websites weekly at first. Later on, the scraping interval should be shortened to daily or even less. The scraped data needs to be stored in a useful and efficient way in a cloud database, as it will be used for products and personal shopper recommendations.
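One possible way to keep the per-site cost small is a parser registry: every website contributes one small parse function, while the shared data model and the dispatch code stay untouched. The sketch below is only an illustration under assumptions. The names `Item`, `register`, `PARSERS`, `scrape` and the site `example-shop` are hypothetical, and the pipe-separated input stands in for real HTML so that the example runs without any third-party library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Item:
    """Common record that every site parser must produce (hypothetical model)."""
    site: str
    title: str
    price_cents: int
    url: str


Parser = Callable[[str], List[Item]]

# Maps a site name to its parse function; adding a site means adding one entry.
PARSERS: Dict[str, Parser] = {}


def register(site: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parse function to the registry under `site`."""
    def decorator(func: Parser) -> Parser:
        if site in PARSERS:
            raise ValueError(f"duplicate parser for {site!r}")
        PARSERS[site] = func
        return func
    return decorator


@register("example-shop")
def parse_example_shop(raw: str) -> List[Item]:
    # Assumption: one "title|price|url" record per line instead of real HTML.
    items = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        title, price, url = (part.strip() for part in line.split("|"))
        items.append(Item("example-shop", title, round(float(price) * 100), url))
    return items


def scrape(site: str, raw: str) -> List[Item]:
    """Dispatch already-downloaded page content to the parser for `site`."""
    return PARSERS[site](raw)
```

With this layout, supporting a new website would mean writing one decorated function, and a broken site can be fixed or disabled without touching the others.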
Furthermore, the scraping must be tolerant of changes in the designs of the websites, and it must avoid being blocked.
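One common measure against being blocked is to limit how often the same host is contacted and to add random jitter so that requests do not arrive at a fixed rhythm. The following is a minimal sketch of that idea, not a complete anti-blocking solution. The class name `HostThrottle` and the default delays are assumptions chosen for illustration, and the clock, sleep and random functions are injectable only so that the behaviour can be checked without real waiting.

```python
import random
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Enforces a minimum, randomly jittered pause between requests per host."""

    def __init__(
        self,
        min_delay: float = 2.0,   # assumed base delay in seconds
        jitter: float = 1.0,      # assumed maximum extra random delay in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
        rng: Callable[[float, float], float] = random.uniform,
    ) -> None:
        self._min_delay = min_delay
        self._jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._rng = rng
        # Earliest time at which each host may be contacted again.
        self._next_allowed: Dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until `url`'s host may be contacted; return the seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        pause = max(0.0, ready_at - now)
        if pause > 0:
            self._sleep(pause)
        # Schedule the next allowed request after the one about to be sent.
        delay = self._min_delay + self._rng(0.0, self._jitter)
        self._next_allowed[host] = max(now, ready_at) + delay
        return pause
```

A scraper would call `throttle.wait(url)` immediately before each request. Different hosts are tracked separately, so throttling one website does not slow down the others.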
Currently, a simple Python scraper exists that can scrape a few websites using the Selenium library. However, it does not have to be retained at all costs.
The following tasks are part of your engagement for the project:
o Developing a modular and scalable software architecture for the web scraping project (preferably with Python)
o Containerizing the program in Docker
o Deploying and managing the containers in the cloud, probably with AWS and Kafka
o Implementing different measures to prevent blacklisting and being blocked
o Setting up a SQL database, probably PostgreSQL with AWS
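For the database task, repeated scrapes of the same product should update the existing row rather than create duplicates. The sketch below illustrates this with an upsert keyed on site and URL. It rests on several assumptions: the table and column names are hypothetical, and Python's built-in `sqlite3` module (SQLite 3.24 or newer) is used purely as a local stand-in so the example runs anywhere. PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form, although a PostgreSQL driver would use a different placeholder style than `?`.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: one row per product page, identified by site and URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    site        TEXT    NOT NULL,
    url         TEXT    NOT NULL,
    title       TEXT    NOT NULL,
    price_cents INTEGER NOT NULL,
    scraped_at  TEXT    NOT NULL,
    PRIMARY KEY (site, url)
)
"""

# Insert a new row, or overwrite the stored values if the product is known.
UPSERT = """
INSERT INTO items (site, url, title, price_cents, scraped_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (site, url) DO UPDATE SET
    title       = excluded.title,
    price_cents = excluded.price_cents,
    scraped_at  = excluded.scraped_at
"""

Row = Tuple[str, str, str, int, str]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the items table if it does not exist yet."""
    conn.execute(SCHEMA)


def upsert_items(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Write all rows in one transaction; roll back if any row fails."""
    with conn:
        conn.executemany(UPSERT, rows)
```

Keying on `(site, url)` is only one option. If a website changes its URLs often, a site-specific product identifier may be a more stable key.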
The following tasks might be part of a further engagement:
o Implementing the web scrapers for a large number of different websites
o Maintaining and monitoring the scrapers for the websites
o Adding a web crawler to find additional websites
o Parsing the stored data and processing it into a more useful format
The following skills are relevant, each with its importance:
o Web Scraping (Importance: 9/10)
o Python (Importance: 7/10)
o Docker (Importance: 8/10)
o AWS (Importance: 5/10)
o Kafka or other Pipelining/Queuing Tools (Importance: 8/10)
o Cloud Databases (Importance: 6/10)
o PostgreSQL (Importance: 10/10)
29 freelancers are bidding an average of $2266 for this job.
I have reviewed the project description and have some questions. Please message me so we can discuss them further. Thank you.