I need to save a snapshot (all HTML files) of the website [login to view URL]. It is an online forum that allows people to post and follow each other. I want to save the following information:
1. [login to view URL] saved as '[login to view URL]'
2. On the index page there are 23 forums (note that Porn Addiction and Porn-Induced Sexual Dysfunctions count as two separate forums). I need all pages of all threads in each of the 23 forums saved. For example, the first forum is shown as "Rebooting - Porn Addiction Recovery". Clicking on it leads to [login to view URL] The trailing number 2 in that link is an identifier; I want this page saved as "[login to view URL]". There are 583 pages of threads (posts) in this forum, which can be saved as "[login to view URL]" through "[login to view URL]". Each of these pages lists 50 threads (a few more on the first page because of some information and announcement threads at the top). Each of these 50+ threads may itself span multiple pages, and I need all of those pages saved as HTML too. For example, the first thread is "[login to view URL]"; the trailing number 88344 is also an identifier, and its 5 pages should be saved as "[login to view URL]" through "[login to view URL]". (A minimal page-saving sketch follows this list.)
3. I want all the user profile pages saved as well. The website ([login to view URL]) shows 156,726 members, and they can be enumerated from 1 to 156726 using the following link (for user 1): [login to view URL] On each user profile page I need the HTML for the 5 tabs: "Profile Posts" (it may have multiple pages; all pages are needed), "Recent Activity" (click "Show older items" at the bottom until the button disappears so that everything is captured), "Postings" (no need to enumerate all of these, since all postings are already captured in the previous step), "Information", and "Groups". In addition, I want the user_ids in each member's "Following" and "Followers" lists. For example, user 1 is following 8 other users and is followed by 826 users. I want 2 tables (CSV or SQLite) to store this information, each with 2 columns. Following table: user_id, following_user_id; Followers table: user_id, follower_user_id. Only 20 users are shown per page of the Following/Followers lists, so the "More" button has to be clicked repeatedly to enumerate all of them. (A table-schema sketch also follows this list.)
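Because every real link above is redacted as "[login to view URL]", the sketch below uses a placeholder BASE_URL and guessed /forums/<id>/page-N and /threads/<id>/page-N URL patterns; only the file-naming idea (one HTML file per forum page and per thread page, keyed by the numeric identifier) is taken from the description. It assumes Python 3.5+ with the requests package installed.

```python
import os
import requests

# Minimal page-saving sketch for item 2.  BASE_URL and the /forums/<id>/page-N
# and /threads/<id>/page-N URL patterns are placeholders, because the real
# links are redacted ("[login to view URL]") in the description above.
BASE_URL = "https://example-forum.com"

session = requests.Session()  # reused so the login cookie (item 4 below) is kept


def save_html(url, out_path):
    """Download one page and write the raw HTML to out_path."""
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(resp.text)


def save_forum_pages(forum_id, num_pages):
    """Save every thread-list page of one forum, e.g. forum_2_page_1.html .. forum_2_page_583.html."""
    for page in range(1, num_pages + 1):
        url = "{}/forums/{}/page-{}".format(BASE_URL, forum_id, page)
        save_html(url, "snapshot/forums/forum_{}_page_{}.html".format(forum_id, page))


def save_thread_pages(thread_id, num_pages):
    """Save every page of one thread, e.g. thread_88344_page_1.html .. thread_88344_page_5.html."""
    for page in range(1, num_pages + 1):
        url = "{}/threads/{}/page-{}".format(BASE_URL, thread_id, page)
        save_html(url, "snapshot/threads/thread_{}_page_{}.html".format(thread_id, page))
```

In the full program the page counts (583, 5, ...) and the thread identifiers would be parsed from the pagination controls and the thread-list HTML rather than passed in by hand.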
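The two relationship tables in item 3 map directly onto SQLite. A minimal schema sketch follows; the database file name is illustrative.

```python
import sqlite3

# Sketch of the two relationship tables described in item 3.
conn = sqlite3.connect("relationships.sqlite")
conn.executescript("""
CREATE TABLE IF NOT EXISTS following (
    user_id           INTEGER NOT NULL,  -- the profile being scraped
    following_user_id INTEGER NOT NULL,  -- a user that user_id follows
    PRIMARY KEY (user_id, following_user_id)
);
CREATE TABLE IF NOT EXISTS followers (
    user_id          INTEGER NOT NULL,   -- the profile being scraped
    follower_user_id INTEGER NOT NULL,   -- a user who follows user_id
    PRIMARY KEY (user_id, follower_user_id)
);
""")
conn.commit()
conn.close()
```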
1. The program should be able to finish running within 24 hours (multithreading might be needed; for example, several threads can each handle a few forums while one thread handles the user profile pages). The shorter the run time, the better, because I plan to scrape the website on different days to track changes in users and posts. (A rough multithreaded skeleton follows this list.)
2. Since I want to scrape this website on different days, it would be great to do some kind of incremental scraping. The first run would save everything, but later runs would only keep "diff"-style files recording what was deleted (users, user-following relationships, threads). That saves a lot of disk space because duplicate HTML files that are already saved do not need to be stored again. (A small diff sketch also follows this list.)
3. Python 3.5+ and any other packages you find necessary
4. The program should log in to the forum before saving the HTML files. Registration is free, and login credentials can be provided upon request.
5. The program will run on Linux Ubuntu
6. Clear comments in the code so that I can modify it later
7. Object-oriented design is preferred
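A rough object-oriented skeleton tying items 1 (multithreading), 4 (login) and 7 (OOP) together. The login path, the form field names, and example-forum.com are assumptions that would have to be matched to the forum's real login form.

```python
import concurrent.futures
import requests

# Rough skeleton only: the login URL and form fields are guesses, and the
# scrape_* methods are stubs that would reuse the page-saving sketch above.
class ForumScraper:
    def __init__(self, base_url, username, password):
        self.base_url = base_url
        self.session = requests.Session()
        self._login(username, password)

    def _login(self, username, password):
        """Log in once; the session keeps the cookies for every later request."""
        resp = self.session.post(
            self.base_url + "/login/login",
            data={"login": username, "password": password},
            timeout=30,
        )
        resp.raise_for_status()

    def scrape_forum(self, forum_id):
        """Save all thread-list pages and all thread pages of one forum (see the sketch above)."""

    def scrape_profiles(self, first_id, last_id):
        """Save the 5 profile tabs and the Following/Followers lists for user ids first_id..last_id."""


if __name__ == "__main__":
    scraper = ForumScraper("https://example-forum.com", "user", "secret")
    forum_ids = [2, 3, 4]  # the identifiers of the 23 forums would go here
    # One worker per forum plus one worker for the profile pages, as item 1 suggests.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        jobs = [pool.submit(scraper.scrape_forum, fid) for fid in forum_ids]
        jobs.append(pool.submit(scraper.scrape_profiles, 1, 156726))
        concurrent.futures.wait(jobs)
```

Note that requests sessions are not guaranteed to be thread-safe, so a real implementation would likely give each worker its own session created from the same login cookies.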
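For item 2, one simple way to implement the "diff" is to keep a small state file of the identifiers seen on the previous run and record only what has disappeared, skipping re-downloads of HTML already on disk. The helper name and file names below are illustrative.

```python
import json

# Sketch of the incremental "diff" idea in item 2: compare the ids seen today
# against the ids recorded by the previous run and write out what was deleted.
def diff_and_update(state_path, current_ids, diff_path):
    """Write ids that were present last run but are gone now, then update the state file."""
    try:
        with open(state_path) as fh:
            previous_ids = set(json.load(fh))
    except FileNotFoundError:
        previous_ids = set()  # first run: nothing to diff against
    deleted = sorted(previous_ids - set(current_ids))
    with open(diff_path, "w") as fh:
        json.dump({"deleted": deleted}, fh, indent=2)
    with open(state_path, "w") as fh:
        json.dump(sorted(set(current_ids)), fh)  # baseline for the next run

# Example: diff_and_update("thread_ids.json", ids_seen_today, "deleted_threads.json")
```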
18 freelancers are bidding an average of $283 for this job
Dear Sir, how are you? I am very interested in your project and am ready to start now. I have experience in Python development and web scraping. I will work very hard for you. Best regards
Hi there, I just checked the project details and I'm very interested in discussing this with you. I have strong knowledge of web scraping and I use Python. Feel free to PM me so we can discuss and share sample work! Regards.
Hello, I have good knowledge of Python web scraping/crawling for nofap.com. I have more than 5 years of experience in Python and web scraping. We have worked on several similar projects before! We have worked on…
Hello, my name is MingZhu.Z from China. You can check a website made by me: [login to view URL] I recently completed a Facebook post scraping project. I have already seen and understood what you want. We are a team de…
Hello client, hope you are doing well. Over 9+ years of experience writing almost exclusively web scraping code. I've done it all. I can scrape all LinkedIn profiles. My languages in order of experience and use are Python, dat…
Can do it with Selenium/Scrapy or BeautifulSoup in Python, whatever you want.
Hello, I suggest implementing the crawler in Java to support any OS (Linux and Windows). The crawler will be multithreaded and can output an XLS file or a DB file, as you prefer. I invite you to discuss more over chat.
Hey - I've checked [login to view URL] and can confirm that we can build a Python crawler per your requirements. Please drop me a message so we can discuss every detail, thanks ~ Steve
Hello, I have briefly read the description of the Python web scraping/crawling work for [login to view URL], and I can deliver per the requirements; however, I need us to discuss for more clarity on the details, deadline…
Hi, I reviewed your description carefully and am very interested in your project. I bid 800 USD because I have a full understanding of the project. I have experience in building an applica…
I am pretty much familiar with the task and do similar things frequently.
Hello, thanks for your posting. I have read your post with great care and interest. I have rich experience in web scraping and can take a snapshot of all HTML files on [login to view URL]. I can do it within 3 days and update…
I checked all your requirements properly. I would be able to scrape the info. I am well skilled in Python. Rate: $18/hr. Let us discuss and start. Sandeep
I have 6 years of experience on the Freelancer, Upwork, Fiverr & 99designs marketplaces. I have seen your project and can do it easily because I have a lot of experience in graphic design, web design, web development & programming. So I cou…