
Open • Ends in 6 days • Paid on delivery
Budget: $1,800 USD (Fixed Price)
Timeline: 7-10 days
Tech Stack: Python 3.10+, Playwright (Async), SQLite, Ubuntu VPS
Skills Required: Python, Playwright, Web Scraping, Async/Await, Proxies, SQLite

═══════════════════════════════════════════════════════
PROJECT DESCRIPTION
═══════════════════════════════════════════════════════

I need 4 production-ready, high-performance web scrapers for auto parts websites and auction history. The goal is to build a robust data pipeline that runs autonomously on an Ubuntu VPS using Bright Data residential proxies.

WEBSITES TO SCRAPE:
1. [login to view URL] (Wholesale parts - Login required)
2. [login to view URL] (Wholesale parts - Login required)
3. [login to view URL] (Auction history - Public)
4. [login to view URL] (Retail catalog - Public - Complex navigation)

═══════════════════════════════════════════════════════
TECHNICAL REQUIREMENTS (Non-Negotiable)
═══════════════════════════════════════════════════════

1. ASYNC/AWAIT ARCHITECTURE
- Must use Python asyncio + Playwright (NO Selenium allowed)
- Clean, maintainable async code

2. CONCURRENCY
- Handle 10-30 concurrent browser contexts efficiently
- Proper resource management (no memory leaks)
- Configurable concurrency limits

3. BANDWIDTH OPTIMIZATION (CRITICAL)
- Block images, fonts, CSS, videos, and media files using [login to view URL]()
- Target: under 300KB per page load (vs 2-5MB unoptimized)
- This directly reduces Bright Data proxy costs by ~80%
- Log bandwidth usage per 100 pages for verification

4. DATA INTEGRITY (CRITICAL)
- NO direct CSV writing during scraping
- Must save to a local SQLite database first (prevents data loss on crash)
- Database structure:
  * Each scraper has its own database: [login to view URL], [login to view URL], [login to view URL], [login to view URL]
  * Tables include: id (autoincrement), scraped_at (timestamp), all data fields
  * Checkpoint table: tracks progress (last_vehicle_id, last_page, etc.)
- Separate export script: python [login to view URL] to convert SQLite to CSV on demand

5. RESILIENCE
- Checkpoint system: if the script stops at record 5,000 of 10,000, it must resume exactly there
- Retry logic: auto-retry on 500/403 errors or timeouts
- Graceful shutdown: SIGTERM should save a checkpoint before exit

6. INFRASTRUCTURE
- Must run headless on an Ubuntu 24.04 VPS
- Bright Data proxy integration (credentials provided)
- Configurable via .env file
- Production-ready error logging

═══════════════════════════════════════════════════════
SPECIFIC CHALLENGES & REQUIRED SOLUTIONS
═══════════════════════════════════════════════════════

1. [login to view URL] (The Beast)
Challenge:
- Complex tree navigation system
- "Soft blocks": infinite loading bars, empty results after N requests
- Aggressive bot detection
Required solution:
- Implement browser fingerprinting stealth techniques (playwright-stealth or similar)
- Handle dynamic category tree expansion efficiently
- Detect and handle soft blocks (wait, then retry with a new session)
- Must NOT open thousands of tabs (memory explosion)

2. PARTSMAX / PRIMEROAUTOPARTS
Challenge:
- Pricing requires hover interactions or variant selection
- Distinguish "List Price" vs "Your Price" (member pricing)
Required solution:
- Accurate hover-based data extraction
- Handle missing prices gracefully
- Extract stock availability

3. BIDFAX
Challenge:
- High-volume pagination (~500,000 records)
- Potential rate limiting
Required solution:
- Efficient pagination without timeouts
- Checkpoint every N pages
- Handle network interruptions gracefully

═══════════════════════════════════════════════════════
DELIVERABLES (Per Scraper)
═══════════════════════════════════════════════════════

For EACH of the 4 scrapers:
1. [login to view URL] - Main asynchronous scraping script
2. [login to view URL] - SQLite schema definitions
3. [login to view URL] - Centralized configuration (concurrency, timeouts, retries)
4. [login to view URL] - SQLite to CSV converter with filters (date range, vehicle type, etc.)
5. [login to view URL] - Quick script to check current scraping progress
6. [login to view URL] - All Python dependencies
7. [login to view URL] - Step-by-step setup guide specific to this scraper

PLUS global deliverables:
8. [login to view URL] - Bash script to install Python/Playwright/dependencies on a fresh Ubuntu VPS
9. [login to view URL] - Template for proxy credentials and global settings
10. Video walkthrough - 15-20 minute Loom/screen recording explaining:
- Code architecture
- How to deploy on the VPS
- How to run each scraper
- How to monitor progress
- How to handle common errors
11. GitHub repository: I will create a private GitHub repository and add you as a collaborator. You must push all code directly to the main or dev branch of my repository. This is a requirement for milestone payments.

═══════════════════════════════════════════════════════
DATA EXTRACTION REQUIREMENTS
═══════════════════════════════════════════════════════

PARTSMAX Output (SQLite, then CSV):
year, make, model, description, part_number, your_price, stock, list_price, scraped_at

PRIMEROAUTOPARTS Output:
Similar structure to PartsMax, but with image_url.
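Deliverable 4 above is a filtered SQLite-to-CSV converter. A minimal sketch of what that script's core could look like (the function name, parameters, and the assumption of a `scraped_at` text column come from my reading of the posting, not from any provided code):

```python
import csv
import sqlite3

def export_table_to_csv(db_path, table, out_path, since=None):
    """Dump one table to CSV, optionally filtering rows on scraped_at."""
    conn = sqlite3.connect(db_path)
    try:
        # Table name must come from trusted config, never user input,
        # because identifiers cannot be bound as SQL parameters.
        query = f"SELECT * FROM {table}"
        params = ()
        if since is not None:
            query += " WHERE scraped_at >= ?"
            params = (since,)
        cur = conn.execute(query, params)
        headers = [col[0] for col in cur.description]
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(headers)   # header row from the cursor metadata
            writer.writerows(cur)      # stream rows without loading all in memory
    finally:
        conn.close()
```

Additional filters (vehicle type, date range end) would follow the same parameterized-WHERE pattern.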
BIDFAX Output:
final_bid, auction, lot_number, sale_date, sale_location, vin, make, model, year, Documents_title, Seller, Primary_Damage, Secondary_Damage, odometer, condition, Estimated_Retail_Value, Transmission, Keys, Fuel, drive, scraped_at

ROCKAUTO Output:
year, make, model, engine, category, part_type, manufacturer, part_number, price, scraped_at

═══════════════════════════════════════════════════════
PAYMENT MILESTONES
═══════════════════════════════════════════════════════

Milestone 1 (40% - $720) - Day 3-4:
- PartsMax + primeroautoparts completed and tested
- Both tested with 1,000+ records each
- SQLite implementation verified
- Bandwidth optimization confirmed ([login to view URL] working)
- Code in GitHub repository

Milestone 2 (40% - $720) - Day 7-8:
- BidFax + RockAuto completed and tested
- RockAuto soft-block handling verified with 500+ requests
- All 4 scrapers deployed and running on the VPS
- Checkpoint/resume tested (kill process, restart, verify continuation)

Milestone 3 (20% - $360) - Day 10:
- All documentation complete
- Video walkthrough delivered
- Final stress test passed (run all 4 scrapers for 2+ hours)
- 7-day support period begins

═══════════════════════════════════════════════════════
WHAT I PROVIDE
═══════════════════════════════════════════════════════

- Dedicated VPS: a clean Ubuntu 24.04 LTS VPS (DigitalOcean) with 4GB RAM / 2 CPUs. I will provide access via your SSH public key.
- Bright Data residential proxy credentials (high-quality proxies)
- Login credentials for PartsMax and primeroautoparts
- Sample vehicle combinations list (CSV with Year/Make/Model)
- Quick response time for questions (I'm technical, no hand-holding needed)
- Existing reference code (Selenium-based, can provide as context)

═══════════════════════════════════════════════════════
SELECTION CRITERIA
═══════════════════════════════════════════════════════

You MUST have:
- Portfolio with 5+ web scrapers (Playwright strongly preferred)
- Experience with async Python and concurrent programming
- Proxy integration experience (Bright Data, Oxylabs, or similar)
- 90%+ job success rate on Freelancer
- Ability to start within 24 hours

I will immediately reject bids that:
- Don't answer ALL screening questions below
- Propose using Selenium instead of Playwright
- Have no portfolio, or give generic "I can do this" responses
- Bid under $1,200 (indicates you don't understand the complexity)
- Bid over $2,500 (overpriced for this scope)

═══════════════════════════════════════════════════════
SCREENING QUESTIONS (Must Answer All)
═══════════════════════════════════════════════════════

1. What specific Playwright method/approach will you use to block images and media to save proxy bandwidth? (Be specific - code snippet preferred)
2. How do you handle RockAuto's "soft blocks" or infinite loading states? What's your detection and recovery strategy?
3. Have you scraped PartsMax, primeroautoParts, or similar wholesale auto parts portals before? If yes, which ones?
4. Describe your exact approach for SQLite checkpoint/resume. What happens if the script crashes at record 5,432 of 10,000?
5. How many concurrent Playwright contexts can you safely run on a 4GB RAM VPS without memory issues?
6. Share a link to your best async web scraper (GitHub or portfolio). What was the scale? (Records scraped, pages/hour, etc.)
7. Are you available to start immediately and deliver in 7-10 days?
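Screening question 1 asks for a concrete blocking approach. A minimal sketch using Playwright's async routing API (the helper names and the exact set of blocked resource types are my illustrative choices, not from the posting):

```python
# Playwright resource types we never want to pay proxy bandwidth for.
# "stylesheet" covers CSS; "media" covers video/audio.
BLOCKED_RESOURCE_TYPES = {"image", "font", "stylesheet", "media"}

def should_block(resource_type: str) -> bool:
    """Pure decision helper, kept separate so the policy is unit-testable."""
    return resource_type in BLOCKED_RESOURCE_TYPES

async def install_blocking(page):
    """Attach a route handler that aborts heavy requests before they
    leave the browser, so they never touch the residential proxy."""
    async def handler(route):
        if should_block(route.request.resource_type):
            await route.abort()
        else:
            await route.continue_()
    await page.route("**/*", handler)
```

Installing the route per context (or per page, as here) before navigation is what brings a 2-5MB page down toward the 300KB target the posting demands.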
═══════════════════════════════════════════════════════
REQUIREMENTS & PROFILE
═══════════════════════════════════════════════════════

I am a technical founder (Licensed Dealer). This is Phase 1 of a larger infrastructure (8+ future scrapers planned).

MANDATORY: Async Playwright only (no Selenium). Must deploy on an Ubuntu VPS. Must deliver in 7-10 days.

═══════════════════════════════════════════════════════
TO APPLY (Must Include)
═══════════════════════════════════════════════════════

1. Answer all 7 screening questions.
2. Link to GitHub repos showing large-scale Async Playwright scrapers.
3. Confirm 24h start and $1,800/10-day terms.
4. Suggest one technical improvement to my approach.
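Screening question 4 (checkpoint/resume) has a fairly standard SQLite answer: a small key/value checkpoint table, committed on every update. One plausible shape, sketched with illustrative names (the posting specifies only that a checkpoint table exists, not this exact schema):

```python
import sqlite3

def open_db(path):
    """Open the scraper's database and ensure the checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()
    return conn

def save_checkpoint(conn, key, value):
    # Upsert, committed immediately: if the process dies at record 5,432,
    # at most the rows since the last commit are re-scraped on restart.
    conn.execute(
        "INSERT INTO checkpoint (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, str(value)),
    )
    conn.commit()

def load_checkpoint(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM checkpoint WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

On startup the scraper would read `last_page`/`last_vehicle_id` and resume from there; wiring `save_checkpoint` into a SIGTERM handler would also cover the posting's graceful-shutdown requirement.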
Project ID: 40189708
30 proposals
Open for bidding
Remote project
30 freelancers are bidding on average $2,155 USD for this job

Hi there, I’m excited about the opportunity to create four high-performance web scrapers specifically tailored for the auto parts industry. As a top freelancer from California with 5-star reviews, I understand the complexities involved in scraping wholesale and retail websites, particularly with the need for precision and efficiency in challenging environments. With extensive experience in Python, Async/Await architecture, and Playwright, I can assure you that your requirements for concurrency, bandwidth optimization, and data integrity will be met. I have developed similar projects before, allowing for resilient scraping with robust error handling and checkpoint systems, especially addressing the specific challenges of scraping sites like RockAuto and PartsMax. I would love to discuss how I can tailor these scrapers to your needs and start as soon as possible. Let’s connect to finalize the details and get started! What specific metrics are you looking to analyze from the scraped data, and how do you envision using this data in your larger infrastructure? Thanks,
$2,750 USD in 1 day
6.7

I've built high-performance web scrapers using Playwright and Python before, and I'm ready to tackle your project with expertise! With over 6 years of experience, I specialize in async programming and can create efficient scrapers that meet all your requirements, especially for challenging sites like RockAuto. I understand the critical need for bandwidth optimization and data integrity, ensuring your scrapers run smoothly while managing resources efficiently.
1️⃣ What specific data points are you most interested in scraping?
2️⃣ Do you have any existing code or configurations to build upon, or should I start from scratch?
3️⃣ Are there any specific error handling strategies you prefer during the scraping process?
Let's connect — just click the chat button, and let’s get your work done quickly. I’m ready and waiting for your message.
$1,800 USD in 7 days
6.0

Hello, I understand you're looking for four production-ready web scrapers for auto parts and auction history, aiming for high performance and robust data integrity. My experience with async web scraping using Python and Playwright makes me a strong candidate for this project. In previous projects, I successfully developed similar scrapers that efficiently handled complex sites, ensuring data accuracy and optimal bandwidth usage.
✅ My Plan:
- Utilize Python asyncio and Playwright for a clean, maintainable codebase.
- Implement concurrency to manage multiple browser contexts without memory leaks.
- Optimize bandwidth by aborting unnecessary media, targeting under 300KB per page load.
- Design a resilient checkpointing system to ensure data integrity, enabling seamless restarts.
Could you clarify the preferred method for handling any IP bans or rate limits during scraping? Additionally, how flexible are you with the timeline if challenges arise? Best regards, Hongqiang Chen
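The leak-free concurrency this bid promises usually comes down to a semaphore cap plus guaranteed cleanup per task. A Playwright-free sketch of the pattern (`bounded_gather` is an illustrative name, not from the posting or the bid):

```python
import asyncio

async def bounded_gather(coro_factories, limit):
    """Run zero-arg coroutine factories with at most `limit` in flight.

    Factories are used instead of bare coroutines so that no work (and no
    browser context) starts before the semaphore grants a slot.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with sem:
            return await factory()

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(f) for f in coro_factories))
```

In the real scraper each factory would open a browser context inside try/finally and close it there, so a crashed task can never leak a context on the 4GB VPS.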
$1,800 USD in 12 days
5.4

Hi there, I’m Ahmed from Eastvale, California — a Senior Full-Stack Engineer with over 15 years of experience building high-quality web and mobile applications. After reviewing your job posting, I’m confident that my background and skill set make me an excellent fit for your project — 4 Async Web Scrapers (Auto Parts) - Python/Playwright/BrightData. I’ve successfully completed similar projects in the past, so you can expect reliable communication, clean and scalable code, and results delivered on time. I’m ready to get started right away and would love the opportunity to bring your vision to life. Looking forward to working with you. Best regards, Ahmed Hassan
$2,500 USD in 1 day
5.1

Hello! I understand that you're looking for four high-performance web scrapers specifically designed for auto parts websites, to streamline data extraction and build a robust data pipeline on an Ubuntu VPS. I will employ Python with the Playwright async framework to build efficient, maintainable scrapers that can handle concurrent sessions while optimizing bandwidth usage to reduce proxy costs. The code will be thoroughly documented, and I'll ensure resilience through a checkpoint system. For examples of my previous work, please check my profile. Regards, Davide
$1,800 USD in 20 days
5.1

Hi, I am enthusiastic about your project for building 4 high-performance web scrapers tailored for auto parts websites. With over 7 years of experience in web scraping and a strong command of Python and Playwright, I have successfully implemented async solutions that handle complex data structures efficiently. I specialize in bandwidth optimization, concurrency management, and data integrity to ensure each scraper operates seamlessly on your Ubuntu VPS with Bright Data proxy integration. I can deliver all components within your required timeline of 7-10 days. What specific concerns do you have regarding the implementation of the RockAuto scraper? Best regards, Andrii
$2,500 USD in 10 days
4.5

⭐ If you award me, your smile shows up ⭐ Hi, Your project immediately stood out to me—it closely matches work I’ve completed successfully in the recent past. The core challenges, structure, and technical requirements are very familiar, with only a few unique elements that align perfectly with my expertise. This is great news for you: it allows me to skip the usual ramp-up time, avoid trial-and-error, and deliver clean, high-quality results quickly and confidently. I bring hands-on experience with Node.js, Backend Development, MySQL, Web Scraping, Software Architecture, API Development, PHP, Database Management, Python and Web Development, along with proven workflows and best practices refined through multiple similar projects. You can view a directly relevant example in my portfolio here: https://www.freelancer.com/u/thomasb726 I’d be happy to discuss your specific goals in more detail and share tailored ideas based on what has worked best in comparable scenarios.
Why clients choose—and continue working with—me:
• Clear, proactive communication so you always know where the project stands
• Strong respect for your deadlines, budget, and business reputation
• Responsive, approachable, and focused on a smooth, stress-free process
• Reliable post-delivery support that often leads to long-term partnerships
If you’re looking for precise execution, high-quality results, and a dependable long-term partner, I’d love to connect and help bring your project to life. Best regards
$3,000 USD in 3 days
4.0

Hello Carlos, I'm Bwalya, an experienced developer with 8 years of expertise in PHP, Python, Web Scraping, and Web Development. I have worked extensively with Python asyncio, Playwright, and SQLite databases. I have thoroughly reviewed your project requirements for the development of 4 high-performance web scrapers for auto parts websites and auction history. I am confident in my ability to deliver a robust solution that meets all your technical specifications, including implementing async/await architecture, handling concurrency efficiently, optimizing bandwidth usage, ensuring data integrity, and providing resilience in case of interruptions. I would love to discuss the project further with you. Please feel free to connect with me in the chat to explore how we can proceed with this exciting opportunity. Best regards, Bwalya
$1,500 USD in 7 days
3.4

Hi there, I understand that your main goal is to develop four efficient and reliable async web scrapers for the auto parts industry using Python, Playwright, and BrightData. I have successfully built and deployed multiple web scraping solutions that increased data extraction efficiency by over 40% while ensuring compliance with website policies. My experience with Playwright has also allowed me to create robust scrapers that handle dynamic content seamlessly. To implement the requested features, I will leverage Playwright’s capabilities to create async scrapers that can handle multiple requests concurrently, ensuring faster data retrieval. Additionally, I will ensure that the scrapers are built with error handling and logging features to maintain reliability and ease of debugging. I would be happy to discuss your needs and get started right away. Best regards, Artem
$2,250 USD in 7 days
2.0

Hey Mate, Good afternoon! I’ve carefully checked your requirements and I'm really interested in this job. I’m a full stack node.js developer working on large-scale apps as a lead developer with U.S. and European teams. I’m offering the best quality and highest performance at the lowest price. I can complete your project on time and you will experience great satisfaction with me. I’m well versed in React/Redux, Angular JS, Node JS, Ruby on Rails, html/css as well as javascript and jquery. I have rich experience in MySQL, Software Architecture, Database Management, Web Development, PHP, Web Scraping, Backend Development, Python, Node.js and API Development. For more information about me, please refer to my portfolios. I am checking your attachment and will update you shortly. I’m ready to discuss your project and start immediately. Looking forward to hearing back from you and discussing all the details. Feel free to contact us to discuss your project.
$2,500 USD in 5 days
0.0

Hello Mate! Greetings, Good afternoon! I’ve carefully checked your requirements and I'm really interested in this job. I’m a full stack node.js developer working on large-scale apps as a lead developer with U.S. and European teams. I’m offering the best quality and highest performance at the lowest price. I can complete your project on time and you will experience great satisfaction with me. I’m well versed in React/Redux, Angular JS, Node JS, Ruby on Rails, html/css as well as javascript and jquery. I have rich experience in MySQL, Backend Development, Web Scraping, API Development, Python, Software Architecture, Database Management, PHP, Web Development and Node.js. For more information about me, please refer to my portfolios. I am checking your attachment and will update you shortly. I’m ready to discuss your project and start immediately. Looking forward to hearing back from you and discussing all the details. If you have any questions, please let us know.
$2,500 USD in 5 days
0.0

Hi, I’ve worked on projects like this before, so what you’re describing makes sense to me. And I'm really interested in this project - 4 Async Web Scrapers (Auto Parts) - Python/Playwright/BrightData. I usually focus on getting things done cleanly and making sure they work properly in real use, not just on paper. I’m comfortable either improving an existing setup or helping build something new, depending on what stage you’re at. I keep communication straightforward, share progress along the way, and flag issues early so there are no surprises later. If you want, you can share a bit more about the current setup or the goal you’re trying to reach, and I can let you know how I’d approach it. Thanks, Jesse
$2,500 USD in 15 days
0.0

Hi there, I noticed you require robust async Playwright scrapers using Python with optimized bandwidth control for Bright Data proxies. I've built similar automated scrapers handling concurrency and proxy integrations efficiently. I have 7+ years of experience in async Python web scraping and data pipelines. Recently, I delivered a multi-site scraper setup with resilient checkpoints and resource management, cutting bandwidth costs by 75%. For your 4 scrapers, I'll implement async/await with Playwright to manage 10-30 concurrent browser contexts using semaphore limits, fully blocking non-essential assets via route.abort. SQLite databases will save intermediate results, ensuring crash-safe checkpoints with graceful SIGTERM handling. The RockAuto scraper will include stealth techniques to evade soft blocks, plus retry/session rotation logic. PartsMax and PrimeroAutoParts scrapers will capture hover-dependent pricing and stock details with precise navigation. BidFax will use efficient paginator controls to avoid rate limits. Quick question: For the auction history scraper, what is the expected daily page volume to tune concurrency without triggering rate limits?
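The "retry/session rotation logic" this bid mentions for RockAuto's soft blocks can be made concrete with a small wrapper: detect the block, discard the session, back off with jitter, and retry on a fresh proxy identity. Everything here (`SoftBlock`, `make_session`, the backoff numbers) is an illustrative sketch, not code from the posting or the bid:

```python
import asyncio
import random

class SoftBlock(Exception):
    """Raised by page logic when it sees an infinite loader or an
    implausibly empty result set (the posting's 'soft block')."""

async def fetch_with_rotation(fetch, make_session, max_attempts=4, base_delay=5.0):
    """Call fetch(session); on SoftBlock, sleep with jittered exponential
    backoff and retry with a fresh session (i.e. a new browser context /
    proxy identity). Re-raises after max_attempts failures."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        session = await make_session()
        try:
            return await fetch(session)
        except SoftBlock:
            if attempt == max_attempts:
                raise
            # Jitter avoids all workers retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay))
            delay *= 2
```

In a real scraper, `fetch` would also close its session in a finally block; the detection itself (loader never resolving, zero results on a known-populated category) lives inside `fetch`.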
$2,500 USD in 1 day
0.0

DORAL, United States
Payment method verified
Member since May 30, 2025