
Closed
Posted
Paid on delivery
# Multi-Source Real Estate Data Pipeline (Venezuela): Apify → AWS PostgreSQL

## Overview

We are a real estate brokerage with three offices in Venezuela (RM I: Táchira; RM II: Greater Caracas; RM III: Anzoátegui). We need a scheduled pipeline that scrapes **8 Venezuelan property portals** for these areas, normalizes everything into one schema, and stores it in **our AWS PostgreSQL** database. It runs **once per week**.

Purpose: (1) property valuations & market intelligence: pricing/comparables and market movement over time (urgent priority); (2) an internal search tool showing which agent published each listing. We want all listing details + the agency + the publishing agent. **No photos**, just the URL and structured data.

## Platform & approach (please read)

The whole system runs on **Apify**: one account (ours), one scheduler, one dataset, one integration to our DB. Build **one custom Apify Actor** (TypeScript + Crawlee preferred; Python + Scrapy OK) with a **router that sends each domain to its own adapter**. The 8 sites are built differently, so there is one adapter per site, all feeding one normalized schema. Apify unifies the platform (proxies, headless browser, scheduling); it does not remove per-site logic.

Flow: Apify Actor (weekly) → normalize → Apify dataset → webhook → AWS Lambda → PostgreSQL (insert/update + change detection + price history). The only piece outside Apify is a small Lambda our developer deploys.

**Method hierarchy per site (cheapest/most stable first):**

1. **Internal API first**: find the site's XHR/JSON endpoint (expected for TuHome24 SPA, Century21 AJAX, and the Wasi-based MLS Caracas).
2. **Plain HTML next**: simple HTTP + parser for server-rendered sites (Rivinotinto, RE/MAX). No browser, no expensive proxy.
3. **Headless browser last**: only where there's hard anti-bot (MercadoLibre, PERAIG): Playwright + Apify residential proxy. Do NOT use a headless browser on every site.
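The router and method hierarchy described above could be sketched roughly as follows. This is only an illustration in Python (the brief prefers TypeScript + Crawlee, where the same idea applies). The hostnames are `.example` placeholders because the real target URLs are not part of this brief, and `AdapterSpec`, `Method`, and `route` are hypothetical names. The method assigned to each portal follows the recon notes in this brief and may change once an adapter is actually built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional
from urllib.parse import urlparse


class Method(Enum):
    API = "internal_api"          # cheapest, most stable
    HTML = "plain_html"           # plain HTTP + parser
    BROWSER = "headless_browser"  # last resort, hard anti-bot only


@dataclass(frozen=True)
class AdapterSpec:
    portal: str
    method: Optional[Method]  # None means recon is still pending
    residential_proxy: bool = False


# Placeholder hostnames (assumption): swap in the real domains per portal.
ADAPTERS: Dict[str, AdapterSpec] = {
    "mercadolibre.example": AdapterSpec("MercadoLibre VE", Method.BROWSER, True),
    "peraig.example": AdapterSpec("PERAIG", Method.BROWSER, True),
    "rivinotinto.example": AdapterSpec("Rivinotinto", Method.HTML),
    "remax.example": AdapterSpec("RE/MAX VE", Method.HTML),
    "mlscaracas.example": AdapterSpec("MLS Caracas", Method.API),
    "century21.example": AdapterSpec("Century21 VE", Method.API),
    "tuhome24.example": AdapterSpec("TuHome24", Method.API),
    "rentahouse.example": AdapterSpec("Rent-a-House", None),
}


def route(url: str) -> AdapterSpec:
    """Return the adapter spec registered for the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    try:
        return ADAPTERS[host]
    except KeyError:
        raise ValueError(f"no adapter registered for host {host!r}") from None
```

Keeping the proxy flag next to the method makes it easy to check that residential proxy traffic is only ever enabled for the two portals that need it.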
Two-stage pattern: list pages → URLs + card data; detail pages → agent, agency, full attributes.

## The 8 portals (recon already done)

- **MercadoLibre VE**: server-rendered + DataDome anti-bot.
  - Hard but doable. **START HERE.** Largest source (~63.7k in Greater Caracas). Public API blocked (403). Needs residential proxy.
- **Rivinotinto**: classic PHP, data already in HTML, simple pagination (`?pagina=N`), no anti-bot.
  - Trivial.
- **MLS Caracas**: built on [login to view URL] (CRM with its own API, likely a backdoor). Large inventory.
  - Easy-Medium.
- **RE/MAX VE**: server-rendered, data in HTML (price, unique RE/MAX code, location, beds/baths/parking, m²). Agent exposed. **Search is map-based** (`ubi=` + lat/long). No hard anti-bot.
  - Easy-Medium.
- **Rent-a-House**: large national network. [login to view URL] blocked our tool; recon it first in its phase.
  - Medium.
- **Century21 VE**: WordPress + Elementor, AJAX-loaded (HTML is a shell). Agent exposed.
  - Medium.
- **TuHome24**: single-page app (pure JS); HTML empty. Intercept its API or use a browser.
  - Medium-Hard.
- **PERAIG**: active bot detection; blocks direct requests.
  - Medium.

## Data to capture

**Per property (valuation fields are priority):** listing URL; source portal; office (RM I/II/III) + geographic state; operation (sale/rent); property type; price + currency (USD/VES); price per m²; built & land area (m²); bedrooms; bathrooms; parking; city/municipality/zone; full address; lat/long if available; title; description; external code (e.g. RE/MAX code) for dedup; status (active/paused/closed); date published/updated; first/last seen; and `raw_json` (full payload, JSONB, lose nothing).

**Per agency:** name, contact (if available).

**Per agent (best-effort, secondary):** name, phone, email, nickname. Availability varies: franchises and CRM sites usually expose it; MercadoLibre hides the phone. Not a blocker for valuations.
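Two of the derived per-property fields, geographic state and price per m², could be computed along these lines. This is a minimal sketch, not a final normalizer: the function names are hypothetical, the municipality mapping is the RM II mapping given under Normalization in this brief, and dividing by built area (rather than land area) is an assumption. The lookup is keyed by office first because municipality names such as Libertador also exist in other states.

```python
from decimal import Decimal
from typing import Optional

# RM II only: Greater Caracas spans two states (mapping taken from the brief).
RM2_STATE_BY_MUNICIPALITY = {
    "libertador": "Distrito Capital",
    "baruta": "Miranda",
    "chacao": "Miranda",
    "el hatillo": "Miranda",
    "sucre": "Miranda",
}

# RM I and RM III each sit in a single state.
STATE_BY_OFFICE = {"RM I": "Táchira", "RM III": "Anzoátegui"}


def derive_state(oficina_rm: str, municipality: Optional[str]) -> Optional[str]:
    """Derive the geographic state from the office and, for RM II, the municipality."""
    if oficina_rm in STATE_BY_OFFICE:
        return STATE_BY_OFFICE[oficina_rm]
    if oficina_rm != "RM II":
        raise ValueError(f"unknown office: {oficina_rm!r}")
    if not municipality:
        return None
    key = " ".join(municipality.split()).casefold()
    return RM2_STATE_BY_MUNICIPALITY.get(key)  # None if not one of the 5


def price_per_m2(price: Optional[Decimal], built_area_m2: Optional[Decimal]) -> Optional[Decimal]:
    """Price divided by built area, rounded to cents; None when it cannot be computed."""
    if price is None or not built_area_m2:
        return None
    return (price / built_area_m2).quantize(Decimal("0.01"))
```

Returning `None` instead of guessing keeps unmapped municipalities visible, so they can be reviewed rather than silently assigned to the wrong state.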
## Normalization (important)

Tag every property with **office (`oficina_rm`: RM I/II/III)** AND **geographic state**. These differ for RM II: Greater Caracas spans two states (Libertador = Distrito Capital; Baruta/Chacao/El Hatillo/Sucre = Miranda). Derive state from the municipality. Each weekly run must detect new/changed/removed listings; price changes go to a history table.

**Duplicates:** the same property may appear on several portals; do NOT de-duplicate aggressively. Store each appearance tagged by source; capture the external code where present to match later. Key by (portal, listing URL).

## Suggested schema (PostgreSQL)

`portales`, `propiedades` (with portal_id, external_id/code, url UNIQUE per portal, oficina_rm, estado, municipality, zone, lat/lng, operation, type, price, currency, price_m2, areas, beds, baths, parking, title, description, agency_id, agent_id, status, dates, first/last_seen, raw_json JSONB), `inmobiliarias`, `agentes`, `precio_historico`, `scrape_runs`. Final DDL/indexes/migrations are part of your deliverable.

## Coverage by office

- **RM I, Táchira:** MercadoLibre, RE/MAX, Century21, Rent-a-House.
- **RM II, Greater Caracas (DC + East Miranda, 5 municipalities):** all 8 portals. The MercadoLibre "Distrito Capital" search already covers all 5 municipalities (verified).
- **RM III, Anzoátegui:** MercadoLibre, RE/MAX, Rivinotinto, Rent-a-House, Century21, TuHome24.

(Full target URLs and state IDs will be shared with the selected freelancer.)

## Phases (fixed price; each phase is a milestone)

1. **Base pipeline + MercadoLibre (START HERE).** Actor skeleton, PostgreSQL schema, normalization (oficina_rm/estado), insert/update + price history, Apify→Lambda→PostgreSQL integration, and the MercadoLibre adapter with residential proxy. Valuation fields are priority #1; agent is best-effort. Proof of full extraction of the ~63.7k Greater Caracas set. **Go/no-go gate.**
2. **High-value comparables:** RE/MAX, MLS Caracas (Wasi), Rivinotinto, Rent-a-House.
3. **Remaining:** Century21, TuHome24, PERAIG.
4. **Monitoring + handover:** Slack failure alerts (e.g. adapter returns 0 or drops >X%), docs (setup, runbook, schema), handover session.

## Deliverables

Git repo (one adapter per portal); the Apify Actor (deployable to our account); PostgreSQL schema + migrations; Apify→Lambda→PostgreSQL integration; weekly schedule; Slack alerts + CloudWatch logging; documentation; handover session.

## Skills

Node.js/TypeScript + Crawlee (or Python + Scrapy) · Apify (Actors, datasets, proxy, scheduler) · Playwright · reverse-engineering internal XHR/JSON APIs · anti-bot + residential proxies · PostgreSQL · AWS (Lambda, RDS, EventBridge) · data normalization.

## Deployment & security

You will NOT receive our production AWS credentials. Final AWS deployment is done by our in-house AWS developer: you hand over code + Infrastructure-as-Code (Terraform or AWS SAM) + instructions, he runs the deploy. The scraper lives in our Apify account, which you can be granted access to.

## Cost control (mandatory)

Recurring cost is dominated by MercadoLibre residential-proxy traffic, so: block images/CSS/fonts (no photos needed); full initial crawl then **incremental** detail fetches (only new/changed listings re-fetched in full; existing ones price-checked from list pages); residential proxy ONLY for MercadoLibre/PERAIG, plain HTTP/datacenter for the rest. A poor design costs 5–10× more for the same data.

## Budget

Fixed price, **billed per phase** (Phase 1 first, as a go/no-go gate); please quote each phase separately. **Do not include Apify/proxy/AWS costs in your price** (those run on our accounts). Also tell us your rate for ongoing maintenance after delivery (sites change their HTML over time).
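The weekly change detection and the incremental fetch rule under Cost control could be planned roughly as below. This is a sketch under stated assumptions: `RunPlan` and `plan_run` are hypothetical names, listings are keyed by (portal, listing URL) as the brief specifies, and only the list-page price is compared. A real run would likely compare more fields and would wait for several missed runs before marking a listing as removed.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (portal, listing URL)


@dataclass
class RunPlan:
    new: List[Key] = field(default_factory=list)            # fetch detail page, insert
    price_changed: List[Tuple[Key, Decimal, Decimal]] = field(default_factory=list)
    unchanged: List[Key] = field(default_factory=list)      # only bump last_seen
    missing: List[Key] = field(default_factory=list)        # candidates for "removed"

    def detail_fetches(self) -> List[Key]:
        """Only new or changed listings are re-fetched in full."""
        return self.new + [key for key, _old, _new in self.price_changed]


def plan_run(stored: Dict[Key, Decimal], seen: Dict[Key, Decimal]) -> RunPlan:
    """Compare prices already in the database with prices seen on list pages."""
    plan = RunPlan()
    for key, price in seen.items():
        if key not in stored:
            plan.new.append(key)
        elif stored[key] != price:
            plan.price_changed.append((key, stored[key], price))
        else:
            plan.unchanged.append(key)
    plan.missing = [key for key in stored if key not in seen]
    return plan
```

Each `price_changed` entry carries the old and new price, which is what a row in `precio_historico` would need; `detail_fetches()` is the only list that should generate proxy traffic beyond the list pages.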
## Include in your bid (so we can spot real scrapers)

In 3 short lines, tell us how you'd approach:

1. **MercadoLibre** anti-bot reliably at ~63k listings;
2. **the Wasi-based MLS Caracas**: would you use its underlying API, and how would you find it;
3. **RE/MAX** map-based search (lat/long): pagination and full coverage.

Also: your stack, your Apify experience, and one similar pipeline you've built.

**Note:** follow the API → HTML → headless hierarchy; keep each adapter isolated so one broken site doesn't stop the others; no photos.
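The adapter isolation rule in the note above, together with the Phase 4 alert condition (adapter returns 0 or drops more than X%), might look like this in outline. It is a hedged sketch: `run_adapters` and `needs_alert` are hypothetical names, each adapter is reduced to a zero-argument callable, and the 50% default for `max_drop` is an assumed stand-in for the unspecified X.

```python
from typing import Callable, Dict, List, Optional, Tuple

Adapter = Callable[[], List[dict]]


def run_adapters(adapters: Dict[str, Adapter]) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Run every adapter; a failure in one is recorded and does not stop the rest."""
    results: Dict[str, List[dict]] = {}
    errors: Dict[str, str] = {}
    for portal, run in adapters.items():
        try:
            results[portal] = run()
        except Exception as exc:  # isolate per-portal failures
            errors[portal] = repr(exc)
    return results, errors


def needs_alert(current: int, previous: Optional[int], max_drop: float = 0.5) -> bool:
    """True if the adapter returned nothing or fell by more than max_drop vs. last run."""
    if current == 0:
        return True
    if not previous:  # first run, nothing to compare against
        return False
    return (previous - current) / previous > max_drop
```

A portal that appears in `errors`, or whose count trips `needs_alert`, would be the trigger for a Slack message, while the other portals' results still flow to the dataset.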
Project ID: 40480105
126 proposals
Remote project
Active 3 days ago
126 freelancers are bidding on average $1,139 USD for this job

Hi, I reviewed your project. From your requirements, the main challenges are:
• Building a single Apify-based scraping system with 8 different adapters (one per portal) while keeping them isolated and maintainable.
• Handling anti-bot systems (especially MercadoLibre and PERAIG) using a cost-aware approach (API → HTML → headless browser only when required).
• Reverse-engineering hidden APIs (Wasi/MLS Caracas, SPA-based sites like TuHome24).
• Normalizing inconsistent real estate data into a unified PostgreSQL schema while preserving raw JSON for traceability.
• Implementing reliable change detection (new listings, updates, and price history tracking).
• Ensuring weekly scheduled execution with minimal proxy cost and strong failure monitoring.
I can approach this by designing a modular Apify Actor architecture with per-site adapters, shared normalization layer, and a controlled crawling strategy that prioritizes internal APIs first, then HTML parsing, and only uses Playwright for strict anti-bot cases. Data would flow into Apify dataset → AWS Lambda → PostgreSQL with proper deduplication logic, history tracking, and structured indexing.
For Phase 1, I would focus on MercadoLibre (largest dataset) + full pipeline foundation to validate scalability, normalization, and cost efficiency before expanding to other portals.
Thank you for reviewing my proposal, and I would be glad to discuss.
$1,125 USD in 7 days
8.2

⭐⭐⭐⭐⭐ Build a Multi-Source Real Estate Data Pipeline for Venezuela ❇️ Hi My Friend, I hope you're doing well. I've reviewed your project requirements and see you're looking for a data pipeline to scrape Venezuelan property portals. Look no further; Zohaib is here to help you! My team has handled 50+ similar projects in real estate data scraping. I will create a robust pipeline using Apify and AWS PostgreSQL to meet all your needs. I will build a custom Apify Actor to scrape 8 property portals, normalize data, and store it in your database. My approach ensures efficient data handling and meets your requirements for property valuations and market intelligence. ➡️ Why Me? I can easily build your real estate data pipeline as I have 5 years of experience in web scraping, data normalization, and database management. My expertise includes Node.js, TypeScript, and AWS services. I also have a strong grip on data handling and API integration. ➡️ Let's have a quick chat to discuss your project in detail and let me show you samples of my previous work. Looking forward to discussing this with you! ➡️ Skills & Experience: ✅ Web Scraping ✅ Data Normalization ✅ PostgreSQL ✅ Node.js ✅ TypeScript ✅ Apify ✅ AWS Lambda ✅ API Integration ✅ Data Analysis ✅ Error Handling ✅ CloudWatch Logging ✅ Playwright ✅ Residential Proxies Waiting for your response! Best Regards, Zohaib
$900 USD in 2 days
8.0

Hi — Elias here from Miami. I see you're looking to build a multi-source real estate data pipeline focused on the Venezuelan market. The goal is to collect, process, and store data efficiently while ensuring accuracy and accessibility. What usually matters most here is managing the complexities of scraping data from various sources and integrating it into a PostgreSQL database on AWS. A common issue in systems like this is ensuring the reliability of the pipeline, especially with inconsistent data. The tricky part is usually managing the state and flow of data as you scale. My approach would involve designing a modular architecture for easy updates and maintenance. By leveraging AWS services effectively, I would ensure the pipeline is stable and can handle future expansions seamlessly. I have developed similar data pipelines using tools like Scrapy and Apify, focusing on automation and data integrity. A few questions to better understand the scope: Q1 – What specific data sources are you targeting for scraping? Q2 – Are there particular data validation or cleaning processes you envision? Q3 – How do you plan to manage user permissions for data access? Happy to discuss the details and suggest the best technical approach. Looking forward to hearing from you.
$1,200 USD in 5 days
7.4

Hi I can build the Apify-based real estate data pipeline using TypeScript, Crawlee, Playwright, PostgreSQL, AWS Lambda, and structured adapter-based scraping for each Venezuelan portal. The main technical challenge is keeping the MercadoLibre crawl reliable and cost-controlled at large volume under DataDome protection, so I would use residential proxy only there, block media assets, extract list-page data first, and re-fetch details incrementally. For MLS Caracas, I would inspect browser network traffic, XHR calls, and Wasi API patterns to capture structured JSON instead of scraping rendered pages. For RE/MAX, I would handle its map-based latitude/longitude search with controlled area slicing, pagination checks, and detail-page extraction to avoid missed listings. I would design one Apify Actor with isolated adapters per portal, normalized output, Apify dataset delivery, and webhook-based loading into AWS PostgreSQL. The database layer would include schema migrations, unique portal URL keys, raw_json storage, price history, scrape runs, and change detection for new, updated, or removed listings. I also understand the importance of office/state normalization, especially RM II where municipalities must map correctly to Distrito Capital or Miranda. This approach keeps the pipeline maintainable, resilient when one portal changes, and optimized for valuation data without downloading photos. Thanks, Hercules
$1,500 USD in 7 days
6.9

Hi, I will build a scheduled pipeline using Apify to scrape 8 Venezuelan property portals for your real estate brokerage in Venezuela. The pipeline will normalize the data into one schema and store it in your AWS PostgreSQL database. The system will run once per week, providing property valuations, market intelligence, and an internal search tool. Let's discuss further. Regards, Sai Bhaskar
$1,000 USD in 20 days
6.7

Hi, I can help you. You want one weekly system that gathers homes from 8 sites in Venezuela, tidies them into one shape, tags each by office and state, and saves them to your AWS database with price history and change tracking. Start with MercadoLibre, then add the other portals. Keep costs low by only using heavy tools where needed and only re-checking changes. This will take a few days; I've been doing this type of work for years. I have short walkthrough videos on my Freelancer profile showing similar work. 1) What data and DB tables already exist, and do you have the Lambda in place? 2) What should the final dashboard or queries show for valuations and agent tracking? Ideally, we have a call and go through the details together so I can make sure I understand everything correctly, address any questions, and give you a quote and timeline. Would that work? Best, Nicolas
$1,125 USD in 7 days
5.3

Hello, I understand you're building a market intelligence pipeline. It uses a central Apify Actor with routed adapters to scrape 8 portals, normalize the data with business logic (oficina_rm, estado), and sync it to your PostgreSQL DB via a Lambda. This Lambda will handle upserts and price history logging. Your tiered scraping strategy (API > HTML > Headless) is perfect for managing costs.
My approach for your key targets:
- MercadoLibre: Apify residential proxies with fingerprint rotation, managed by a Crawlee queue with exponential backoff to defeat DataDome.
- Wasi MLS: We'll find the backing JSON API by inspecting network XHR requests and then call it directly.
- RE/MAX Map: We'll iterate over a lat/long grid covering your regions to ensure complete data extraction, using the RE/MAX ID for deduplication.
My stack of choice is TypeScript/Crawlee for the Actor. The data pipeline will be Apify Dataset → Webhook → Lambda → PostgreSQL. We'll build this modularly, starting with the core database schema, the Lambda function, and then the complex MercadoLibre adapter as per your Phase 1 plan. This de-risks the project immediately.
Regards, Rohit
$750 USD in 21 days
5.4

Hello! I’ve built scraping and normalization pipelines where the hard part is not fetching pages, but designing the system so each source is isolated, cost-aware, and resilient over time. Your brief is clear, technically grounded, and exactly the kind of data engineering + scraping job I’m comfortable owning. I’d use TypeScript + Crawlee on Apify, with one adapter per portal behind a single Actor/router, then normalize everything into one schema and push it through the dataset → webhook → Lambda → PostgreSQL flow you described. I also fully agree with your API → HTML → headless hierarchy, the need to tag every record with oficina_rm + estado, and the rule to keep each adapter isolated so one broken site never stops the rest. 3 short lines: MercadoLibre: residential proxy only there, block images/CSS/fonts, session reuse, incremental detail recrawls after initial full crawl. MLS Caracas: yes, I’d first trace the underlying Wasi/XHR API from search/detail requests and map that payload into the shared schema. RE/MAX: I’d seed from the municipality map URLs, walk pagination/list states from those entry points, and hit detail pages only for extra fields. I can quote per phase and also give a maintenance rate for ongoing adapter fixes. Warm regards, Yulius Mayoru
$750 USD in 6 days
5.4

I understand you need a scheduled pipeline to scrape 8 Venezuelan property portals weekly for Táchira, Greater Caracas, and Anzoátegui, normalizing the data into a single schema and storing it in your AWS PostgreSQL database, prioritizing property valuations and market intelligence. I recently built a similar data pipeline for a European real estate firm, successfully integrating data from 10 diverse sources into a PostgreSQL database, which directly led to a 15% improvement in their comparative market analysis accuracy. My approach involves developing Apify actors for each of the 8 target portals, ensuring robust data extraction and handling of varying website structures. These actors will feed into a normalization script written in Python, which will transform the scraped data into your specified schema before loading it into your AWS PostgreSQL instance using the `psycopg2` library. The entire process will be orchestrated and scheduled to run weekly via Apify's built-in scheduler. How will the schema normalization handle missing fields or inconsistencies across the 8 different portal structures to ensure data integrity in PostgreSQL? Ready to start as soon as you confirm scope.
$1,300 USD in 21 days
5.2

With my comprehensive AI, Software and Embedded Systems Engineering background, I'm highly qualified to design and implement your Multi-source Real Estate Data Pipeline project. My work ranges from developing intelligent and scalable solutions to high-performance software systems and distributed web platforms. My expertise in Python, a crucial language for data scraping, and in Apify's customizable Actors will enable me to effectively scrape data from the 8 Venezuelan property portals. I'm also experienced with leveraging APIs, such as MercadoLibre's, even in the presence of anti-bot measures. On the backend, my use of Node.js and PostgreSQL aligns perfectly with your AWS PostgreSQL database setup. Additionally, my understanding of the importance of normalized data and my experience architecting maintainable and scalable systems make me a great fit for this project. I will ensure all scraped data from the different portals undergoes uniform normalization into your desired schema before storing it in your AWS PostgreSQL database. In conclusion, my ability to blend the various technologies (Apify, Python Scrapy, TypeScript + Crawlee) needed for a smooth multi-domain web scraping project like yours, combined with my focus on clean architecture, performance optimization and hardware-software integration, makes me an invaluable asset for this job. I look forward to discussing the details of the project with you.
$1,500 USD in 7 days
4.9

Hello, I am available now. I have read your project description carefully and I understand what you want. 300% Confidence!!! I have 7+ years of experience in Python, PostgreSQL, Node.js. I have completed similar projects. Please contact me. Best regards, Steven
$1,100 USD in 7 days
4.5

Hi there, Thank you for the thorough and well-structured project brief; it's clear you know exactly what you need. As Demivision LLC, we specialize in building robust, cost-effective data pipelines for market intelligence, especially in complex environments like Venezuelan real estate. We have extensive experience with Apify (Actors, scheduler, datasets, proxy management), Crawlee and Scrapy, and seamless AWS integrations, including Lambda and RDS/PostgreSQL. Our team has delivered similar multi-portal scraping systems for real estate and automotive verticals, focused on normalization, change detection, and cost control.
For your project, we’d architect a modular Apify Actor (TypeScript + Crawlee preferred) with a router dispatching each of the 8 target portals to its own adapter, strictly following your API→HTML→headless hierarchy. The PostgreSQL schema will support valuation, change history, and deduplication by (portal, URL), with clear tagging for office and state as you described. We'll ensure incremental fetching and traffic minimization, using residential proxies only where essential.
For your three technical points:
1. **MercadoLibre:** Use Apify Playwright with residential proxy, block non-essential resources, load list pages in batches, and detail-fetch only new/changed listings to handle scale and anti-bot.
2. **MLS Caracas (Wasi):** Reverse-engineer and intercept internal XHR/API calls (often /api/properties or similar) via browser devtools; prefer API for stability, falling back to HTML if needed.
3. **RE/MAX:** Automate the map-based search by generating lat/long grid points to paginate all areas, parsing each page for listings and following each to detail for agent/agency info.
We're confident in delivering a reliable, maintainable pipeline that meets your valuation and market intelligence goals. Looking forward to discussing further and sharing relevant case studies.
$1,125 USD in 14 days
4.6

Hi, I've handled MercadoLibre's DataDome before, and the trick is rotating headers with residential sessions so we don't burn the proxy budget on those 63k listings. For the Wasi-based portals I'll intercept the internal XHR calls to get clean JSON directly. For the RE/MAX map grid I'll script a recursive bounding-box scan to ensure full coverage. I will set up the price history logic in Postgres, using JSONB for the raw payloads, like I did for a property group in Spain last month. Send me the repo and let's go.
$810 USD in 13 days
4.5

Hi, I’m Juan Pablo. I can build your full multi‑source real estate pipeline on Apify with a single Actor, eight isolated adapters and a clean normalization layer feeding your AWS PostgreSQL database. My approach is to follow your hierarchy strictly: internal APIs first, HTML parsing second, and headless only where anti‑bot makes it unavoidable. Everything flows through one dataset and into Lambda for insert, update and price‑history tracking, with oficina_rm and state derived automatically from municipality logic. For MercadoLibre, I would rely on Playwright with Apify residential proxies, block all non‑essential assets, fingerprint rotation and an incremental crawl that fetches full detail only for new or changed listings. For the Wasi‑based MLS Caracas, I would inspect network calls to identify the underlying JSON endpoints and replicate them directly to avoid HTML entirely. For RE/MAX, I would generate a grid of lat/long tiles to fully cover the map search and paginate through each tile until exhaustion. I’ve built pipelines of similar scale on Apify using TypeScript, Crawlee, Playwright, PostgreSQL and AWS Lambda, including anti‑bot handling and multi‑adapter architectures. If you want insight into Apify pipelines or anti‑bot strategies I can expand. Ready to deliver Phase 1 as your go/no‑go milestone.
$1,500 USD in 10 days
4.6

Hi there, With a multi-layered profile spanning Branding & Design, Software Development & DevOps, and Data Engineering, I'm uniquely positioned to be your trusted partner in crafting the kind of multi-source real estate data pipeline you seek. Name any significant technology stack for this project (Apify, AWS PostgreSQL, TypeScript + Crawlee or Python + Scrapy, Playwright + Apify residential proxy, among others) and I can navigate it seamlessly for you within one harmonious ecosystem to provide unified, fresh property data from eight Venezuelan portals on a weekly basis. Drawing information diligently from sites ranging from MercadoLibre VE's server-rendered + anti-bot structure to Rivinotinto's classic PHP site with zero anti-bot measures in place, I'm well versed in adapting to the varying technical nuances each site requires. Your valuation fields and the unified schema would be my utmost priority, ensuring each property is tagged clearly with its respective portal and geographical details.
$1,125 USD in 3 days
4.5

Hi There! I specialize in large-scale geospatial web scraping and data pipeline engineering with 9+ years of experience building Apify-based crawlers, AWS PostgreSQL pipelines, and anti-bot resilient architectures.
Here’s how I can help:
1. Build modular Apify Actor with per-portal adapters and unified schema
2. Implement API-first scraping strategy with HTML + Playwright fallback logic
3. Handle MercadoLibre anti-bot using residential proxy + incremental crawling
4. Set up AWS Lambda → PostgreSQL pipeline with deduplication and price history tracking
I focus on scalable, cost-efficient scraping systems with clean normalization and maintainable architecture. How often do you expect schema or portal structure changes, and should the system auto-heal minor HTML shifts?
$1,125 USD in 7 days
4.0

⭐⭐⭐⭐⭐ ✅Hi there, hope you are doing well! I recently developed a multi-source real estate data pipeline that scraped multiple property portals, normalized diverse data into a unified schema, and stored the results in PostgreSQL, making data aggregation and market intelligence seamless. The key to success in this project lies in designing a robust, modular scraper architecture that respects site-specific anti-bot measures and organizes data flow smoothly from scraping to database insertion. Approach: ⭕ Build a highly modular Apify Actor in TypeScript with site-specific adapters using Crawlee; ⭕ Prioritize API first scraping, fallback to HTML parsing, and use headless browser only where anti-bot is strong; ⭕ Implement incremental crawling for efficiency to minimize proxy costs, especially for MercadoLibre and PERAIG; ⭕ Normalize all listings with office and geographic state tagging per your schema; ⭕ Integrate the pipeline via webhooks to AWS Lambda for insertion/updating in your PostgreSQL database including price history; ⭕ Deliver full DDL, schema migrations, logging, Slack alerts, and comprehensive documentation. ❓May I confirm if you have access to residential proxies suitable for MercadoLibre scraping? ❓Could you share the final URLs and state/municipality mappings for normalization? ❓Do you require real-time monitoring dashboards beyond Slack alerts? With my deep experience in Apify, AWS Lambda, and building complex multi-source data pipelines, I am confident in de
$1,200 USD in 7 days
3.8

Hello, I have reviewed your project description carefully. Here’s my structured approach:
1. MercadoLibre (~63k listings): Use Playwright with residential proxies. Block images/CSS/fonts to reduce load. Incremental detail fetches; re-check prices via list pages.
2. MLS Caracas (Wasi-based): Use API directly for speed and reliability; fallback to HTML parsing only if needed.
3. RE/MAX (map-based search): Generate lat/long grid to cover the map. Extract listing URLs and card data from list pages. Fetch detail pages for agent, agency, and property attributes.
Stack & Experience:
Node.js + TypeScript + Crawlee, Apify Actors/Datasets/Scheduler, Playwright
AWS Lambda → PostgreSQL integration with normalization, price history, and incremental updates
Similar multi-source pipelines with adapter isolation and structured schema delivery
I can deliver Phase 1 (MercadoLibre + base pipeline + normalization + Lambda → PostgreSQL integration) as the first milestone and then add the remaining adapters and monitoring progressively.
Looking forward to your response, Chris
$1,500 USD in 7 days
3.7

This is exactly the kind of large-scale data engineering and scraping project I enjoy working on. I understand that the success of this project depends not only on extracting data accurately, but also on building a cost-efficient, maintainable architecture that can reliably process tens of thousands of listings while following the API → HTML → Headless hierarchy.
Deliverables:
* Apify Actor with adapter-based architecture
* PostgreSQL schema, migrations, and indexing strategy
* Apify → AWS Lambda → PostgreSQL integration
* MercadoLibre adapter with anti-bot handling
* Isolated adapters for all supported portals
* Change detection and price history tracking
* Weekly scheduled execution
* CloudWatch logging and Slack alerts
* Infrastructure-as-Code and deployment documentation
* Handover and knowledge transfer session
Phase Estimates:
* Phase 1: $1,200
* Phase 2: $1,000
* Phase 3: $800
* Phase 4: $500
Maintenance:
* $20/hour or fixed monthly support agreement
Approach:
* MercadoLibre: Residential proxies, request fingerprint rotation, blocked assets (images/CSS/fonts), incremental crawling, and list-page-first strategy to minimize proxy costs.
* MLS Caracas (Wasi): I would prioritize reverse-engineering the underlying API through network/XHR inspection and consume structured endpoints directly whenever possible.
* RE/MAX: Analyze map request patterns, extract geographic parameters, automate pagination coverage, and enrich detail pages through a second-stage crawler.
Malik
$1,200 USD in 15 days
3.8

Hello, I have experience in Amazon scraping using Apify, and once you contact me I can show my experience as a JSON file. I will use Python, Apify, and PostgreSQL as you indicated, and there is no problem for me. I am sure I can complete it in a short time. A key aspect of your project is selecting the most efficient method hierarchy per site, aligning cost optimization with stability. This is where my experience really shines, having successfully implemented similar strategies in prior projects. Additionally, my strong background in web scraping will ensure proficient and accurate extraction of data from various sources such as server-rendered (Rivinotinto, RE/MAX), classic PHP (MLS Caracas) and single-page apps like TuHome24. Best Regards
$900 USD in 10 days
4.0

Miami, United States
Payment method verified
Member since Nov 28, 2012