
Completed
Paid on delivery
Build a pipeline that reads data from a table on a Hadoop server, runs quality checks on it, and then loads it into a Postgres table. After that, the Postgres data must be filtered, upserted, and the resulting records written to an S3 bucket. The pipeline involves multiple data sources and should include data validation to ensure accuracy and consistency. Before the Postgres table is updated, the quality checks should count the rows, compare the data before and after the load, and collect the newly inserted records. Please use Apache Airflow for orchestration; the pipeline should run daily. It should also include advanced data validation such as integrity checks and statistical analysis.
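The row-count, before/after comparison, and "collect the new data" checks described above can be sketched as a single validated upsert step. This is a minimal illustration, not the client's specification: it assumes a hypothetical target table `target(id, amount)` keyed on `id`, uses Python's built-in `sqlite3` as a stand-in for Postgres, and the function and class names are invented for the example. PostgreSQL (9.5+) accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` syntax, although a driver such as psycopg2 uses `%s` placeholders instead of `?`. In the real pipeline this function would run inside a daily Airflow task, after the Hadoop extract.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical row shape pulled from the Hadoop source table: (id, amount).
Row = tuple[int, float]


@dataclass
class QualityReport:
    source_count: int
    target_count_before: int
    target_count_after: int
    new_ids: list[int]      # keys that did not exist in the target before the load
    changed_ids: list[int]  # existing keys whose value changed


def validate_source(rows: list[Row]) -> None:
    """Integrity checks on the extract: non-empty, unique keys, no NULL/negative amounts."""
    if not rows:
        raise ValueError("source extract is empty")
    ids = [r[0] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate primary keys in source extract")
    if any(r[1] is None or r[1] < 0 for r in rows):
        raise ValueError("null or negative amount in source extract")


def upsert_with_checks(conn: sqlite3.Connection, rows: list[Row]) -> QualityReport:
    validate_source(rows)
    # Snapshot the target so new and changed records can be collected before updating.
    before = dict(conn.execute("SELECT id, amount FROM target").fetchall())
    new_ids = sorted(i for i, _ in rows if i not in before)
    changed_ids = sorted(i for i, a in rows if i in before and before[i] != a)

    # One transaction: commits on success, rolls back if any check below raises.
    with conn:
        conn.executemany(
            "INSERT INTO target (id, amount) VALUES (?, ?) "
            "ON CONFLICT (id) DO UPDATE SET amount = excluded.amount",
            rows,
        )
        after_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
        # Row-count check: the table should grow by exactly the number of new keys.
        if after_count != len(before) + len(new_ids):
            raise RuntimeError("row count mismatch after upsert")

    return QualityReport(len(rows), len(before), after_count, new_ids, changed_ids)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
    conn.commit()
    print(upsert_with_checks(conn, [(2, 25.0), (3, 30.0)]))
```

Running the row-count check inside the same transaction as the upsert means a failed check leaves the target table untouched, which is the behaviour you want before anything is exported to S3.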
Project ID: 38860893
3 proposals
Remote project
Active 1 yr ago

With a strong background in AI and automation, and extensive experience in Python programming, I am well equipped to handle your data engineering needs. I have developed robust projects using advanced technologies and frameworks, including the ones you mentioned, such as Apache Airflow and PySpark. My expertise qualifies me not only to build a daily data pipeline between Hadoop and Postgres but also to implement the quality checks needed to ensure the accuracy and consistency of your data. In addition, my experience designing scalable, maintainable software architectures will help in building an efficient ETL process that incorporates advanced data validation such as statistical analysis and integrity checks. Furthermore, my familiarity with cloud platforms like AWS will be instrumental in storing your data securely and ensuring high performance throughout the process. Above all, my dedication to continuous learning and keeping up to date with the latest industry trends means that you'll be working with someone who is not just qualified for the job but passionate about delivering innovative solutions. I look forward to discussing how we can maximize the potential of your data pipeline project. Let's make your data operations reliable, efficient, and future-proof!
₹3,500 INR in 2 days
0.0
3 freelancers are bidding on average ₹2,517 INR for this job

I develop web applications with a focus on ERP systems, using tools such as the Frappe platform and ERPNext. I specialize in data integration, ensuring smooth connectivity and automation across different services. Additionally, I build chatbots that enhance user interaction and support. My expertise also extends to setting up and customizing advanced chat systems for efficient communication.
₹1,050 INR in 7 days
3.9

Hello, I'm a professional data engineer with strong experience in PySpark. I've used this technology to process various types of big data. I'm sure I can help you do this task efficiently, and I guarantee a high-quality job.
₹3,000 INR in 2 days
0.0

Bengaluru, India
Payment method verified
Member since Mar 14, 2021