Mini Project: HDFS-Hive

Job Description:

Deliverables take the form of a report containing the list of commands, screenshots of the commands and their results, the NiFi development, and an export of the NiFi template.

Work to do:


Using the HDFS command line (hdfs dfs -??????), create the following tree structure: /data/common/raw/DATABASE_M1/ETUDIANT_M1

Using the HDFS command line, create a file [login to view URL] in this directory (with 3 columns: firstName, lastName, email, filled with your own data).

Using the HDFS command line, display the contents of the directory.

Using the HDFS command line, display the contents of the file.
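The four HDFS steps above can be sketched as follows. The file name etudiant.csv and the sample rows are placeholders (the real file is behind [login to view URL]), and the hdfs calls are guarded so the sketch can also be run outside a cluster:

```shell
#!/bin/sh
# Target directory from the assignment.
DIR=/data/common/raw/DATABASE_M1/ETUDIANT_M1

# Build a local CSV with the three requested columns (placeholder data,
# no header row so Hive can later read every line as data).
printf '%s\n' \
  'Jane,Doe,jane.doe@example.com' \
  'John,Smith,john.smith@example.com' > etudiant.csv

# Guarded: these only run where an HDFS client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$DIR"             # create the whole tree (-p)
  hdfs dfs -put -f etudiant.csv "$DIR"  # upload the file
  hdfs dfs -ls "$DIR"                   # display the directory contents
  hdfs dfs -cat "$DIR/etudiant.csv"     # display the file contents
fi
```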


Create a database DATABASE_M1

With HQL, create a database DATABASE_M2

With HQL, create a Hive table ETUDIANT_M1 in the DATABASE_M1 database pointing to the /data/common/raw/DATABASE_M1/ETUDIANT_M1 directory
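A possible HQL sketch for the two databases and the external ETUDIANT_M1 table, wrapped in a guarded hive call. The column names and the comma delimiter are assumptions matching the CSV step above:

```shell
#!/bin/sh
# HQL for the databases and the external table; column layout and the
# ',' delimiter are assumptions taken from the CSV created earlier.
cat > etudiant_m1.hql <<'EOF'
CREATE DATABASE IF NOT EXISTS DATABASE_M1;
CREATE DATABASE IF NOT EXISTS DATABASE_M2;

CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1 (
  firstName STRING,
  lastName  STRING,
  email     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/common/raw/DATABASE_M1/ETUDIANT_M1';

-- Display the contents of the table.
SELECT * FROM DATABASE_M1.ETUDIANT_M1;
EOF

# Guarded: only runs where the hive CLI is available.
if command -v hive >/dev/null 2>&1; then
  hive -f etudiant_m1.hql
fi
```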

With HQL, display the contents of the ETUDIANT_M1 table

With HQL, create an ETUDIANT_M1_PART table in the DATABASE_M1 database, partitioned on a DateRecep field (in year, month, day, hour, minute format: YYYYMMDDHHmm) and pointing to the /common/raw/DATABASE_M1/ETUDIANT_M1_PART directory

Create an external table ETUDIANT_M2 in the DATABASE_M2 database
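The two remaining tables might look like the sketch below. Since the NiFi flow lands Avro files, the partitioned table is declared STORED AS AVRO; the DateRecep partition column is modelled as a STRING holding yyyyMMddHHmm values, and the ETUDIANT_M2 location is a hypothetical path (the assignment does not give one):

```shell
#!/bin/sh
# Partitioned source table plus the external target table.
# DateRecep as STRING and the ETUDIANT_M2 LOCATION are assumptions.
cat > etudiant_part.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1_PART (
  firstName STRING,
  lastName  STRING,
  email     STRING
)
PARTITIONED BY (DateRecep STRING)  -- yyyyMMddHHmm
STORED AS AVRO                     -- the NiFi flow delivers Avro files
LOCATION '/common/raw/DATABASE_M1/ETUDIANT_M1_PART';

CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M2.ETUDIANT_M2 (
  firstName STRING,
  lastName  STRING,
  email     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/common/raw/DATABASE_M2/ETUDIANT_M2';  -- hypothetical path
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f etudiant_part.hql
fi
```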


Expose a NiFi API to receive external file data (use the two processors HandleHttpRequest and HandleHttpResponse)

Send the data file [login to view URL] (attached to the course) to the NiFi API 10 times.
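One way to drive this test is a small curl loop. The endpoint URL and port are placeholders for whatever the HandleHttpRequest processor is configured to listen on, and errors are swallowed so the sketch can run without a live NiFi:

```shell
#!/bin/sh
# Hypothetical endpoint -- match the port set on HandleHttpRequest.
NIFI_URL='http://localhost:8081/contentListener'

# Placeholder CSV body (the real file is behind [login to view URL]).
printf 'Jane,Doe,jane.doe@example.com\n' > etudiant.csv

sent=0
for i in $(seq 1 10); do
  # POST the CSV body; failures are ignored when no NiFi is listening.
  curl -s -o /dev/null --max-time 2 -X POST \
       -H 'Content-Type: text/csv' \
       --data-binary @etudiant.csv "$NIFI_URL" || true
  sent=$((sent + 1))
done
echo "attempted $sent POSTs"
```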

Convert the received data from CSV format to Avro format (for example with the ConvertRecord processor, using a CSVReader and an AvroRecordSetWriter).

Drop the data into the HDFS directory /common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=202210ddHHmm (use the PutHDFS processor). This value must be generated dynamically by NiFi: use a flowfile attribute holding a date in the requested format, e.g. Variable_DateRecep with the value DateRecep=${now():format('yyyyMMddHHmm')}.
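The NiFi expression ${now():format('yyyyMMddHHmm')} corresponds to the shell date format below. This sketch only shows how the dynamic partition directory is built; in the flow itself the value would come from an attribute set e.g. by an UpdateAttribute processor:

```shell
#!/bin/sh
# Same timestamp format as ${now():format('yyyyMMddHHmm')} in NiFi.
DATE_RECEP=$(date +%Y%m%d%H%M)

# The directory PutHDFS should target for this flowfile.
TARGET_DIR="/common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=${DATE_RECEP}"
echo "$TARGET_DIR"
```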

Do a SELECT on the table; what do you notice?

Run the following HQL command: MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;
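The point of the previous two steps: the SELECT initially returns no rows for the new data, because PutHDFS created the partition directories behind the metastore's back. MSCK REPAIR TABLE scans the table location and registers any partitions missing from the metastore. A guarded sketch:

```shell
#!/bin/sh
# Register partitions that exist on HDFS but not yet in the metastore,
# then list what the metastore now knows about.
cat > msck.hql <<'EOF'
MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;
SHOW PARTITIONS DATABASE_M1.ETUDIANT_M1_PART;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f msck.hql
fi
```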

Copy the data (via an HQL query executed by NiFi) from the ETUDIANT_M1_PART table to the ETUDIANT_M2 table so as to keep only the latest version of the file sent (use the OVERWRITE keyword and, in the WHERE clause of the SELECT, filter on the most recent DateRecep value).
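A possible shape for the copy query. The IN (SELECT MAX(...)) form keeps only rows from the most recent partition; IN is used because scalar '=' subqueries are not supported by older Hive releases:

```shell
#!/bin/sh
# Keep only the rows of the latest DateRecep partition in ETUDIANT_M2.
cat > copy_latest.hql <<'EOF'
INSERT OVERWRITE TABLE DATABASE_M2.ETUDIANT_M2
SELECT firstName, lastName, email
FROM DATABASE_M1.ETUDIANT_M1_PART
WHERE DateRecep IN (
  SELECT MAX(DateRecep) FROM DATABASE_M1.ETUDIANT_M1_PART
);
EOF

# In the flow itself this query would be run by a NiFi processor such as
# PutHiveQL; the guarded hive call below is just for manual testing.
if command -v hive >/dev/null 2>&1; then
  hive -f copy_latest.hql
fi
```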

Skills: Hadoop, Big Data Sales, Spark, Apache Kafka, Hive

About the client:
(1 review) Massy, France

Project ID: #36234284
