Hadoop Spark/Scala Project


A) Using Hadoop HDFS and Spark/Scala programming

Source dataset: [url removed, login to view]

Download data for 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008

1) Download and combine all data for the years specified above

2) Data cleanup: find and filter out outliers and bad data

3) Perform statistical analysis on the data: counts, averages, sums, min, max

4) Using Spark/Scala programming on the entire dataset, determine the following (as percentages where applicable):

a) On-time flights

b) Cancelled flights

c) Delayed flights

d) Top 5 causes of delays

e) Most common causes of flight delays

f) Airlines with the most delays to a destination

g) Airline with the most cancellations

h) Airline with the most on-time flights

i) National averages for on-time flights, delays, and cancellations

j) Perform some visualization in Tableau (send me the output data file; I will do the visualisation myself)

k) All of the above code in a separate PDF file
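
As an illustration of the cleanup and percentage steps above, here is a minimal sketch in plain Scala (standard library only, no Spark dependency). Everything named in it is an assumption made for illustration: the posting does not give the dataset schema, so the three-column line format `carrier,cancelled,arrDelay`, the `Flight` record, the `FlightStats` object, and the 15-minute on-time threshold are all hypothetical. In a real Spark job the same parse and filter logic would be applied per input line over the files in HDFS.

```scala
import scala.util.Try

// Hypothetical record; field names are assumptions, not taken from the posting.
final case class Flight(
  carrier: String,          // airline code
  cancelled: Boolean,       // true if the flight was cancelled
  arrDelayMin: Option[Int]  // arrival delay in minutes; None when missing
)

object FlightStats {
  // Assumed threshold: arrivals less than 15 minutes late count as on time.
  val OnTimeThresholdMin = 15

  // Parse one line of the assumed form "carrier,cancelled,arrDelay".
  // Malformed rows yield None so they can be dropped as bad data.
  def parse(line: String): Option[Flight] =
    line.split(",", -1).map(_.trim) match {
      case Array(carrier, c, delay) if carrier.nonEmpty && (c == "0" || c == "1") =>
        Some(Flight(carrier, c == "1", Try(delay.toInt).toOption))
      case _ => None
    }

  // Keep well-formed rows; a flight that was not cancelled must have a delay value.
  def clean(lines: Seq[String]): Seq[Flight] =
    lines.flatMap(parse).filter(f => f.cancelled || f.arrDelayMin.isDefined)

  private def pct(part: Int, total: Int): Double =
    if (total == 0) 0.0 else 100.0 * part / total

  // Returns (on-time %, delayed %, cancelled %) over all flights.
  def shares(flights: Seq[Flight]): (Double, Double, Double) = {
    val total     = flights.size
    val cancelled = flights.count(_.cancelled)
    val delayed   = flights.count(f => !f.cancelled && f.arrDelayMin.exists(_ >= OnTimeThresholdMin))
    val onTime    = flights.count(f => !f.cancelled && f.arrDelayMin.exists(_ < OnTimeThresholdMin))
    (pct(onTime, total), pct(delayed, total), pct(cancelled, total))
  }

  // Carrier with the highest number of cancelled flights, if any were cancelled.
  def mostCancellations(flights: Seq[Flight]): Option[String] = {
    val counts = flights.filter(_.cancelled).groupBy(_.carrier).map { case (k, v) => k -> v.size }
    if (counts.isEmpty) None else Some(counts.maxBy(_._2)._1)
  }
}
```

Keeping the parsing and the threshold in small pure functions makes them easy to check on a handful of hand-written rows before running against the full ten years of data.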

B) Create a 10-15 page document (in Word) covering the following topics:

1) Data source

2) Description of the data and its schema

3) Data pre-processing required (parsing, filtering, etc.)

4) Any bad data issues encountered

5) Describe your Spark algorithm

6) Describe any other ecosystem or additional tools used

7) Describe the output

8) How did you verify that your output is correct?

9) Discuss the performance/scale characteristics

10) What would you have done differently if you did this again?

11) Draw conclusions from this exercise

Please NOTE: This must be your original work. Code written by someone else must not be copied from online sources and used in this project. Doing so will result in an F grade in this course.

Deliverable Timeline:

1) Code in separate document -- Deliver by NOV 25

2) Documentation (10-15 pages in Word) -- Deliver by NOV 27

3) Output dataset file --- Deliver by NOV 30

Deadline: NOV 30 for all of the above

NB: Use your personal Hadoop cluster, or I can provide access to a cloud-based Hadoop cluster with the data files already downloaded into an HDFS folder.

Skills: Big Data Sales, Hadoop, Map Reduce, Scala, Spark


About the Employer:
(29 reviews) North Saint Paul, United States

Project ID: #15652432



I am a data scientist with experience in big data technologies such as Spark and Hadoop. I have previously worked with this dataset and can complete all of the given tasks before the specified date. Relevant Skills and ...

Bid: 135 (11 reviews)

6 freelancers are bidding an average of $168 for this job.


Hi, I work with big data technologies using Java, Scala, and Python for implementation. Can we talk further? Here is my Guru profile: [login to view URL] Thanks

Bid: 188 (6 reviews)

Hi, I am a Java expert with experience in big data, so this kind of data processing will be done perfectly. I hope to discuss it with you. Regards. Relevant Skills and Experience: Java. Proposed Milestones: $133 USD - Init

Bid: 133 (2 reviews)

I have 10 years of IT experience, including more than 3.5 years with Hadoop technologies such as Hive, Pig, Spark, Sqoop, MapReduce, and [login to view URL] have very good experience in Java, Scala, Python, and shell scripting. Work ...

Bid: 250 (3 reviews)

I have worked with Spark and previously built a recommendation system, so I will be able to fulfill your task.

Bid: 55 (0 reviews)
Bid: 244 (0 reviews)