This is a student project to implement a noisy version of random decision trees using either Spark (with MLlib) or Hadoop MapReduce (with Mahout's machine learning algorithms).
In the MapReduce version: a Hadoop cluster needs to be installed on virtual machines and MapReduce needs to be configured. A large sample file needs to be divided into smaller chunks, with each chunk sent to a mapper, and each mapper needs to build a decision tree from the data it receives. In the reduce phase, the results of these decision trees need to be averaged and a noisy version of the averaged results needs to be published.
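To make the MapReduce flow above concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop or Mahout code. Everything in it is an assumption made for illustration: the function names are hypothetical, each "tree" is only a depth-1 tree (a stump) on one numeric feature with 0/1 labels, and Laplace noise stands in for the unspecified "noisy version" of the averaged result.

```python
import random

def split_into_chunks(rows, n_chunks):
    """Divide the sample into n_chunks interleaved chunks, one per mapper."""
    return [rows[i::n_chunks] for i in range(n_chunks)]

def majority(labels):
    """Majority label among 0/1 labels; ties go to 1."""
    return int(2 * sum(labels) >= len(labels))

def map_train_stump(chunk):
    """Mapper: fit a depth-1 decision tree on one chunk of (feature, label) rows."""
    best = None
    for threshold in sorted({x for x, _ in chunk}):
        left = [y for x, y in chunk if x <= threshold]
        right = [y for x, y in chunk if x > threshold]
        if not right:
            continue  # a split must leave rows on both sides
        left_pred, right_pred = majority(left), majority(right)
        errors = (sum(y != left_pred for y in left)
                  + sum(y != right_pred for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_pred, right_pred)
    if best is None:
        # All feature values are equal: predict the majority label everywhere.
        label = majority([y for _, y in chunk])
        return (float("inf"), label, label)
    return best[1:]

def predict(stump, x):
    """Evaluate one stump, stored as (threshold, left_label, right_label)."""
    threshold, left_pred, right_pred = stump
    return left_pred if x <= threshold else right_pred

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def reduce_noisy_average(stumps, x, noise_scale, rng):
    """Reducer: average the per-mapper predictions for x, then add noise."""
    average = sum(predict(s, x) for s in stumps) / len(stumps)
    return average + laplace_noise(noise_scale, rng)

# Toy run: label is 1 when the feature is at least 50.
rows = [(x, int(x >= 50)) for x in range(100)]
stumps = [map_train_stump(chunk) for chunk in split_into_chunks(rows, 4)]
noisy_score = reduce_noisy_average(stumps, 90, noise_scale=0.1, rng=random.Random(0))
```

In a real job the mapper and reducer would run on the cluster and the noise scale would be chosen according to whatever noise requirement the project defines; the posting does not specify one.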
In the Spark version: the idea is the same. Install Spark on a Hadoop cluster, take a large sample file, draw random data sets from the input file, build a decision tree on each, then average these trees (as in random forest techniques) and publish a noisy version of the results.
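The sampling-and-averaging idea above can be sketched as follows, again in plain Python rather than with the Spark API. The names are hypothetical, and several choices are assumptions for illustration only: random data sets are drawn by sampling with replacement, each tree is a depth-1 tree whose split threshold is picked at random, leaves store the fraction of positive labels, and the published score is the ensemble average plus Laplace noise, clamped to [0, 1].

```python
import random

def bootstrap_sample(rows, rng):
    """Draw a random data set of the same size, sampling with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_random_tree(rows, rng):
    """Depth-1 tree with a randomly chosen threshold.

    Each leaf stores the fraction of positive (label 1) rows that reach it.
    """
    xs = [x for x, _ in rows]
    threshold = rng.uniform(min(xs), max(xs))
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    overall = sum(y for _, y in rows) / len(rows)
    # An empty leaf falls back to the overall positive fraction.
    left_p = sum(left) / len(left) if left else overall
    right_p = sum(right) / len(right) if right else overall
    return threshold, left_p, right_p

def noisy_forest_score(rows, x, n_trees, noise_scale, seed):
    """Average the trees' leaf fractions for x and publish a noisy score."""
    rng = random.Random(seed)
    trees = [train_random_tree(bootstrap_sample(rows, rng), rng)
             for _ in range(n_trees)]
    average = sum(lp if x <= t else rp for t, lp, rp in trees) / n_trees
    noise = 0.0
    if noise_scale > 0:
        # Difference of two exponential draws is Laplace(0, noise_scale).
        noise = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
    # Clamping is post-processing, so the score stays a valid fraction.
    return min(1.0, max(0.0, average + noise))

# Toy run: label is 1 when the feature is at least 50.
rows = [(x, int(x >= 50)) for x in range(100)]
score = noisy_forest_score(rows, 95, n_trees=25, noise_scale=0.05, seed=1)
```

On a cluster, the sampling and per-tree training would be distributed across workers, and only the averaged, noised result would be collected and published.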
I prefer Spark, and I am particularly looking for someone who can implement this noisy random decision trees algorithm in Spark on top of Hadoop.
A very good understanding of cluster-based Hadoop, Spark, and machine learning algorithms, especially decision trees and random forests, is essential for this project.
7 freelancers are bidding an average of $796 for this job
Machine learning scholar. My research areas include data mining, machine learning, and big data. My past experience in machine learning includes 8+ years of research in the ML domain.