The purpose of this project is to have you demonstrate how the data mining methods you have learnt can be
applied to real-world problems. It will also allow you to develop a deeper understanding of different
algorithms by implementing them yourself, and will provide hands-on experience using data mining methods
on real data.
You’re asked to perform the following steps:
2. Select the data: Once you’ve selected the topic and your reply posting is ranked first, visit the
UCI Machine Learning Repository to select the dataset for your topic. Note that there may be more than
one dataset available for your problem; in that case, you’re free to choose whichever dataset you prefer.
3. Visualize the data: Using several visualization techniques, visualize and review your data (include
these visualizations in your report).
4. Preprocess your data using the techniques described in class. You will probably have to try different
techniques, assess their performance, and then make a final selection of preprocessing steps.
5. Select at least two or three different data mining algorithms and apply them to your data. Please try
as many different algorithms as possible (wherever the problem and data support the use of such
algorithms). Note that grading of the project will take into account the number and variety of the
algorithms implemented for your problem.
6. Analyze your results: Please state the performance of each of the algorithms implemented for your
dataset using common performance measures such as accuracy, recall (sensitivity), F-measure,
loss/error rate, ROC curves, etc.
7. Write a report about your implementation and analysis. Please include results from every step,
including preprocessing (i.e. what steps were taken, what impact they had, etc.),
visualization (plots, graphs, and comments based on these visualizations), data mining algorithms
(how the settings and parameters were determined, what difficulties were experienced, if any,
which performance metrics were used, and which strategies were used to prevent overfitting, e.g.
training-validation-testing splits), and your overall algorithm recommendation for the dataset selected.
Reports should not exceed 30 pages (including graphs, code snippets and screenshots), must be typed
single-spaced in 12 pt Times New Roman or Arial, and must have the following sections: Overview
of the problem (describe the problem and its importance), Dataset Overview (describe the data),
Data Preprocessing, Algorithm Selection, Analysis Results and Comparison, Conclusion.
8. In your submission, please include both the code (as an R file or as a separate
text/Word file) and the report.
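As a rough illustration of steps 6 and 7, the sketch below shows one way to make a training-validation-testing split and to compute accuracy, recall (sensitivity), F-measure, and error rate for a binary classifier. It is written in Python for illustration only; the brief itself mentions R, and the same calculations carry over directly. The function names `split_indices` and `binary_metrics`, the 60/20/20 split proportions, and the toy labels are assumptions made for this example and are not part of the assignment.

```python
import random


def split_indices(n, train=0.6, valid=0.2, seed=0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test lists.

    The 60/20/20 default is an assumed proportion, not one given in the brief.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common measures from the confusion-matrix counts."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented purely to exercise the functions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
train_idx, valid_idx, test_idx = split_indices(10)
```

In this sketch the validation rows would be used to choose settings and parameters, and the test rows would be held back until the final comparison of algorithms, so that the reported measures are not inflated by overfitting.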
Return your code and the repor