Predictive Data Analytics and Classification

₹6000-7500 INR

In progress

Posted:

3 months ago

Paid on delivery

Overview of the Task:

The original test sheets contain many data sets, each with 49 numbers. Each data set is a column. In each data set/column, 7 of the 49 numbers are selected as process numbers (also called pattern numbers below). These are given in bold red. The last (rightmost) column is the target data set for prediction. All other columns are data sets used for training the model. The project's ultimate objective is to predict the 7 process numbers of that last column/data set using Machine Learning models. We are using 5 different types of ML models to predict these 7 pattern numbers for the target data set, which is the last column of each test sheet.

During this prediction process we have come across certain observations. We need to address these observations and improve the prediction accuracy with methods or approaches to be developed by expert data scientists. This task, named "Data analysis and classification", serves that objective.

We have predicted the 7 process numbers of approximately 50 data sets using these 5 ML models at various test sizes. The prediction results are given in the Excel workbook named "Comparison of prediction results of 50 data sets". How to read and understand this workbook is explained below:

1) The workbook has 50 sheets. The leftmost sheet is named 388 and the rightmost sheet is named 438. Of these sheets, data is currently filled in up to 431, a total of 44 data sets. The remaining sheets will be filled in as the data becomes available.
2) The sheet names are the numbers of the data sets, from 388 to 438. Each of these numbers is also the name of the target data set, the rightmost column of each test sheet.
3) One data set can have up to 6 to 7 test sheets, named 388-1, 3881A, 3881B, 388-2 …. up to 388-5. Each test sheet has a varying number of training data sets and one target data set. The number of data sets in each test sheet is stated in the test sheet name.
4) A test sheet name starts with the number of the target variable (or target column) for which we have to predict the 7 numbers.
5) Each of the 50 sheets of the workbook has lists of 9 numbers predicted by the different ML models. The models used were RF - Random Forest Classifier, SVML - SVM Classifier with linear kernel, SVMR - SVM Classifier with RBF kernel, SVMP - SVM Classifier with polynomial kernel, and NB - Naive Bayes Classifier.
6) The actual 7 values (the pattern numbers) are given in the coloured cells at the top left of each sheet. Wherever these numbers occur in the prediction results, they are coloured with the respective colours.
7) You may also notice names such as 388-1, 388-2, 388-3, 388-4, etc. These are different variations of the test sheets for the data set numbered 388. In each of these 5 to 7 test sheets, 388 is the target column. We make predictions using each of these test sheets of various sizes.
8) Finally, we noticed better results when changing the test size during the train-test split. So we have also tested each of the models at different test sizes: 0.2, 0.3, 0.4, 0.5, 0.6. These test size values are given in brackets next to each test sheet name.
9) At the top left of each sheet you can also see 'Result type'. This describes a special data manipulation criterion. 'No column removal': no columns are removed from the test sheet. 'Two column removal': the first two columns are removed from the test sheet. 'Four column removal': the first four columns (the first four training data sets) are removed from the test sheet, and so on. This increased prediction accuracy slightly, so please keep an eye on this variable.

The Task:

A. First look through the various predictions in each sheet (there are 150 predictions per sheet) and count, list, or tabulate the facts available there, such as:
a) How many of the pattern numbers occurred in each type of prediction?
b) Which type of prediction has the highest number of correct pattern numbers?
c) Which type of prediction gives a consistent result, meaning a similar number of correct numbers repeatedly?
d) Variations in Dataset: Explore the variations of the same dataset (e.g., 388-1, 388-2) and note any significant differences in prediction accuracy.
e) Effect of Test Sizes: Investigate the impact of different test sizes (0.2, 0.3, 0.4, 0.5, 0.6) on prediction accuracy for each model.
f) Influence of 'Result Type': Assess how different 'Result Types' affect the accuracy, especially whether column removal enhances or hinders the predictions.
And so on…. All such observations/facts will help us determine which type of model, at what test size value, has the best performance.

B. Analyse each test sheet in detail, using various metrics used in data science, to determine the characteristics of a test sheet or target data set that give the best prediction result.
a) Prediction Accuracy: Calculate the overall accuracy of predictions for each test sheet. This is the ratio of correct predictions to the total number of predictions.
b) Precision, Recall, and F1 Score: Break down the performance using precision, recall, and F1 score. Precision measures the accuracy of positive predictions, recall measures the ability to capture all positive instances, and the F1 score combines both.
c) Feature Importance: If applicable, analyse the importance of features in the prediction. This is particularly relevant if certain columns or variables significantly influence the model's performance. You may use the SHAP graphs generated using interpretML for this.
d) Hyperparameter Tuning: Explore the impact of hyperparameter tuning on model performance. Assess how adjustments to parameters influence the predictive accuracy.

C. Analyse each data set (each data set is one column and has 49 numbers) in detail, using metrics that can be derived from the data set itself without considering the prediction results.
a) Descriptive Statistics: Compute basic descriptive statistics such as mean, median, standard deviation, minimum, and maximum. This gives an initial understanding of the central tendency and variability of the data set.
b) Data Distribution: Visualise the distribution of the data set using histograms, box plots, or kernel density plots. This helps identify skewness, outliers, or patterns within the data.

The objective of this analysis and expected results:

After this detailed study and analysis, we expect to gain the following abilities/knowledge:

I) Be able to classify the test sheets into categories or classes such as:
a) Most friendly with SVM linear with ----test size.
b) Needs removal or addition of data sets so that the metric values reach the levels that give better prediction results.
c) ……
d) …..

II) Be able to classify individual data sets into categories or classes such as:
a) Most friendly with SVM linear with ----test size.
b) Needs removal or addition of data sets so that the metric values reach the levels that give better prediction results.
c) ….
d) ……

III) Be able to remove training data sets from, or add them to, a test sheet to get the highest possible number of correct predictions for each type of prediction model and test size.

IV) Any other corrective actions that help us get high prediction accuracy.

Plan of Action

To ensure precise predictions from these models, we have to compute a few metrics. These metrics generally describe the efficiency of the model. They are listed below with details:

- Accuracy: The proportion of correctly classified occurrences as defined in the pattern set. Count the predictions that match the pattern set and compute the proportions. This also gives us the error rate. We know the threshold and use it to interpret the results.
- Confusion Matrix: Accuracy alone is not enough to judge the efficiency of the model. Conduct an in-depth analysis using the underlying information. The matrix gives the True Negatives, True Positives, False Negatives, and False Positives. These measures help us understand the variations and whether we can rely on a particular model.
- Sensitivity and Specificity: Sensitivity shows how many of the actual pattern numbers were predicted as pattern numbers (true positives). Specificity shows how many of the non-pattern numbers were correctly identified as non-pattern numbers.
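The confusion-matrix metrics in the plan above can be sketched for a single prediction as follows. This is only an illustration under stated assumptions, not the client's method: it assumes every prediction is a list of 9 distinct numbers, the pattern set holds 7 numbers, and the universe is the 49 numbers of a data set. The names `Scores` and `score_prediction` and the example values are hypothetical.

```python
from dataclasses import dataclass

UNIVERSE_SIZE = 49  # each data set holds 49 numbers (from the task description)


@dataclass(frozen=True)
class Scores:
    """Confusion-matrix counts for one prediction (hypothetical helper)."""
    tp: int  # predicted numbers that are pattern numbers
    fp: int  # predicted numbers that are not pattern numbers
    fn: int  # pattern numbers that were not predicted
    tn: int  # non-pattern numbers that were not predicted

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0

    @property
    def recall(self) -> float:  # also called sensitivity
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp) if (self.tn + self.fp) else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score_prediction(predicted, actual, universe_size=UNIVERSE_SIZE) -> Scores:
    """Compare one list of predicted numbers with the actual pattern numbers."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = universe_size - tp - fp - fn
    return Scores(tp, fp, fn, tn)


# Made-up values for illustration only; not taken from the workbook.
actual = {3, 11, 17, 24, 30, 41, 48}           # 7 pattern numbers
predicted = [3, 11, 5, 24, 9, 30, 12, 44, 2]   # 9 predicted numbers
example = score_prediction(predicted, actual)
```

Because each prediction lists 9 numbers while only 7 are pattern numbers, precision can never exceed 7/9 under these assumptions, so recall (hits out of 7) is probably the more natural headline figure when comparing models.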
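Task A (which model and test size scores highest, and which is most consistent) could be tabulated along the lines below. This is a rough sketch using only the Python standard library; the record layout, the function name `summarise`, and all hit counts are assumptions made up for illustration, since the workbook itself is not shown here. "Consistent" is interpreted as the lowest standard deviation of hits, which is one possible reading of the brief.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (test_sheet, model, test_size, hits), where hits is the
# number of correct pattern numbers (0 to 7) in one prediction. Values are made up.
records = [
    ("388-1", "RF", 0.2, 3),
    ("388-2", "RF", 0.2, 3),
    ("388-1", "RF", 0.3, 2),
    ("388-2", "RF", 0.3, 4),
    ("388-1", "SVML", 0.2, 4),
    ("388-2", "SVML", 0.2, 5),
    ("388-1", "SVML", 0.3, 1),
    ("388-2", "SVML", 0.3, 2),
]


def summarise(rows):
    """Group hit counts by (model, test_size) and report mean and spread."""
    groups = defaultdict(list)
    for _sheet, model, test_size, hits in rows:
        groups[(model, test_size)].append(hits)
    return {
        key: {"n": len(h), "mean": mean(h), "spread": pstdev(h)}
        for key, h in groups.items()
    }


summary = summarise(records)

# Highest average number of correct pattern numbers.
best = max(summary, key=lambda k: summary[k]["mean"])

# Most consistent: smallest population standard deviation of hits.
most_consistent = min(summary, key=lambda k: summary[k]["spread"])

for (model, test_size), stats in sorted(summary.items()):
    print(f"{model:5} test_size={test_size}: "
          f"mean hits={stats['mean']:.2f}, spread={stats['spread']:.2f}")
```

The same grouping key could be extended with the 'Result type' or the test sheet variant to study items d) to f) of Task A; a low spread on its own says nothing about quality, so it is worth reading it together with the mean.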

Machine Learning (ML)

Project ID: 37845917

About the project

6 bids

Remote project

Last active 3 months ago

Awarded to:

@sajidjasi

As an experienced data scientist with a strong background in analytics and statistical analysis, I believe I am the ideal candidate for your project. I have a thorough understanding of Machine Learning models, having successfully used them throughout my career to predict and analyze complex data sets. My proficiency in tools such as Excel, R and Python will prove valuable when analyzing the large pool of data available in your project. Another strength I bring to the table is my emphasis on accuracy and detail orientation. Your project requires a close examination of pattern numbers in relation to different predictors, test sizes and column manipulations; something that I have significant experience with. My ability to identify patterns, classify information and create structured tabulations will help you gain better insights from the prediction results stored in your wealth of Excel sheets. In addition to these quantitative skills, my PhD training in Econometrics has honed my sense of rigor and discipline needed for intricate data analysis tasks like yours. While exploring the workbook file - 'Comparison of prediction results of 50 data sets', I can provide not only precise counts but also a sophisticated analysis of these facts which will be crucial for improving the accuracy of your ML models in future predictions.

₹7,000 INR in 7 days

4.6

(27 reviews)

4.7

6 freelancers are bidding an average of ₹6,846 INR for this project

@ZIED

With over 10 years of experience in data science and machine learning, I, Dr. Zied B., am well-equipped to tackle the complex task outlined by Rajat K. I have a proven track record of successfully completing intricate data analysis and classification projects, utilizing various machine learning models to predict patterns with high accuracy. My expertise spans from precision and recall metrics to hyperparameter tuning and feature importance analysis, ensuring thorough and comprehensive evaluation of test sheets and datasets. I am committed to delivering actionable insights and recommendations that will optimize model performance and enhance prediction accuracy. My meticulous attention to detail and strategic approach will enable us to identify the best-performing models and test sizes for Rajat K.'s project, ultimately leading to high prediction accuracy and valuable outcomes.

₹6,828.33 INR in 3 days

4.9

(88 reviews)

5.5

@Joz254

Hello, I think I can help with this project. However, I would love to see the datasets to better understand the instructions. Let me know if it's possible. Regards, George.

₹6,750 INR in 7 days

4.9

(11 reviews)

4.0

@Monishes

My name is MONISH and I am a seasoned professional in the field of artificial intelligence (AI) and machine learning (ML) with extensive experience in data science. With regards to your project, my expertise is a perfect match for the complex task at hand. I am highly skilled in developing AI algorithms and ML models such as Random Forest Classifier, SVM Linear Classifier, and Naive Bayes Classifier, among others, which you mentioned as being used in the project. Moreover, my deep learning proficiency utilizing techniques such as CNNs and RNNs aligns precisely with your need for predictive analysis. I have a lot of experience in time series analysis, time series classification, time series prediction, and time series anomaly detection. I've completed various projects like a Variational AutoEncoder-based Univariate Time Series Outlier Detection Algorithm and Open Set time series Classification Algorithms. In-depth data analysis is one of my core competencies. I will meticulously quantify and dissect facts from each of the sheets in your Excel workbook. My strong command over data manipulation and exploratory data analysis will ensure that no relevant information goes unnoticed and that patterns are correctly identified. Additionally, my experiences with test-train splitting during ML model creation align perfectly with your observation of improved results through changing test sizes.

₹6,750 INR in 7 days

0.0

(0 reviews)

0.0

@kumarrajat273

As a data analyst with a keen interest in predictive analytics and a solid record of creating insightful dashboards, I am confident that my skill set and experience make me the ideal candidate for this project. Over the years, I have demonstrated my expertise in tools like Python (Pandas, Matplotlib, Seaborn, NumPy) and R for statistical analysis, which aligns perfectly with your project's requirement. My robust knowledge of machine learning models includes the RF (Random Forest Classifier), SVML (SVM Linear Classifier kernel), SVMR (SVM RBF Classifier kernel), SVMP (SVM poly classifier kernel), and NB (Naive Bayes Classifier) models you utilized in your test. What sets me apart is my approach to data manipulation, an essential aspect of this task. I notice the crucial effect of test size variation on prediction accuracy during test-train splits. To optimize it further, utilizing different variations of test sheets and applying suitable criteria for data manipulation like 'No column removal,' 'Two column removal,' 'Four column removal,' etc., as mentioned in your brief becomes critical. With my qualitative eye for effective data management, I will diligently evaluate these manipulations and provide you with comprehensive tabulation and analysis. Lastly, I am committed to not just collecting results but turning them into actionable insights.

₹6,750 INR in 7 days

5.0

(1 review)

0.0

@parasaggarwal19

As an experienced data scientist, I propose to analyze and classify the patterns within the test sheets to enhance prediction accuracy. By leveraging advanced machine learning techniques, I'll address observed challenges and refine models for accurate process number prediction. My approach involves meticulous data preprocessing, feature engineering, and model optimization. Additionally, I'll implement ensemble methods and deep learning architectures if required to maximize predictive performance. With a comprehensive understanding of the dataset and adept problem-solving skills, I aim to deliver actionable insights and superior predictions.

₹6,999 INR in 7 days