data munging, cleansing, building, predictive model, local regressors and clustering

Job Description:

1 Part 1 - Building up a basic predictive model

Load the dataset [login to view URL] into a pandas dataframe and carry out the following tasks. Organise your code bearing in mind robustness and maintainability:

1. Data cleaning and transformation:

If you have a closer look at the dataset, you will see that there are lots of missing values. They need to be treated appropriately, but in the first instance we will take an aggressive approach to dealing with them.

• Show the shape of the dataset

• Rename incorrectly formatted column names (e.g. SALE\nPRICE)

• Create list of categorical variables and another for the numerical variables

• For each numerical column, remove the ',' (and the '$' for the sale price), and then convert it to numeric.

• Convert the 'SALE DATE' to datetime.

• For each categorical variable, remove the spaces, and then replace the empty string '' by NaN.

• Replace the zeros in prices, land square footage, etc. with NaN

• Show a summary of all missing values as well as the summary statistics


• Drop duplicates if any

• Drop rows with NaN values

• Identify and remove outliers if any

• Show the shape of the resulting dataframe.

• Consider the log of the prices and normalise the data.
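
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the tiny `raw` frame and its column names (`NEIGHBORHOOD`, `SALE\nPRICE`, `LAND SQUARE FEET`, `SALE DATE`) are made up for the example, "remove the spaces" is read as stripping surrounding whitespace, the 1.5 × IQR rule is only one possible outlier criterion, and "normalise" is assumed to mean z-scoring.

```python
import numpy as np
import pandas as pd


def clean_sales(df, numeric_cols, categorical_cols, date_col="SALE DATE"):
    """First-pass cleaning: fix column names, coerce types, mark missing values as NaN."""
    out = df.copy()
    # Collapse newlines/repeated whitespace in names, e.g. "SALE\nPRICE" -> "SALE PRICE".
    out.columns = [" ".join(str(c).split()) for c in out.columns]
    for col in numeric_cols:
        # Drop ',' and '$', then coerce; anything unparseable becomes NaN.
        stripped = out[col].astype(str).str.replace(r"[,$]", "", regex=True)
        numeric = pd.to_numeric(stripped, errors="coerce")
        # Zeros are treated as missing values.
        out[col] = numeric.mask(numeric == 0)
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    for col in categorical_cols:
        s = out[col]
        # Strip surrounding spaces (leaving existing NaN alone), then blank -> NaN.
        s = s.where(s.isna(), s.astype(str).str.strip())
        out[col] = s.mask(s == "")
    return out


def remove_iqr_outliers(df, col, k=1.5):
    """Keep rows whose `col` lies within k * IQR of the quartiles (an assumed rule)."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]


# Hypothetical stand-in for the real dataset.
raw = pd.DataFrame({
    "NEIGHBORHOOD": [" CHELSEA ", "HARLEM", "", "SOHO", "SOHO"],
    "SALE\nPRICE": ["$1,200,000", "850,000", "500,000", "0", "0"],
    "LAND SQUARE FEET": ["2,000", "1,500", "1,000", "1,800", "1,800"],
    "SALE DATE": ["2017-01-15", "2017-03-02", "2017-04-10", "2017-05-20", "2017-05-20"],
})
print("raw shape:", raw.shape)

numeric_cols = ["SALE PRICE", "LAND SQUARE FEET"]  # names after renaming
categorical_cols = ["NEIGHBORHOOD"]

cleaned = clean_sales(raw, numeric_cols, categorical_cols)
print(cleaned.isna().sum())   # missing values per column
print(cleaned.describe())     # summary statistics

cleaned = cleaned.drop_duplicates().dropna()
cleaned = remove_iqr_outliers(cleaned, "SALE PRICE")
cleaned = cleaned.assign(**{"LOG PRICE": np.log(cleaned["SALE PRICE"])})
print("cleaned shape:", cleaned.shape)

# Z-score normalisation of the numeric features.
to_scale = ["LAND SQUARE FEET", "LOG PRICE"]
scaled = (cleaned[to_scale] - cleaned[to_scale].mean()) / cleaned[to_scale].std()
```

Keeping the type coercion in one function and the row-dropping steps outside it makes it easier to reuse the same coercion later with a gentler missing-value strategy.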

2. Data Exploration. Consider the resulting dataframe. This first aggressive cleaning should give a smaller dataset, with which you can start exploring relationships between the various features.

• Visualise the prices across neighborhoods

• Visualise the prices over time

• Show the scatter matrix plot and the correlation matrix

• Any further plots, which demonstrate your understanding of the data
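
One possible way to produce these plots is sketched below with matplotlib and pandas. The data is synthetic and the column names (`NEIGHBORHOOD`, `SALE DATE`, `GROSS SQUARE FEET`, `YEAR BUILT`, `LOG PRICE`) are assumptions made for illustration; the real dataframe would come from the cleaning step.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in for the cleaned dataframe (all values are made up).
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "NEIGHBORHOOD": rng.choice(["CHELSEA", "HARLEM", "SOHO"], size=n),
    "SALE DATE": pd.Timestamp("2017-01-01")
    + pd.to_timedelta(rng.integers(0, 365, size=n), unit="D"),
    "GROSS SQUARE FEET": rng.uniform(500, 5000, size=n),
    "YEAR BUILT": rng.integers(1900, 2015, size=n),
})
df["LOG PRICE"] = 10 + 0.0005 * df["GROSS SQUARE FEET"] + rng.normal(0, 0.3, size=n)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Prices across neighbourhoods: one box per neighbourhood.
names, data = zip(*[(name, g.to_numpy())
                    for name, g in df.groupby("NEIGHBORHOOD")["LOG PRICE"]])
axes[0].boxplot(list(data))
axes[0].set_xticks(list(range(1, len(names) + 1)))
axes[0].set_xticklabels(names)
axes[0].set_title("Log price by neighbourhood")

# Prices over time: median log price per calendar month.
monthly = df.groupby(df["SALE DATE"].dt.to_period("M"))["LOG PRICE"].median()
axes[1].plot(monthly.index.to_timestamp(), monthly.to_numpy(), marker="o")
axes[1].set_title("Monthly median log price")
fig.tight_layout()

# Scatter matrix and correlation matrix of the numeric columns.
numeric_cols = ["GROSS SQUARE FEET", "YEAR BUILT", "LOG PRICE"]
axes_sm = scatter_matrix(df[numeric_cols], figsize=(8, 8), diagonal="hist")
corr = df[numeric_cols].corr()
print(corr.round(2))
```

In a notebook, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` would display the figures inline.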

3. Model building. Consider the resulting dataframe.

• Select the predictors that would have impact in predicting house prices.

• Build up a first linear model with appropriate predictors and evaluate it. Split the data into training and test sets; build up the model; and then show a histogram of the residuals. Evaluate your model by using a cross-validation procedure.
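
A first linear model along these lines could look like the sketch below, using scikit-learn. The two predictors and the target are synthetic stand-ins (assumed to play the role of square footage, year built and log price); the 80/20 split, 5 folds and R² scoring are illustrative choices rather than requirements of the brief.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic predictors and target (made-up values, for illustration only).
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.uniform(500, 5000, size=n),    # stand-in for square footage
    rng.integers(1900, 2015, size=n),  # stand-in for year built
])
y = 10 + 0.0005 * X[:, 0] + 0.001 * (X[:, 1] - 1900) + rng.normal(0, 0.3, size=n)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Residuals on the test set; np.histogram gives the bin counts that
# plt.hist(residuals, bins=20) would draw.
residuals = y_test - model.predict(X_test)
counts, bin_edges = np.histogram(residuals, bins=20)

# 5-fold cross-validation on the full data, scored by R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print(f"test R^2: {test_r2:.3f}")
print(f"CV R^2: {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

A roughly symmetric, zero-centred residual histogram is what a reasonable linear fit on log prices would be expected to show; strong skew would suggest missing predictors or remaining outliers.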

2 Part 2 - Improved model

This is an open-ended question and you are free to push your problem-solving skills in order to build up a useful model with higher performance.

1. Consider the entire dataset given in this assignment. Develop an improved predictive model that predicts the sale prices of houses. Make sure to validate your model. You should aim for a model with higher performance while using as many data points as possible. This implies treating missing values differently, for example through imputation rather than dropping them.

2. Use the K-Means algorithm to cluster your cleansed dataset and compare the obtained clusters with the distribution found in the data. Justify your clustering and visualise your clusters as appropriate.

3. Build up local regressors based on your clustering and discuss how this cluster-based regression compares to your regression model obtained in Part 2.1.
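
The three Part 2 tasks can be sketched together as below. Everything here is an assumption made for illustration: the data is synthetic with two hidden regimes, median imputation stands in for whichever imputation strategy is chosen, `n_clusters=2` is picked because the toy data has two groups, and plain linear regressions are used as the local models.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: two hidden regimes whose price/feature relationship differs.
rng = np.random.default_rng(0)
n = 400
regime = rng.integers(0, 2, size=n)  # stand-in for a known grouping in the data
x0 = np.where(regime == 0, 1000.0, 4000.0) + rng.normal(0, 100, size=n)
x1 = rng.normal(0, 1, size=n)
y = np.where(regime == 0, 5 + x1, 8 - x1) + rng.normal(0, 0.2, size=n)
X = np.column_stack([x0, x1])
X[rng.random(n) < 0.1, 1] = np.nan   # 10% missing values in one feature

X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, regime, test_size=0.25, random_state=0
)

# Impute instead of dropping rows, then scale; fitted on training data only.
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_train_p = prep.fit_transform(X_train)
X_test_p = prep.transform(X_test)

# Global regression on the imputed data.
global_model = LinearRegression().fit(X_train_p, y_train)
global_r2 = r2_score(y_test, global_model.predict(X_test_p))

# K-Means clustering, compared against the known grouping.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train_p)
train_labels = kmeans.labels_
table = pd.crosstab(g_train, train_labels, rownames=["regime"], colnames=["cluster"])
print(table)

# One local regressor per cluster.
local_models = {}
for k in range(kmeans.n_clusters):
    in_cluster = train_labels == k
    local_models[k] = LinearRegression().fit(X_train_p[in_cluster], y_train[in_cluster])

# Route each test row to the regressor of its nearest cluster.
test_labels = kmeans.predict(X_test_p)
local_pred = np.empty_like(y_test)
for k, local_model in local_models.items():
    in_cluster = test_labels == k
    if in_cluster.any():
        local_pred[in_cluster] = local_model.predict(X_test_p[in_cluster])
local_r2 = r2_score(y_test, local_pred)

print(f"global R^2: {global_r2:.3f}, cluster-based R^2: {local_r2:.3f}")
```

On this toy data the local regressors do better because the slope changes sign between groups; on real sales data the gain depends on whether the clusters actually capture different price behaviour, which is what the comparison is meant to discuss.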

Skills: Data Processing, Big Data, Data Cleansing, Pandas, Predictive Analytics

About the Client:
(0 reviews) Tilbury, United Kingdom

Project ID: #36266248

25 freelancers are bidding an average of £121 for this job


Top 1% on Freelancer.com. Hi, greetings! ✅ Checked your project details. ✅ Completion time: within the project deadline. We have worked on 850+ projects. I have 6+ years of experience in the same kind of projects. I am a...

£180 GBP in 3 days
(339 reviews)

Hi there! I have more than 10 years of experience as a Database Administrator and Analytics Engineer. I'd love to work together on your project building your Excel models. I'm so responsible and kind, I'll always send...

in 1 day
(23 reviews)

Hello, I am an engineer in statistics with good experience. I can help ASAP. Regards

£200 GBP in 7 days
(15 reviews)

Hey, I am an expert in Python and I can help you with your project. Message me to discuss so that we can start working on it.

in 1 day
(6 reviews)

Hi, Dear Employer. I am Al.A. I have a Ph.D. background and am a professional Excel user and data scraper with 12+ years of experience. I will provide you with a high-quality Excel file. I have worked on similar projects of Spre...

£135 GBP in 3 days
(3 reviews)

Hi. How are you? I have just read your proposal and I am sure I can complete the project on time. I am an expert in ML/DL with 10+ years of experience. Please contact me to discuss the project in more detai...

£50 GBP in 2 days
(2 reviews)

Hi, I am a data analyst and have relevant experience to deal with your project. I can provide a Jupyter notebook with the Part 1 requirements in one day and can complete the project in at most 3 days. Please get in touch...

£150 GBP in 7 days
(8 reviews)

Hi there, I have read your project details. I can build you a predictive model for your Manhattan dataset. I will use all the data science techniques, such as data preprocessing and visualizations, to build the model for you. Message...

£85 GBP in 4 days
(3 reviews)

I will be able to do this project because I am an expert in MS Office and all its related work, so I would like you to give me this project. Thanks.

£20 GBP in 4 days
(0 reviews)

Hi, I can help you as I have done several similar jobs related to Data Processing, Big Data, Data Cleansing, Pandas and Predictive Analytics. I have read the details and would like to discuss them further; please discuss with me...

£250 GBP in 9 days
(0 reviews)

Hi, dear client, I have seen your description. I am an associate degree professional in data processing, analysis and data extraction. I am also an expert Python data scraper. Please send a message to contact me to...

£135 GBP in 7 days
(0 reviews)

Hi, I am new on the site, but what your project requires is basically my function in my job. I deal with large data sets and I have finished nearly 70 projects in the last two years. Presently, I am working...

£70 GBP in 4 days
(0 reviews)

Hi! I am a Junior Actuarial Data Scientist. I can help you with this project. I have done many data analyses similar to this project before. I believe that I will analyze this project very well.

£45 GBP in 5 days
(0 reviews)

I am thrilled to have the opportunity to offer my expertise in data science to help you achieve your goals. As a skilled data scientist with a passion for solving complex problems, I am confident that I can provide you...

£100 GBP in 3 days
(0 reviews)

Experienced in dealing with data and the cleaning part. Actively taking part in many ML competitions to build and tune models. Can train your model for the best outcome. Recently worked on a project named Uber Lyft price prediction of...

£60 GBP in 7 days
(0 reviews)

☑️FULL-TIME FREELANCER☑️ I am an expert in any scraping, leads, eCommerce product uploading, all kinds of data entry, and PDF form creation, and a web search expert who knows the value of time, is very hard-working, and always d...

£135 GBP in 7 days
(0 reviews)

Hello, I’m a machine learning engineer and a certified data analyst. I’m a PhD student at the faculty of engineering and I have published papers in the fields of machine learning and computer vision. I have 9+ years...

£100 GBP in 4 days
(0 reviews)