The dataset data2.csv is used to predict whether a patient is likely to have a stroke, based on parameters such as gender, age, various diseases, and smoking status. Each row holds the information for one patient. The columns are:

- id: unique identifier
- gender: Male (0), Female (1), or Other (2)
- age: age of the patient
- hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
- heart_disease: 0 if the patient doesn't have any heart disease, 1 if the patient has a heart disease
- ever_married: No (0), Yes (1)
- work_type: Private (0), Self-employed (1), children (2), Govt_job (3), Never_worked (4)
- Residence_type: Rural (0) or Urban (1)
- avg_glucose_level: average glucose level in blood
- bmi: body mass index
- smoking_status: never smoked (0), smokes (1), formerly smoked (2), Unknown* (3)
- stroke: 1 if the patient had a stroke, 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient.

A. Handle the missing values first: drop any rows that contain missing values, and then drop the column id. (10 points)

B. Build a Logistic Regression model to predict the stroke status, using all the columns except stroke as independent variables. Split the data into Train and Test sets, with 80% of the data as the Train set. Print the following values: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, accuracy score, and confusion matrix. (30 points) (Note: you may get a warning message regarding the number of iterations; disregard the warning.)

C. Repeat part B using the support vector classifier and compare the models. (10 points)
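
One possible way to approach the three parts with pandas and scikit-learn is sketched below. It is not an official solution. It assumes data2.csv has the columns described in the question and that the categorical columns are already numerically encoded. The helper names `prepare`, `evaluate`, and `compare`, and the `seed` argument, are made up for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def prepare(df):
    """Drop rows with missing values, then drop the id column."""
    return df.dropna().drop(columns=["id"])


def evaluate(model, df, seed=0):
    """Split 80/20, fit the model, and return the requested metrics."""
    X = df.drop(columns=["stroke"])  # every column except the target
    y = df["stroke"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "accuracy": accuracy_score(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
    }


def compare(df, seed=0):
    """Evaluate both classifiers on the same cleaned data and split."""
    clean = prepare(df)
    return {
        "logistic": evaluate(LogisticRegression(), clean, seed),
        "svc": evaluate(SVC(), clean, seed),
    }
```

With the real file, usage would look like `results = compare(pd.read_csv("data2.csv"))`, followed by printing each entry of `results`. Because the labels are 0 and 1, MAE and MSE both equal the misclassification rate (1 minus accuracy), so the confusion matrix is the more informative output when comparing the two models, especially if stroke cases are rare.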

The data file for this question is the diamonds dataset available from the Seaborn website. To load this data, run the following: data = sns.load_dataset('diamonds') (please note that the library has to be imported first).

A. Create test and training datasets using the carat, table, and depth columns as the independent variables and the price as the dependent variable. (The x, y, and z columns contain information that's related to the table and depth columns, so it's not necessary to use those columns.) The test dataset should consist of 30% of the total dataset, and you should specify a value for the random_state parameter. (10 pts)

B. Create and fit a multiple linear regression model. (5 pts)

C. Find the MSE of the model on the test dataset. (5 pts)

D. Create a DataFrame that shows the actual price and the predicted price. Then, display the first five rows of data to see how close the predicted prices are. (Use the test set only!) (5 pts)

E. Calculate the residuals (a residual is the difference between the actual y and the predicted y) and store the results in a new column in the DataFrame you created in the previous part. Then, display the first five rows of the DataFrame. (5 pts)

F. Plot a density plot of the residuals and comment on the shape of the distribution. (5 pts)

G. Repeat parts B and C, this time fitting a quadratic polynomial regression. Which model is more accurate? (15 pts)
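
The steps in this question could be organized roughly as follows. This is a minimal sketch, not a definitive answer. The helper names (`split`, `fit_and_score`, `residual_frame`, `run`), the output column names (`actual`, `predicted`, `residual`), and the default `seed=42` are assumptions chosen for illustration. The quadratic model is built here with `PolynomialFeatures(degree=2)` followed by `LinearRegression`, which is one common way to fit a polynomial regression in scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

FEATURES = ["carat", "table", "depth"]


def split(data, seed=42):
    """Hold out 30% of the rows as the test set."""
    return train_test_split(
        data[FEATURES], data["price"], test_size=0.3, random_state=seed)


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Fit on the training set; return test predictions and test MSE."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return pred, mean_squared_error(y_test, pred)


def residual_frame(y_test, pred):
    """Actual vs. predicted prices, plus residual = actual - predicted."""
    frame = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": pred})
    frame["residual"] = frame["actual"] - frame["predicted"]
    return frame


def run(data, seed=42):
    """Fit the linear and quadratic models on the same split."""
    X_train, X_test, y_train, y_test = split(data, seed)
    linear = LinearRegression()
    quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    lin_pred, lin_mse = fit_and_score(linear, X_train, X_test, y_train, y_test)
    _, quad_mse = fit_and_score(quadratic, X_train, X_test, y_train, y_test)
    return residual_frame(y_test, lin_pred), lin_mse, quad_mse
```

With the real data, one would call `frame, lin_mse, quad_mse = run(sns.load_dataset('diamonds'))` after `import seaborn as sns`, inspect `frame.head()`, and draw the density plot with `sns.kdeplot(frame["residual"])`. The model with the lower test MSE is the more accurate one by this measure.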