The dataset data2.csv is used to predict whether a patient is likely to have a stroke based on parameters such as gender, age, various diseases, and smoking status. Each row describes one patient.

Here is the description of the columns:

id: unique identifier
gender: Male (0), Female (1) or Other (2)
age: age of the patient
hypertension: 0 if the patient doesn’t have hypertension, 1 if the patient has hypertension
heart_disease: 0 if the patient doesn’t have any heart disease, 1 if the patient has a heart disease
ever_married: No (0), Yes (1)
work_type: Private (0), Self-employed (1), children (2), Govt_job (3), Never_worked (4)
Residence_type: Rural (0) or Urban (1)
avg_glucose_level: average glucose level in blood
bmi: body mass index
smoking_status: never smoked (0), smokes (1), formerly smoked (2), Unknown* (3)
stroke: 1 if the patient had a stroke, 0 if not

Note: “Unknown” in smoking_status means that the information is unavailable for this patient.

A. Handle the missing values first. Drop any rows that contain missing values, and then drop the column id. (10 points)

B. Build a Logistic Regression model to predict the stroke status, using all the columns except “stroke” as independent variables. Split the data into Train and Test sets, with 80% of the data as the Train set. Print the following values: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, accuracy score, and confusion matrix. (30 points) (Note: you may get a warning message regarding the number of iterations; disregard the warning.)

C. Repeat part B using the support vector classifier and compare the models. (10 points)
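
The three steps above (cleaning, logistic regression, support vector classifier) could be sketched roughly as follows. This is a minimal sketch rather than a reference solution. It assumes that pandas and scikit-learn are available and that the categorical columns are already stored as the numeric codes listed above. The helper names `clean`, `evaluate`, and `compare`, the `seed` parameter, and the default hyperparameters for both models are illustrative choices that the assignment does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def clean(df):
    """Drop rows with missing values, then drop the id column."""
    return df.dropna().drop(columns=["id"])


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit a classifier and return the metrics the assignment lists."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "accuracy": accuracy_score(y_test, pred),
        # labels=[0, 1] keeps the matrix 2x2 even if one class is absent
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
    }


def compare(df, seed=42):
    """Run both models on the same 80/20 split (seed is an assumption)."""
    data = clean(df)
    X = data.drop(columns=["stroke"])  # every remaining column is a feature
    y = data["stroke"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=seed)
    models = {
        "LogisticRegression": LogisticRegression(),  # default settings
        "SVC": SVC(),                                # default RBF kernel
    }
    return {name: evaluate(model, X_train, X_test, y_train, y_test)
            for name, model in models.items()}
```

Calling `compare(pd.read_csv("data2.csv"))` and printing the returned dictionary would give the requested values for both models side by side. Because the labels are 0 and 1, the Mean Absolute Error and Mean Squared Error both equal the misclassification rate (1 minus accuracy), so the confusion matrix is the more informative output when comparing the two classifiers.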