The dataset data2.csv contains patient information for predicting whether a patient is likely to have a stroke, based on parameters such as gender, age, various diseases, and smoking status. Each row describes one patient. The columns are:

id: unique identifier
gender: Male (0), Female (1), or Other (2)
age: age of the patient
hypertension: 0 if the patient doesn’t have hypertension, 1 if the patient has hypertension
heart_disease: 0 if the patient doesn’t have any heart disease, 1 if the patient has a heart disease
ever_married: No (0), Yes (1)
work_type: Private (0), Self-employed (1), children (2), Govt_job (3), Never_worked (4)
Residence_type: Rural (0) or Urban (1)
avg_glucose_level: average glucose level in blood
bmi: body mass index
smoking_status: never smoked (0), smokes (1), formerly smoked (2), Unknown* (3)
stroke: 1 if the patient had a stroke, 0 if not

*Note: “Unknown” in smoking_status means that the information is unavailable for this patient.

A. Handle the missing values first: drop any rows that contain missing values, and then drop the id column. (10 points)
B. Build a logistic regression model to predict the stroke status, using all the columns except stroke as independent variables. Split the data into train and test sets, with 80% of the data in the train set. Print the following values: mean absolute error, mean squared error, root mean squared error, accuracy score, and confusion matrix. (30 points) (Note: you may get a warning message regarding the number of iterations; disregard the warning.)
C. Repeat part B using the support vector classifier and compare the models. (10 points)
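The steps of the stroke question above can be sketched in Python with pandas and scikit-learn. This is only a minimal sketch, not a reference solution: the helper names `prepare`, `evaluate`, and `compare_models` are hypothetical, the `random_state` value is arbitrary, and both classifiers are left at their default settings because the question does not specify any.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def prepare(df):
    """Drop rows with missing values, then drop the id column."""
    return df.dropna().drop(columns=["id"])


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit a classifier and collect the metrics the question asks for."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "accuracy": accuracy_score(y_test, pred),
        # labels=[0, 1] keeps the matrix 2x2 even if one class is absent
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
    }


def compare_models(df, random_state=0):
    """Evaluate logistic regression and an SVC on the same 80/20 split."""
    data = prepare(df)
    X = data.drop(columns=["stroke"])  # every remaining column is a feature
    y = data["stroke"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=random_state)  # arbitrary seed
    return {
        "logistic_regression": evaluate(
            LogisticRegression(), X_train, X_test, y_train, y_test),
        "svc": evaluate(SVC(), X_train, X_test, y_train, y_test),
    }
```

Assuming the file sits in the working directory, `compare_models(pd.read_csv("data2.csv"))` would return one metrics dictionary per model, which can then be printed and compared side by side. Because both models are scored on the same split, differences in the metrics reflect the models rather than the sampling.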
Solutions to be measured with a spectrophotometer must adhere to the principle of __________ law.
The data file for this question is the diamonds dataset available from the Seaborn website. To load this data, run the following (note that the seaborn library has to be imported first, for example as sns):

    data = sns.load_dataset('diamonds')

A. Create test and training datasets using the carat, table, and depth columns as the independent variables and the price as the dependent variable. (The x, y, and z columns contain information that’s related to the table and depth columns, so it’s not necessary to use those columns.) The test dataset should consist of 30% of the total dataset, and you should specify a value for the random_state parameter. (10 pts)
B. Create and fit a multiple linear regression model. (5 pts)
C. Find the MSE of the model on the test dataset. (5 pts)
D. Create a DataFrame that shows the actual price and the predicted price. Then, display the first five rows of data to see how close the predicted prices are. (Use the test set only!) (5 pts)
E. Calculate the residuals (a residual is the difference between the actual y and the predicted y) and store the results in a new column in the DataFrame you created in the previous part. Then, display the first five rows of the DataFrame. (5 pts)
F. Plot a density plot of the residuals and comment on the shape of the distribution. (5 pts)
G. Repeat parts B and C, fitting a quadratic polynomial regression. Which model is more accurate? (15 pts)
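The diamonds workflow above could be sketched along the following lines. This is a hedged illustration rather than the expected answer: `fit_and_score` and `FEATURES` are hypothetical names, `random_state=42` is an arbitrary choice, and the quadratic model is built here with scikit-learn's `PolynomialFeatures`, which is one common way to fit a polynomial regression but is not named in the question. With `degree=1` the polynomial step leaves the features unchanged, so the same function covers the plain multiple linear regression.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

FEATURES = ["carat", "table", "depth"]  # independent variables


def fit_and_score(data, degree=1, random_state=42):
    """Fit a polynomial regression of the given degree on a 70/30 split.

    Returns the test-set MSE and a DataFrame holding the actual price,
    the predicted price, and the residual for each test row.
    """
    X = data[FEATURES]
    y = data["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)

    # degree=1 is ordinary multiple linear regression; degree=2 adds
    # squared and pairwise interaction terms before the linear fit.
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression())
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)

    results = pd.DataFrame(
        {"actual": y_test.to_numpy(), "predicted": predicted})
    results["residual"] = results["actual"] - results["predicted"]
    return mean_squared_error(y_test, predicted), results
```

With the dataset loaded as `data`, calling `fit_and_score(data, degree=1)` and `fit_and_score(data, degree=2)` gives the two MSE values to compare (lower is better), and `results.head()` shows the first five rows. For the density plot, `sns.kdeplot(results["residual"])` is one option.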
What can cause light scattering when a sample is analyzed?
Which enzyme analysis can be used to evaluate muscular damage?
Which term represents the mathematical manipulation that describes the excretion of specific electrolytes relative to the glomerular filtration rate?
For which of the following species would bile acid testing not be useful?
Which term refers to the fluid portion of the blood that contains fibrinogen but no cells?
In the study conducted to observe paramedics’ ability to manage difficult situations, what did researchers find about more experienced paramedics relative to less experienced paramedics?
Which of the following components of the nucleus has mass but no electrical charge?