Kelley is a participant in his employer-sponsored Roth 401(k) plan during part of the year. Later in the year, Kelley becomes a participant in his new employer’s 403(b) plan. Kelley can defer an aggregate maximum of $23,000 for 2024 (plus $7,500 if age 50 or over) between the 401(k) plan and the 403(b) plan.
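The aggregate limit can be illustrated with a little arithmetic. The sketch below uses only the 2024 figures quoted above; the function name, Kelley's age, and the $10,000 already deferred are hypothetical values chosen for illustration.

```python
# 2024 figures as quoted in the question above.
ELECTIVE_DEFERRAL_LIMIT_2024 = 23_000
CATCH_UP_2024 = 7_500


def remaining_deferral_room(age, deferred_so_far):
    """Return how much more can be deferred this year across both plans.

    The limit applies to the person, not to each plan, so amounts already
    deferred into the 401(k) reduce what can go into the 403(b).
    """
    limit = ELECTIVE_DEFERRAL_LIMIT_2024
    if age >= 50:
        limit += CATCH_UP_2024
    return max(limit - deferred_so_far, 0)


# Hypothetical: $10,000 already deferred into the Roth 401(k).
print(remaining_deferral_room(45, 10_000))  # 13000
print(remaining_deferral_room(52, 10_000))  # 20500
```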

A non-safe harbor 401(k) plan allows plan participants the opportunity to defer taxation on a portion of regular salary simply by electing to have such amounts contributed to the plan instead of receiving them in cash. Which of the following statements are rules that apply to 401(k) salary deferrals?
1. Salary deferral into the 401(k) plan is limited to $23,500 for individuals younger than 50 for 2025.
2. A non-discrimination test called the actual deferral percentage test applies to salary deferral amounts.

The dataset data2.csv provides information for predicting whether a patient is likely to have a stroke, based on parameters such as gender, age, various diseases, and smoking status. Each row provides the relevant information about one patient. Here is the description of the columns:

id: unique identifier
gender: Male (0), Female (1), or Other (2)
age: age of the patient
hypertension: 0 if the patient doesn’t have hypertension, 1 if the patient has hypertension
heart_disease: 0 if the patient doesn’t have any heart disease, 1 if the patient has a heart disease
ever_married: No (0), Yes (1)
work_type: Private (0), Self-employed (1), children (2), Govt_job (3), Never_worked (4)
Residence_type: Rural (0) or Urban (1)
avg_glucose_level: average glucose level in blood
bmi: body mass index
smoking_status: never smoked (0), smokes (1), formerly smoked (2), Unknown* (3)
stroke: 1 if the patient had a stroke, 0 if not

*Note: “Unknown” in smoking_status means that the information is unavailable for this patient.

A. Handle the missing values first: drop any rows that contain missing values, and then drop the column id. (10 points)
B. Build a Logistic Regression model to predict the stroke status, using all the columns except “stroke” as independent variables. Split the data into Train and Test sets, with 80% of the data as the Train set. Print the following values: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, accuracy score, and confusion matrix. (30 points) (Note: you may get a warning message regarding the number of iterations; disregard the warning.)
C. Repeat part B using the support vector classifier and compare the models. (10 points)
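One possible shape for this workflow is sketched below with scikit-learn. Because data2.csv is not reproduced here, the sketch runs on a small synthetic DataFrame with only a few of the columns; the synthetic data, the helper name `evaluate_classifier`, the `seed` value, and `max_iter=1000` are all assumptions made for illustration, not part of the assignment. With the real file, `pd.read_csv("data2.csv")` would take the place of the synthetic frame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def evaluate_classifier(model, df, target="stroke", seed=0):
    """Fit `model` on an 80/20 split of `df` and return test-set metrics."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }


# Synthetic stand-in for data2.csv (assumption: the real file is not used).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "id": np.arange(n),
    "age": rng.uniform(20, 90, n),
    "avg_glucose_level": rng.uniform(60, 250, n),
    "bmi": rng.uniform(15, 45, n),
})
demo["stroke"] = (demo["age"] + rng.normal(0, 10, n) > 70).astype(int)
demo.loc[demo.index % 50 == 0, "bmi"] = np.nan  # inject missing values

# Part A: drop rows with missing values, then drop the id column.
clean = demo.dropna().drop(columns=["id"])

# Parts B and C: same split and metrics for both classifiers.
results = {
    "logistic": evaluate_classifier(LogisticRegression(max_iter=1000), clean),
    "svc": evaluate_classifier(SVC(), clean),
}
for name, metrics in results.items():
    print(name, metrics)
```

Passing the same `seed` to both calls keeps the train/test split identical, so the two models are compared on the same test rows. For a 0/1 target, MAE and MSE both equal the misclassification rate, so they carry the same information as the accuracy score.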

The data file for this question is the diamonds dataset available through Seaborn. To load this data, run the following (note that the library has to be imported first):

data = sns.load_dataset('diamonds')

A. Create test and training datasets using the carat, table, and depth columns as the independent variables and the price as the dependent variable. (The x, y, and z columns contain information that’s related to the table and depth columns, so it’s not necessary to use those columns.) The test dataset should consist of 30% of the total dataset, and you should specify a value for the random_state parameter. (10 pts)
B. Create and fit a multiple linear regression model. (5 pts)
C. Find the MSE of the model on the test dataset. (5 pts)
D. Create a DataFrame that shows the actual price and the predicted price. Then, display the first five rows of data to see how close the predicted prices are. (Use the test set only!) (5 pts)
E. Calculate the residuals (a residual is the difference between the actual y and the predicted y) and store the results in a new column in the DataFrame you created in the previous part. Then, display the first five rows of the DataFrame. (5 pts)
F. Plot a density plot of the residuals and comment on the shape of the distribution. (5 pts)
G. Repeat parts B and C, fitting a quadratic polynomial regression. Which model is more accurate? (15 pts)
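The steps above could be sketched roughly as follows with scikit-learn. To keep the example runnable without a network connection, it uses a synthetic stand-in for the diamonds data; the synthetic frame, the helper name `fit_and_score`, and `random_state=42` are assumptions for illustration. The synthetic price is quadratic in carat by construction, so the comparison printed here says nothing about which model wins on the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

FEATURES = ["carat", "table", "depth"]


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Fit `model` and return (test predictions, test MSE)."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_pred, mean_squared_error(y_test, y_pred)


# Synthetic stand-in; with Seaborn available, use instead:
#   import seaborn as sns; data = sns.load_dataset('diamonds')
rng = np.random.default_rng(42)
n = 1000
data = pd.DataFrame({
    "carat": rng.uniform(0.2, 3.0, n),
    "table": rng.uniform(50, 70, n),
    "depth": rng.uniform(55, 70, n),
})
data["price"] = 3000 * data["carat"] ** 2 + rng.normal(0, 200, n)

# Part A: 70/30 split with a fixed random_state.
X_train, X_test, y_train, y_test = train_test_split(
    data[FEATURES], data["price"], test_size=0.3, random_state=42)

# Parts B and C: multiple linear regression and its test MSE.
linear_pred, linear_mse = fit_and_score(
    LinearRegression(), X_train, X_test, y_train, y_test)

# Part G: quadratic polynomial regression via a degree-2 feature expansion.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quad_pred, quad_mse = fit_and_score(
    quadratic, X_train, X_test, y_train, y_test)

# Parts D and E: actual vs. predicted prices, plus residuals (test set only).
comparison = pd.DataFrame({"actual": y_test.to_numpy(),
                           "predicted": linear_pred})
comparison["residual"] = comparison["actual"] - comparison["predicted"]
print(comparison.head())
print(f"linear MSE: {linear_mse:.1f}, quadratic MSE: {quad_mse:.1f}")

# Part F (not run here): sns.kdeplot(comparison["residual"]) draws the density.
```

Lower test MSE means the more accurate model, since both are scored on the same held-out rows.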