Q7: (8 points) After taking the action suggested in the previous question, suppose the discount factor is γ = 0.9, the state transitions from s₂ to s₄ after taking action aₜ, and the reward r is 0.6. Update the Q-table and write down the updated Q-table. Note: Only one value in the table needs updating, and you might need the Bellman equation:
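
The equation itself does not appear in this text. As a sketch only, one common form of the update is shown below. It assumes the simple Bellman form with no learning rate, since no α is given in the question; the symbol a' (ranging over the actions available in s₄) is notation introduced here. The Q-table values for s₄ come from the earlier question and are not reproduced, so the maximum is left symbolic.

```latex
Q(s_2, a_t) \leftarrow r + \gamma \max_{a'} Q(s_4, a')
            = 0.6 + 0.9 \cdot \max_{a'} Q(s_4, a')
```

Under this assumption, only the entry for the pair (s₂, aₜ) changes, which matches the note that a single value needs updating.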

Q2: (6 points) Suppose we aim to build a more advanced recommendation system for an online bookstore using matrix factorization-based methods, similar to the one that won the Netflix Prize. The global mean rating of books is 3.6 stars. Bob, a loyal customer, has rated 400 books, and his average rating is 0.3 stars higher than the global average rating. Meanwhile, Pride and Prejudice is a book in the bookstore that has 200,000 ratings, with an average rating that is 0.5 stars lower than the global average. What would be a baseline estimate of Bob’s rating for Pride and Prejudice? (2 points) Illustrate how you arrived at your answer. (2 points)
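
The arithmetic behind a baseline estimate can be sketched as below. This assumes the usual additive baseline predictor (global mean plus user deviation plus item deviation), which the question does not state explicitly; the function and variable names are hypothetical and chosen for illustration.

```python
def baseline_estimate(global_mean: float, user_bias: float, item_bias: float) -> float:
    """Additive baseline: global mean + user deviation + item deviation.

    Assumption: the standard baseline predictor form; the rating counts
    (400 and 200,000) play no role in this unregularized version.
    """
    return global_mean + user_bias + item_bias


# Values taken from the question text.
global_mean = 3.6   # global mean rating of books
bob_bias = 0.3      # Bob rates 0.3 stars above the global mean
book_bias = -0.5    # Pride and Prejudice is rated 0.5 stars below the global mean

estimate = baseline_estimate(global_mean, bob_bias, book_bias)
print(f"Baseline estimate: {estimate:.1f} stars")
```

In this sketch the two deviations are simply added to the global mean, giving 3.6 + 0.3 - 0.5 = 3.4 stars.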