(a) Assume each user is represented by the vector of scores that user gave to all of these movies, with unrated movies set to 0. Thus, the number of features equals the total number of movies (17,770) in the training set. Which similarity (or distance) measure would you use to cluster the users? Please select one of “Simple Matching Coefficient”, “Jaccard”, “Cosine”, and “Euclidean distance”. Why do you select this measure? (Please limit your answer to 30 words.)
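
For concreteness, the four candidate measures can be sketched on a toy version of this representation. The sketch below is illustrative only and rests on assumptions not stated in the question: it uses 6 movies instead of 17,770, the two user vectors `u` and `v` are made up, and Simple Matching Coefficient and Jaccard are computed on binarized "rated / not rated" indicators, which is one common convention.

```python
import math

# Hypothetical toy data: 6 movies instead of 17,770; 0 means "unrated".
u = [5, 0, 0, 3, 0, 0]
v = [4, 0, 0, 0, 0, 2]

def binarize(x):
    """Map ratings to 1 (rated) / 0 (unrated). Assumed convention for SMC and Jaccard."""
    return [1 if r != 0 else 0 for r in x]

def smc(a, b):
    """Simple Matching Coefficient: fraction of positions where the indicators agree (0-0 and 1-1)."""
    a, b = binarize(a), binarize(b)
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)

def jaccard(a, b):
    """Jaccard: 1-1 matches divided by positions where at least one indicator is 1."""
    a, b = binarize(a), binarize(b)
    both = sum(1 for p, q in zip(a, b) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(a, b) if p == 1 or q == 1)
    return both / either if either else 0.0

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0

def euclidean(a, b):
    """Euclidean distance between the raw rating vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

print("SMC      ", smc(u, v))
print("Jaccard  ", jaccard(u, v))
print("Cosine   ", cosine(u, v))
print("Euclidean", euclidean(u, v))
```

In the full data set most entries of each vector are 0, so it is worth noting how each function treats positions where both users are unrated: `smc` counts them as agreement, while `jaccard` and `cosine` ignore them.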

(b) Based on this training data set, assume you would like to predict a user’s rating of an unrated movie, i.e., classify the user into one of 5 categories (ratings 1 to 5), given the same user’s ratings of other, similar movies. Among “KNN”, “Neural network”, and “Naïve Bayes”, which one would you use and why? (Please limit your answer to 30 words.)