(a) Assume each user is represented by the scores that user gave to all these movies, with unrated movies set to 0. Thus, the number of features equals the total number of movies (17,770) in the training set. What similarity (or distance) measure would you use to cluster the users? Please select among “Simple Matching Coefficient”, “Jaccard”, “Cosine”, and “Euclidean distance”. Why do you select this measure? (Please limit your answer to 30 words.)
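For illustration only, the four candidate measures can be computed on a toy pair of sparse rating vectors. This is a minimal sketch: the two users and their ratings are hypothetical (10 movies rather than 17,770), and applying SMC and Jaccard to binarized rated/unrated indicators is an assumption about how those measures would be used on rating data.

```python
import math

def smc(a, b):
    # Simple Matching Coefficient on binarized "rated / not rated" indicators:
    # fraction of positions where both users agree (both rated or both unrated).
    matches = sum((x > 0) == (y > 0) for x, y in zip(a, b))
    return matches / len(a)

def jaccard(a, b):
    # Jaccard on the same binarized indicators: ignores positions
    # where neither user rated the movie.
    both = sum(x > 0 and y > 0 for x, y in zip(a, b))
    either = sum(x > 0 or y > 0 for x, y in zip(a, b))
    return both / either if either else 0.0

def cosine(a, b):
    # Cosine similarity on the raw rating values.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean(a, b):
    # Euclidean distance on the raw rating values (a distance, not a similarity).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: unrated movies are stored as 0.
user_a = [5, 0, 0, 3, 0, 0, 0, 0, 0, 0]
user_b = [4, 0, 0, 0, 0, 0, 2, 0, 0, 0]

for name, fn in [("SMC", smc), ("Jaccard", jaccard),
                 ("Cosine", cosine), ("Euclidean", euclidean)]:
    print(f"{name}: {fn(user_a, user_b):.4f}")
```

Comparing how each value changes as more all-zero positions are appended to both vectors is a quick way to see which measures are affected by jointly unrated movies.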
(h) The altitude above sea level recorded in meters.
1. The plot below displays a set of 2D data points that are uniformly distributed. Suppose you want to use K-Means to cluster these points into three groups. Please select the final cluster boundaries that you anticipate K-Means may produce.
(d) The number of rooms available in a hotel.
(k) The amount of time you take to finish this exam.
(b) Based on this training data set, assume you would like to predict a user’s rating of an unrated movie, i.e., classify the rating into one of 5 categories (1 to 5), given the same user’s ratings of other similar movies. Between “KNN”, “Neural network”, and “Naïve Bayes”, which one would you use and why? (Please limit your answer to 30 words.)
(a) Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)
(c) The KNN algorithm relies on feature scaling (e.g., normalization) to work effectively when using Euclidean distance as a metric.
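To make the interaction between Euclidean distance and feature scale concrete, the sketch below finds the nearest neighbor of a query point before and after min-max normalization. The data (age in years, income in dollars) and the helper names are hypothetical, chosen only so that one feature has a much larger numeric range than the other.

```python
import math

def euclidean(p, q):
    # Plain Euclidean distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_scale(rows):
    # Rescale every column to [0, 1] using that column's min and max.
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

def nearest(query, points):
    # Index of the point closest to the query (1-nearest neighbor).
    return min(range(len(points)), key=lambda i: euclidean(query, points[i]))

# Hypothetical data: [age in years, income in dollars].
data = [[25, 50000], [60, 51000], [27, 58000]]
query = [26, 52000]

# Without scaling, income differences dominate the distance.
raw_nn = nearest(query, data)

# Scale the data and the query together so they share the same ranges.
scaled = min_max_scale(data + [query])
scaled_nn = nearest(scaled[-1], scaled[:-1])

print("nearest neighbor on raw features:   ", data[raw_nn])
print("nearest neighbor on scaled features:", data[scaled_nn])
```

On the raw features the 60-year-old is selected because the income gap is smallest; after scaling, age contributes comparably and the 25-year-old becomes the nearest neighbor.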
(b) The year a building was constructed.
(i) Support Vector Machines can handle high-dimensional data.