PCA is typically used to reduce the number of dimensions of a machine learning problem. For example, we might go from 20 features to just the top 10 components identified by PCA. Intuitively, this means we are throwing away some information. But, strangely, when the dataset is noisy, discarding the low-variance PCA components can actually give us better results. Why is this the case?
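To make the question concrete, here is a small sketch (the setup is hypothetical: a rank-5 signal embedded in 20 features, plus Gaussian noise) showing that reconstructing the data from only the top 10 components can land closer to the clean signal than the noisy data itself:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: a low-rank "signal" spread across 20 features, plus noise.
n_samples, n_features, rank = 500, 20, 5
signal = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_features))
noisy = signal + 0.5 * rng.normal(size=(n_samples, n_features))

# Keep only the top 10 components, then project back to the original space.
pca = PCA(n_components=10)
denoised = pca.inverse_transform(pca.fit_transform(noisy))

# Compare distance to the clean signal before and after truncation.
err_noisy = np.linalg.norm(noisy - signal)
err_denoised = np.linalg.norm(denoised - signal)
print(err_denoised < err_noisy)
```

In this toy setup the truncated reconstruction is closer to the clean signal, because the discarded components contain mostly noise.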