Bonus Question (5 points)

The above figure is for the gradient boosting algorithm for regression.

Step 1. A new decision tree (DT) is trained with feature X and label r (i.e., the residual) to predict the residual.
Step 2. The residual predicted in Step 1 is multiplied by the learning rate and added to the prior predicted Y. The learning rate is between 0 and 1, so that learning is slow and overfitting is avoided.
Step 3. The residual is updated by subtracting the prediction of the new DT from Step 1, multiplied by the learning rate.
Step 4. The final predicted Y in gradient boosting is the sum of the DT predictions from all stages, each multiplied by the learning rate.

Overall, gradient boosting is (1) _____________ (a. parallel learning, b. sequential learning; 1 point). In addition, a new decision tree in each stage is created based on the information from the prior trees to improve performance.

Based on the algorithm, which one is not a hyperparameter for gradient boosting? (2) _________ (2 points)
a. the number of trees
b. the maximum depth of each tree
c. learning rate
d. dropout rate
e. the number of splits in each tree
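Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm exactly as drawn in the figure: the decision tree is simplified to a single-split tree on one feature, the initial prediction is assumed to be the mean of Y (the steps above do not say how the first prediction is made), and the names fit_stump and gradient_boost are hypothetical.

```python
def fit_stump(xs, rs):
    """Fit a one-split regression tree to residuals rs over a 1-D feature xs.

    Assumption: a single split stands in for the full decision tree (DT).
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate split between neighbouring values
        left = [r for x, r in zip(xs, rs) if x <= thr]
        right = [r for x, r in zip(xs, rs) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean residual
        mean_r = sum(rs) / len(rs)
        return lambda x: mean_r
    _, thr, left_mean, right_mean = best
    return lambda x: left_mean if x <= thr else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Return (predict, residuals) after n_trees boosting stages."""
    f0 = sum(ys) / len(ys)  # assumed starting prediction: mean of Y
    preds = [f0] * len(ys)
    residuals = [y - p for y, p in zip(ys, preds)]
    trees = []
    for _ in range(n_trees):
        # Step 1: train a new tree on (X, residual).
        tree = fit_stump(xs, residuals)
        step = [learning_rate * tree(x) for x in xs]
        # Step 2: add the scaled tree output to the prior predicted Y.
        preds = [p + s for p, s in zip(preds, step)]
        # Step 3: subtract the same scaled output from the residual.
        residuals = [r - s for r, s in zip(residuals, step)]
        trees.append(tree)

    def predict(x):
        # Step 4: starting value plus the scaled outputs of all trees.
        return f0 + learning_rate * sum(t(x) for t in trees)

    return predict, residuals


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
    predict, residuals = gradient_boost(xs, ys)
    print([round(predict(x), 3) for x in xs])
```

Each pass through the loop needs the residuals left by the previous pass, so the trees cannot be trained independently of one another. In this sketch the number of trees and the learning rate appear as function arguments, while the tree size is fixed at one split for brevity.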