
Question 7: (14 points) Consider the best practices for trai…

Question 7: (14 points) Consider the best practices for training neural networks. Answer the following questions.

1. (4 points) In practice, Nesterov momentum often converges faster than standard momentum. Explain why the correction based on the gradient at the anticipated position can prevent overshooting.

2. (6 points) You observe the following training behaviors:

   Observation 1: Your deep network (8 layers) trains very slowly. The gradients in the early layers are extremely small, and the training loss decreases only very gradually.

   Observation 2: Your network achieves 99% training accuracy but only 70% validation accuracy. The gap is large and consistent.

   Observation 3: Your network's training is unstable: the loss fluctuates wildly and sometimes diverges, and different random initializations lead to very different outcomes.

   For each observation, (i) identify whether dropout, batch normalization, or both would help, and (ii) explain why the chosen technique addresses the specific problem.

3. (4 points) Consider a feedforward neural network with the following architecture: Input (100 features) → Dense(256) → ReLU → Dense(128) → ReLU → Dense(10) → Softmax. Calculate the total number of trainable parameters in this network. Show your work by computing the parameters for each layer separately, including both weights and biases.
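For part 1, it helps to have the two update rules side by side. One common formulation (the symbols $\eta$ for the learning rate, $\mu$ for the momentum coefficient, and $v$ for the velocity are assumed here, not given in the question) is:

\[
\text{Standard momentum:} \quad v_{t+1} = \mu v_t - \eta \,\nabla L(\theta_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}
\]
\[
\text{Nesterov momentum:} \quad v_{t+1} = \mu v_t - \eta \,\nabla L(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}
\]

The only difference is where the gradient is evaluated: Nesterov uses the anticipated position $\theta_t + \mu v_t$, so when the momentum step is about to carry the parameters past a minimum, the look-ahead gradient already points back toward it and partially cancels the velocity.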

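For the parameter count in part 3, only the Dense layers carry trainable parameters; ReLU and Softmax have none. A minimal Python sketch of the layer-by-layer arithmetic, assuming the layer sizes stated in the question (the loop and variable names are illustrative, not part of the question):

```python
# Layer widths from the question:
# Input(100) -> Dense(256) -> ReLU -> Dense(128) -> ReLU -> Dense(10) -> Softmax
layer_sizes = [100, 256, 128, 10]

total = 0
for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    weights = fan_in * fan_out   # one weight per (input unit, output unit) pair
    biases = fan_out             # one bias per output unit
    total += weights + biases
    print(f"Dense({fan_out}): {fan_in}*{fan_out} + {fan_out} = {weights + biases}")

print(f"Total trainable parameters: {total}")
```

Running this prints 25,856 + 32,896 + 1,290 for the three Dense layers and 60,042 in total, which is the kind of per-layer breakdown the question asks you to show.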

Posted on: December 10, 2025 | Last updated on: December 10, 2025 | Written by: Anonymous | Categorized in: Uncategorized
