______________ means the gradient of deep learning models du…
______________ means the gradient of deep learning models during training is too small or too large (unstable) ; therefore, the deep learning model cannot learn more effectively. To mitigate this problem, we can use the ReLU function as an activation function, use gradient clipping to normalize it when the norm of the gradient is larger than the threshold, and add gate structures in the deep learning model to consider long-term dependency in the dataset.