Optimal a priori estimates are derived for the population risk of a regularized residual network model. The key is the design of a new path norm, called the weighted path norm, which serves as the regularization term in the regularized model. The weighted path norm treats the skip connections and the nonlinearities differently, so that paths with more nonlinearities receive larger weights. The error estimates are a priori in the sense that they depend only on the target function and not on the parameters obtained in the training process. The estimates are optimal in the sense that the bound scales as O(1/L) with the network depth L and the estimation error is comparable to the Monte Carlo error rate. In particular, optimal error bounds in terms of the depth of the network model are obtained for the first time. Comparisons are made with existing norm-based generalization error bounds.
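To make the idea of the weighted path norm concrete, the following is a minimal sketch rather than the definition used in the paper, which the abstract does not state. It assumes a residual network of the form h_0 = V x, h_l = h_{l-1} + U_l σ(W_l h_{l-1}), f(x) = u^T h_L, and it assumes that every pass through a nonlinearity multiplies the weight of a path by a constant c > 1. The function name, the argument layout, and the value of c are all hypothetical choices made for illustration.

```python
def _abs(A):
    """Entrywise absolute value of a matrix stored as a list of rows."""
    return [[abs(x) for x in row] for row in A]


def _matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def weighted_path_norm(V, blocks, u, c):
    """Sum of absolute path weights, with factor c per nonlinearity crossed.

    Assumed (hypothetical) layout:
      V      : D x d input embedding
      blocks : list of (W_l, U_l), with W_l of shape m x D and U_l of shape D x m
      u      : output vector of length D
      c      : weight attached to each nonlinearity on a path (an assumption)
    Computes || |u|^T (I + c|U_L||W_L|) ... (I + c|U_1||W_1|) |V| ||_1.
    """
    P = _abs(V)  # D x d, accumulates path weights layer by layer
    D = len(P)
    for W, U in blocks:
        UW = _matmul(_abs(U), _abs(W))  # D x D, paths through the nonlinearity
        # Identity term: skip connection (weight 1); c * UW: nonlinear branch.
        M = [[(1.0 if i == j else 0.0) + c * UW[i][j] for j in range(D)]
             for i in range(D)]
        P = _matmul(M, P)
    row = _matmul([[abs(x) for x in u]], P)[0]  # 1 x d
    return sum(row)
```

Setting c = 1 in this sketch gives an unweighted path norm that counts skip and nonlinear paths equally; choosing c > 1 penalizes paths in proportion to the number of nonlinearities they cross, which is the qualitative behavior described above.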