Overall, the work in this paper identifies key principles underlying NNs. These principles form the initial skeleton of a fully-fledged theory of NNs; the main effort in the years to come will be to flesh out that skeleton.
The theory gives S-System, **a measure-theoretic definition of NNs**; endows the intermediate feature spaces of NNs with a stochastic manifold structure through information geometry; proposes a learning framework that unifies supervised and unsupervised learning under the same objective function; and proves that **under practical conditions**, for **large nonlinear deep NNs** with a class of losses that includes the hinge loss, **all local minima are global minima** with zero training error. It also completes, or more precisely clarifies, the analogy between NNs and the Renormalization Group.
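For concreteness, the hinge loss mentioned above is the standard margin-based loss (the formula below is the conventional definition, not reproduced from the paper). For a label $y \in \{-1, +1\}$ and prediction $f(x)$,

$$
\ell\big(y, f(x)\big) = \max\big(0,\; 1 - y\, f(x)\big),
$$

which attains zero exactly when the prediction has the correct sign with margin $y\, f(x) \ge 1$; a zero-loss global minimum thus corresponds to classifying every training example with at least unit margin.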