Description
This project proposes to study a small neural network, trained on a simple supervised learning task, from a perspective inspired by statistical mechanics. The central idea is to identify the active paths connecting the input and output layers, analyze how they contribute to the represented function before and after training, and investigate whether their collective behavior admits an effective macroscopic description. To this end, the network output will be decomposed into contributions associated with active paths, and several collective observables will be introduced: the number of active paths, the distribution of their effective contributions, and their temporal evolution during learning. Using multiple random initializations and training trajectories under stochastic gradient descent, the project will examine whether the evolution of the learned output can be approximated by an effective stochastic differential equation of the form $df_t(x) = A_t(x)\,dt + B_t(x)\,dW_t$, where $A_t(x)$ represents an average learning drift and $B_t(x)$ a fluctuation term induced by the stochasticity of training. The main goal is to explore whether active paths can be interpreted as mesoscopic variables linking the microscopic dynamics of the weights to the macroscopic evolution of the learned function, yielding a more structured description of supervised learning.
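To make the path decomposition concrete, here is a minimal sketch (an illustrative assumption about the setup, not the project's actual code): for a small bias-free ReLU network, the output decomposes exactly into a sum over input-to-output paths, each gated by the activation pattern at that input. All dimensions, weight scales, and names below are hypothetical.

```python
# Minimal sketch: exact active-path decomposition of a tiny bias-free ReLU MLP.
import numpy as np

rng = np.random.default_rng(0)
d, h1, h2 = 3, 4, 4                       # input dim and two hidden widths (illustrative)
W1 = rng.normal(0, 1 / np.sqrt(d),  (h1, d))
W2 = rng.normal(0, 1 / np.sqrt(h1), (h2, h1))
W3 = rng.normal(0, 1 / np.sqrt(h2), (1, h2))

def forward(x):
    """Forward pass; also return the ReLU gate pattern of each hidden layer."""
    a1 = np.maximum(W1 @ x, 0.0)
    a2 = np.maximum(W2 @ a1, 0.0)
    return (W3 @ a2)[0], (W1 @ x > 0), (W2 @ a1 > 0)

def path_contributions(x):
    """Contribution of every active path (input i -> unit j -> unit k -> output)."""
    f, g1, g2 = forward(x)
    contribs = {}
    for i in range(d):
        for j in range(h1):
            if not g1[j]:
                continue                  # path blocked: unit j is inactive
            for k in range(h2):
                if not g2[k]:
                    continue              # path blocked: unit k is inactive
                contribs[(i, j, k)] = x[i] * W1[j, i] * W2[k, j] * W3[0, k]
    return f, contribs

x = rng.normal(size=d)
f, contribs = path_contributions(x)
print(len(contribs), "active paths")
print("decomposition error:", abs(f - sum(contribs.values())))  # ~1e-16
```

With biases the decomposition acquires extra bias-rooted paths; the bias-free case is shown here only because the sum over active paths then reproduces the output exactly, which makes the proposed observables (path count, contribution distribution) directly checkable.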
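The drift $A_t(x)$ and fluctuation $B_t(x)$ can be estimated from the ensemble of training runs by the first two Kramers-Moyal coefficients of the recorded outputs. The sketch below assumes $f_t(x)$ has been logged for a fixed probe input $x$ over several independent SGD seeds; the array shapes, the checkpoint spacing `dt`, and the synthetic stand-in trajectories are all illustrative assumptions.

```python
# Hedged sketch: estimate A_t(x) and B_t(x) from an ensemble of trajectories.
import numpy as np

def estimate_sde_coefficients(traj, dt):
    """traj[s, t] = f_t(x) for seed s; returns drift A_t and noise scale B_t."""
    df = np.diff(traj, axis=1)            # per-seed increments f_{t+dt} - f_t
    A = df.mean(axis=0) / dt              # drift: mean increment per unit time
    B = np.sqrt(df.var(axis=0) / dt)      # fluctuation: spread of increments
    return A, B

# Synthetic stand-in for real training curves (not actual experiment data):
# a noisy relaxation of f_t(x) toward a target value y(x) = 1.0.
rng = np.random.default_rng(1)
S, T, dt, target = 200, 500, 0.01, 1.0
traj = np.zeros((S, T))
for t in range(1, T):
    drift = 2.0 * (target - traj[:, t - 1])
    traj[:, t] = traj[:, t - 1] + drift * dt + 0.3 * np.sqrt(dt) * rng.normal(size=S)

A, B = estimate_sde_coefficients(traj, dt)
print("early drift:", A[:3], "late drift:", A[-3:])  # drift decays as f_t nears y
print("mean noise scale:", B.mean())                 # ~0.3 by construction
```

If the estimated $A_t(x)$ and $B_t(x)$ vary smoothly and reproduce the ensemble statistics when the SDE is re-simulated, that would support the proposed effective macroscopic description.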