Neural networks are often the state of the art for many tasks; however, they are still not well understood from a statistical point of view. In this talk, we consider regression models involving multilayer perceptrons (MLPs). Studying the statistical properties of such models is difficult because they may be heavily over-parameterized. We assume that we observe a random sample of independent, identically distributed variables drawn from the distribution P of a vector (X, Y), where Y is a real random variable and X a real random vector. The variable Y is a function of X plus an additive noise; we call this a regression model. The regression function depends on a parameter vector, often called the weights of the MLP. A natural estimator of the best weights for our model is the least squares estimator (LSE), which minimizes the sum of squared errors (SSE). We will study the asymptotic behavior of the difference in SSE between the estimated model and the best one, first in the case of MLPs with a single hidden layer, then for MLP functions with an arbitrarily large number of hidden layers and ReLU activation functions, which is the most widely used model for deep learning.
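The setting above can be sketched in formulas as follows. This is only an illustrative summary: the notation $F_\theta$, $\varepsilon$, $n$, and $\theta^\ast$ is assumed here and not fixed by the abstract.

```latex
% Regression model: Y is a function of X plus an additive noise
\[
Y = F_\theta(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X] = 0,
\]
% where F_theta is the MLP function with weight vector theta.
% Given an i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) from P,
% the least squares estimator minimizes the sum of squared errors:
\[
\hat\theta_n \in \arg\min_{\theta} \mathrm{SSE}_n(\theta),
\qquad
\mathrm{SSE}_n(\theta) = \sum_{i=1}^{n} \bigl(Y_i - F_\theta(X_i)\bigr)^2 .
\]
% The quantity studied in the talk is the difference
% SSE_n(theta_hat_n) - SSE_n(theta*), where theta* denotes
% the weights of the best model.
```

Under this notation, the talk concerns the asymptotic behavior of $\mathrm{SSE}_n(\hat\theta_n) - \mathrm{SSE}_n(\theta^\ast)$ as $n \to \infty$.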