{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EWS53qjRIYdq"
   },
   "source": [
    "<p><img alt=\"SOS logo\" height=\"45px\" src=\"https://indico.in2p3.fr/event/37891/logo-2009395760.png\" align=\"left\" hspace=\"10px\" vspace=\"0px\"></p> <h1>SOS 2026</h1>\n",
    "\n",
    "\n",
    "<h1>Hands-on: deep learning</h1>\n",
    "\n",
    "Author Florian Ruppin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GJBs_flRovLc"
   },
   "source": [
    "## **Goal of the session**\n",
    "\n",
    "The aim of this tutorial is to implement a multilayer perceptron neural network for the classification of a large number of images. You will first only use the usual libraries such as numpy and matplotlib to develop all the functions needed to train and use a neural network. You will then use the keras library integrated into the TensorFlow machine learning tool developed by Google. This library will enable you to easily build a neural network and study the impact of its architecture on final performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coding a neural network from A to Z\n",
    "\n",
    "First of all, you're going to develop your own neural network with the aim of classifying images from the MNIST (Mixed National Institute of Standards and Technology) database. These images of $28 \\times 28$ pixels correspond to handwritten numbers from 0 to 9 (cf. left panel of Fig. 1). Your objective is to train a neural network to associate a number from 0 to 9 with the input image. The architecture of the neural network to be developed is very simple: \n",
    "- an input layer containing as many neurons as pixels present in the images to be processed\n",
    "- a hidden layer containing 10 neurons with a ReLU activation function\n",
    "- an output layer containing as many neurons as possible class labels, i.e. 10, with a softmax activation function to associate a probability with each class label\n",
    "\n",
    "The architecture of this neural network is shown in the right-hand panel of Fig.1. It contains two weight matrices $W^0$ and $W^1$ and two bias vectors $b^0$ and $b^1$.\n",
    "\n",
    "<p style=\"text-align:center\">\n",
    "    <img alt=\"Hands on DL\" width=\"800px\" src=\"https://perso.ip2i.in2p3.fr/ruppin/hands_on_dl.jpg\" hspace=\"10px\" vspace=\"0px\">\n",
    "</p>\n",
    "<p style=\"text-align:center\"><em>Figure 1:</em> <b>Left:</b> Examples of images in the MNIST database. <b>Right:</b> Schematic representation of the neural network to be implemented.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Import the data**\n",
    "\n",
    "The lines below allow you to import the data that will be used during this hands-on session.\n",
    "- How many images are there in the training and test samples?\n",
    "- How many neurons will there be in the input layer of our network?\n",
    "- Print the 10 first class labels of the training sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow import keras\n",
    "from keras.datasets import mnist\n",
    "\n",
    "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
    "\n",
    "# Normalize the data\n",
    "x_train, x_test = x_train / 255.0, x_test / 255.0 \n",
    "\n",
    "# Reshape training and test sample to match input layer\n",
    "x_train = x_train.reshape(x_train.shape[0], x_train.shape[1] * x_train.shape[2])\n",
    "x_test = x_test.reshape(x_test.shape[0], x_test.shape[1] * x_test.shape[2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the class labels are numbers $i$ between 0 and 9. However, we will need the class labels to take the form of vectors of size 10 with a 1 at the $i^{th}$ position and zeros elsewhere. For example, the number 7 becomes (0,0,0,0,0,0,0,1,0,0).\n",
    "- Complete the following function that will transform a single digit label into a vector label that can be compared to the output of our newtwork"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def one_hot(Yi):\n",
    "    \"\"\"transform image figure into one-hot vector\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    Yi: int\n",
    "        the figure value on the image\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array\n",
    "        a one-hot encoded vector with 1 at the location of the input figure\n",
    "    \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Code the network**\n",
    "\n",
    "The list of functions that we will need in order to train our neural network is the following:\n",
    "- init_params <em>-- initializes our weights and biases randomly</em>\n",
    "- ReLU\n",
    "- Softmax <em>-- the two activation functions that we will consider</em>\n",
    "- ReLU_deriv <em>-- derivative of the ReLU function</em>\n",
    "- forward_prop <em>-- forward propagation for a given image</em>\n",
    "- backward_prop <em>-- backward propagation for a given pair (image, label)</em>\n",
    "- update_params <em>-- update weight and bias values after data-set full pass</em>\n",
    "\n",
    "The init_params, ReLU, Softmax, and ReLU_deriv functions are quite easy to code. They are given in the cells bellow.\n",
    "- Complete the following cells based on the algorithm that we introduced this morning in order to define forward_prop, backward_prop, and update_params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "def init_params(N_hidden_neurons):\n",
    "    \"\"\"Weight and bias initialization\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    N_hidden_neurons: int\n",
    "        number of neurons in hidden layer\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array, array, array, array\n",
    "        the weight and bias matrices\n",
    "    \"\"\"\n",
    "\n",
    "    # inverse order of line/column for faster memory access\n",
    "    W0 = np.random.rand(784,N_hidden_neurons) - 0.5\n",
    "    b0 = np.random.rand(N_hidden_neurons) - 0.5\n",
    "    W1 = np.random.rand(N_hidden_neurons,10) - 0.5\n",
    "    b1 = np.random.rand(10) - 0.5\n",
    "    return W0, b0, W1, b1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ReLU(Z):\n",
    "    \"\"\"ReLU function for the hidden layer\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    Z: array\n",
    "        linear combination from previous layer + bias\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array\n",
    "        output from hidden neuron\n",
    "    \"\"\"\n",
    "    return np.maximum(Z, 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "def softmax(Z):\n",
    "    \"\"\"softmax function for the output layer\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    Z: array\n",
    "        linear combination from previous layer + bias\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array\n",
    "        output from output neuron\n",
    "    \"\"\"\n",
    "    A = np.exp(Z) / sum(np.exp(Z))\n",
    "    return A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ReLU_deriv(Z):\n",
    "    \"\"\"derivative of ReLU function\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    Z: array\n",
    "        linear combination from previous layer + bias\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array\n",
    "        array of boolean (interpreted as 0 in 1 in operations)\n",
    "    \"\"\"\n",
    "    return Z > 0"
   ]
  },
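  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reminder, the forward pass for a single flattened image $x$ can be written as below. This is only a sketch: it assumes the storage convention of init_params, where $W^0$ has shape $784 \\times N$ and $W^1$ has shape $N \\times 10$, with $N$ the number of hidden neurons. The names $Z^0$, $A^0$, $Z^1$, $A^1$ mirror the arguments of the functions in this notebook, and the indices $i$, $j$, $k$, $l$ are introduced here for illustration only.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "Z^0_j &= \\sum_{i=1}^{784} x_i \\, W^0_{ij} + b^0_j, \\qquad A^0_j = \\max(Z^0_j, 0), \\\\\\\\\n",
    "Z^1_k &= \\sum_{j=1}^{N} A^0_j \\, W^1_{jk} + b^1_k, \\qquad A^1_k = \\frac{e^{Z^1_k}}{\\sum_{l=1}^{10} e^{Z^1_l}}.\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "With this convention each layer is a vector-matrix product followed by an activation, and the entries of $A^1$ sum to 1, so they can be read as class probabilities."
   ]
  },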
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def forward_prop(W0, b0, W1, b1, Xi):\n",
    "    \"\"\"forward propagation for a given image\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    W0, b0, W1, b1: array\n",
    "        the weight and bias matrices of the network\n",
    "    Xi: array\n",
    "        an image\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array, array, array, array\n",
    "        the entry and output for each activation function\n",
    "    \"\"\""
   ]
  },
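  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gradients needed for the backward pass can be sketched in the same notation. The expressions below assume a cross-entropy loss $J = -\\sum_k Y_k \\ln A^1_k$ for a single image with one-hot label $Y$, a softmax output layer, and the $(n_\\mathrm{inputs}, n_\\mathrm{outputs})$ weight storage used in init_params. The intermediate quantities $\\delta^1$ and $\\delta^0$ are names introduced here for illustration.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\delta^1_k &\\equiv \\frac{\\partial J}{\\partial Z^1_k} = A^1_k - Y_k, &\n",
    "\\frac{\\partial J}{\\partial W^1_{jk}} &= A^0_j \\, \\delta^1_k, &\n",
    "\\frac{\\partial J}{\\partial b^1_k} &= \\delta^1_k, \\\\\\\\\n",
    "\\delta^0_j &\\equiv \\frac{\\partial J}{\\partial Z^0_j} = \\Big( \\sum_{k=1}^{10} W^1_{jk} \\, \\delta^1_k \\Big) \\, \\mathrm{ReLU}'(Z^0_j), &\n",
    "\\frac{\\partial J}{\\partial W^0_{ij}} &= x_i \\, \\delta^0_j, &\n",
    "\\frac{\\partial J}{\\partial b^0_j} &= \\delta^0_j.\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "The simple form of $\\delta^1$ relies on the combination of softmax and cross-entropy together with $\\sum_k Y_k = 1$. Each weight gradient is an outer product, so it has the same shape as the matrix it updates."
   ]
  },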
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def backward_prop(Z0, A0, A1, W1, Xi, Yi):\n",
    "    \"\"\"backward propagation for a given (image, class)\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    Z0: array\n",
    "        the entry of hidden activation function\n",
    "    A0, A1: array\n",
    "        the output of all activation functions\n",
    "    Xi, Yi: array\n",
    "        an image and its associated class\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array, array, array, array\n",
    "        derivatives of cross-entropy loss wrt weights and biases\n",
    "    \"\"\""
   ]
  },
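  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the parameter update, one common choice is to average the per-image gradients over the $M$ training images before taking a step of size $\\alpha$. Averaging rather than summing is an assumption made here, not something stated in this notebook, and $J^{(m)}$ denotes the loss of image $m$ (a label introduced for illustration).\n",
    "\n",
    "$$\n",
    "W^n \\leftarrow W^n - \\frac{\\alpha}{M} \\sum_{m=1}^{M} \\frac{\\partial J^{(m)}}{\\partial W^n},\n",
    "\\qquad\n",
    "b^n \\leftarrow b^n - \\frac{\\alpha}{M} \\sum_{m=1}^{M} \\frac{\\partial J^{(m)}}{\\partial b^n},\n",
    "\\qquad n \\in \\{0, 1\\}.\n",
    "$$\n",
    "\n",
    "With this normalization the step size does not depend on the number of images in the sample, which keeps a given learning rate meaningful when the sample size changes."
   ]
  },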
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def update_params(W0, b0, W1, b1, dJdW0_arr, dJdb0_arr, dJdW1_arr, dJdb1_arr, alpha):\n",
    "    \"\"\"update weight and bias values after data-set full pass\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    W0, b0, W1, b1: array\n",
    "        previous weight and bias matrices of the network\n",
    "    dJdW0_arr, dJdb0_arr, dJdW1_arr, dJdb1_arr: list\n",
    "        lists of derivatives of cross-entropy loss wrt weights and biases\n",
    "        for each image in the training sample\n",
    "    alpha: float\n",
    "        network learning rate\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array, array, array, array\n",
    "        updated weight and bias matrices of the network\n",
    "    \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Prediction and accuracy**\n",
    "\n",
    "You will need a function that returns the predicted class label for a given image as well as a function that returns the accuracy of the network in order to record the evolution of the network performance during training. These functions are quite straightforward to code and are given below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_predictions(A1):\n",
    "    \"\"\"get predicted value for a given image\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    A1: array\n",
    "        the output for each neuron in output layer\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    int\n",
    "        the predicted class of the image\n",
    "    \"\"\"\n",
    "    return np.argmax(A1)\n",
    "\n",
    "\n",
    "def get_accuracy(predictions, Y):\n",
    "    \"\"\"get the accuracy of the network\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    predictions: list\n",
    "        the predicted classes for a list of images\n",
    "    Y: list\n",
    "        the actual classes for the same images\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    float\n",
    "        the accuracy of the network for the considered set of images\n",
    "    \"\"\"\n",
    "    return np.sum(predictions == Y) / Y.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Gradient descent**\n",
    "\n",
    "You now have all the tools needed to code your own gradient descent function that will train this neural network to recognize the MNIST images.\n",
    "- Complete the following cell so that the function returns the weight matrices and bias vectors of the trained network. You will use the get_predictions and get_accuracy functions in order to print the accuracy of the network on the training sample every 10 epochs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gradient_descent(\n",
    "    X, Y, alpha, N_hidden_neurons, iterations\n",
    "):\n",
    "    \"\"\"train the neural network\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    X, Y: array, array\n",
    "        the images and corresponding classes for the\n",
    "        training sample\n",
    "    alpha: float\n",
    "        the learning rate\n",
    "    N_hidden_neurons: int\n",
    "        number of neurons in hidden layer\n",
    "    iterations: int\n",
    "        number of iterations for the wight and bias updates\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    array, array, array, array\n",
    "        weight and bias matrices of the trained network\n",
    "    \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Test your gradient_descent function using a learning rate of 1 and 100 epochs. What is the final accuracy of your network?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Check that your network works by running the following cell multiple times (uncomment the last line of the cell to get an output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def make_prediction(X, W0, b0, W1, b1):\n",
    "    \"\"\"predict the class of a given image\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    X: array\n",
    "        an test image\n",
    "    W0, b0, W1, b1: array\n",
    "        weight and bias matrices of the trained network\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    int\n",
    "        predicted class of the image\n",
    "    \"\"\"\n",
    "    _, _, _, A1 = forward_prop(W0, b0, W1, b1, X)\n",
    "    prediction = get_predictions(A1)\n",
    "    return prediction\n",
    "\n",
    "def test_prediction(x, W0, b0, W1, b1):\n",
    "    \"\"\"show a test image and print its predicted and\n",
    "        its actual class\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    index: int\n",
    "        position in the test sample\n",
    "    W0, b0, W1, b1: array\n",
    "        weight and bias matrices of the trained network\n",
    "\n",
    "    \"\"\"\n",
    "    index = np.random.randint(0,x.shape[0])\n",
    "    current_image = x[index,:]\n",
    "    prediction = make_prediction(x[index,:], W0, b0, W1, b1)\n",
    "    label = y_test[index]\n",
    "    print(\"Prediction: \", prediction)\n",
    "    print(\"Actual class label: \", label)\n",
    "\n",
    "    current_image = current_image.reshape((28, 28)) * 255\n",
    "    plt.gray()\n",
    "    plt.imshow(current_image, interpolation=\"nearest\")\n",
    "    plt.show()\n",
    "\n",
    "#test_prediction(x_test, W0, b0, W1, b1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using TensorFlow\n",
    "\n",
    "You will now use the keras module of the tensorflow library in order to train the same neural network and compare its performance with the one you just coded.\n",
    "\n",
    "- Use the <em>to\\_categorical</em> function in <em>keras.utils</em> to transform the numbers contained in y_train and y_test (class labels associated with the images) into vectors of 10 values containing a 1 at the position corresponding to the number displayed in each image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Use the <em>Sequential</em> module in <em>keras.models</em> to initialize a model object corresponding to your neural network. Use the <em>add</em> method associated with your model object to add the hidden layer of 10 neurons with a ReLU activation function. You will define it using the <em>Dense</em> module of <em>keras.layers</em>. Finally, add the output layer of 10 neurons with a softmax activation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the architecture of your neural network is defined you will need to specify the minimizer and its learning rate as well as the loss function to consider during training.\n",
    "- Use the <em>Adam</em> function in <em>keras.optimizers</em> to initialize an object called opt corresponding to the method used to minimize the loss function. Consider a learning rate of $10^{-2}$.\n",
    "- Use the <em>compile</em> method associated with your model object to associate the opt object with your neural network. You'll use <em>categorical_crossentropy</em> as the loss function, and you'll also request access to the evolution of the success rate as a function of the iterations of the training phase by adding the keyword $\\mathrm{metrics=['accuracy']}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Use the <em>fit</em> method associated with your model object to train your neural network from the x_train and y_train arrays. Set the batch_size option to the number of images in x_train and the number of iterations (epochs) to 300. Also use the validation_data option to give the x_test and y_test arrays so that the success rate is calculated for the test sample at each epoch of training. Store the output of this training phase in a variable called out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Check the different keys associated with the out.history dictionnary\n",
    "- Plot the evolution of the loss function and success rate for the training and validation samples as a function of the number of epochs. What is the maximum success rate achieved with this neural network architecture? Does it appear to have converged? Increase the number of epochs in the training phase so that it converges to within $\\sim10^{-3}$. What maximum value do you obtain?\n",
    "- Perform the same training with a learning rate of $0.2$. What is your maximum success rate?\n",
    "- Change the number of neurons in the hidden layer, add an extra hidden layer, change the size of the batch size, and check the impact of these changes on the succes rate.\n",
    "- What do you notice about the loss function of the validation sample if you use a network with two hidden layers and a small batch size? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [
    {
     "file_id": "https://github.com/gmention-at-cea/sos2021/blob/main/Welcome_To_Colaboratory.ipynb",
     "timestamp": 1715332750197
    }
   ],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "sos2024DL",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}