{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p><img alt=\"SOS logo\" height=\"45px\" src=\"https://indico.in2p3.fr/event/37891/logo-2009395760.png\" align=\"left\" hspace=\"10px\" vspace=\"0px\"></p> <h1>SOS 2026</h1>\n",
    "\n",
    "\n",
    "<h1>Hands-on: deep learning - advanced</h1>\n",
    "\n",
    "Author: Florian Ruppin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Goal of the session**\n",
    "\n",
    "This notebook is a follow-up for students who are already comfortable with the content of the introductory course and the first hands-on notebook (multilayer perceptron coded from scratch + a first TensorFlow/Keras model).\n",
    "\n",
    "Here you will explore, **using Keras**, four tools that are routinely used to build and improve modern neural networks:\n",
    "\n",
    "1. **Regularization** - fighting overfitting (dropout, weight decay, batch normalization)\n",
    "2. **Skip connections** - the Keras *functional API* and residual blocks\n",
    "3. **Convolutional neural networks (CNN)** - the standard architecture for images\n",
    "4. **Autoencoders** - *bonus*, encoding/decoding images (image deblurring)\n",
    "\n",
    "Each section contains an **Example** cell (already written - read it and run it) and a **Your turn** cell (a scaffold for you to complete).\n",
    "\n",
    "> On Google Colab, switch to a GPU runtime for faster training: **Runtime --> Change runtime type --> Hardware accelerator --> GPU**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 0. Setup\n",
    "\n",
    "The cell below imports everything we need, loads the MNIST database once, and prepares two versions of the data:\n",
    "\n",
    "- a **flat** version (vectors of 784 values) for the dense networks of sections 1&ndash;3;\n",
    "- an **image** version (28&times;28&times;1) for the CNN and the autoencoder of sections 4&ndash;5.\n",
    "\n",
    "It also builds a small training subset so that the overfitting demonstrations train in a few seconds. The cell after it defines a helper function, `plot_history`, to visualize the training curves."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "# Import everything through tensorflow.keras so that a single Keras implementation is used\n",
    "from tensorflow import keras\n",
    "from tensorflow.keras.datasets import mnist\n",
    "from tensorflow.keras.models import Sequential, Model\n",
    "from tensorflow.keras.layers import (Input, Dense, Dropout, BatchNormalization,\n",
    "                                     Activation, Add, Conv2D, MaxPooling2D,\n",
    "                                     Flatten, Conv2DTranspose)\n",
    "from tensorflow.keras.utils import to_categorical\n",
    "from tensorflow.keras import regularizers\n",
    "\n",
    "# ---- Load MNIST once ----\n",
    "(x_train_img, y_train), (x_test_img, y_test) = mnist.load_data()\n",
    "\n",
    "# Normalize to [0, 1]\n",
    "x_train_img = x_train_img.astype('float32') / 255.0\n",
    "x_test_img  = x_test_img.astype('float32') / 255.0\n",
    "\n",
    "# Flat version (for dense networks): shape (n, 784)\n",
    "x_train = x_train_img.reshape(x_train_img.shape[0], -1)\n",
    "x_test  = x_test_img.reshape(x_test_img.shape[0], -1)\n",
    "\n",
    "# Image version with explicit channel (for CNN / autoencoder): shape (n, 28, 28, 1)\n",
    "x_train_cnn = np.expand_dims(x_train_img, -1)\n",
    "x_test_cnn  = np.expand_dims(x_test_img, -1)\n",
    "\n",
    "# One-hot encoded labels\n",
    "y_train_oh = to_categorical(y_train, 10)\n",
    "y_test_oh  = to_categorical(y_test, 10)\n",
    "\n",
    "# Small subset to make overfitting clearly visible and training fast\n",
    "N_small = 2000\n",
    "x_small, y_small = x_train[:N_small], y_train_oh[:N_small]\n",
    "\n",
    "print('Flat shapes :', x_train.shape, x_test.shape)\n",
    "print('Image shapes:', x_train_cnn.shape, x_test_cnn.shape)\n",
    "print('Small subset:', x_small.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_history(history, title=''):\n",
    "    \"\"\"Plot training/validation loss and accuracy from a Keras History object.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    history : keras.callbacks.History\n",
    "        The object returned by model.fit(...).\n",
    "    title : str\n",
    "        Optional title prefix for the figure.\n",
    "    \"\"\"\n",
    "    h = history.history\n",
    "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))\n",
    "    ax1.plot(h['loss'], label='training')\n",
    "    ax1.plot(h['val_loss'], label='validation')\n",
    "    ax1.set_xlabel('epoch'); ax1.set_ylabel('loss'); ax1.legend()\n",
    "    ax1.set_title(f'{title} loss')\n",
    "    ax2.plot(h['accuracy'], label='training')\n",
    "    ax2.plot(h['val_accuracy'], label='validation')\n",
    "    ax2.set_xlabel('epoch'); ax2.set_ylabel('accuracy'); ax2.legend()\n",
    "    ax2.set_title(f'{title} accuracy')\n",
    "    plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 1. A baseline that overfits\n",
    "\n",
    "Before fixing overfitting, let's *produce* it. We train a deliberately oversized dense network on the small 2000-image subset for many epochs. Because the model has far more free parameters than it needs and sees very little data, it will memorize the training set: the **training** accuracy will keep climbing while the **validation** loss eventually *rises* again &mdash; the signature of overfitting.\n",
    "\n",
    "**Example (provided &mdash; just run it).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_baseline():\n",
    "    \"\"\"A plain (unregularized) dense classifier for flat MNIST.\"\"\"\n",
    "    model = Sequential([\n",
    "        Input(shape=(784,)),\n",
    "        Dense(256, activation='relu'),\n",
    "        Dense(256, activation='relu'),\n",
    "        Dense(10,  activation='softmax'),\n",
    "    ])\n",
    "    model.compile(optimizer=keras.optimizers.Adam(1e-3),\n",
    "                  loss='categorical_crossentropy', metrics=['accuracy'])\n",
    "    return model\n",
    "\n",
    "baseline = build_baseline()\n",
    "hist_baseline = baseline.fit(x_small, y_small,\n",
    "                             validation_data=(x_test, y_test_oh),\n",
    "                             batch_size=64, epochs=60, verbose=0)\n",
    "plot_history(hist_baseline, title='Baseline (no regularization):')\n",
    "print('Final validation accuracy:', hist_baseline.history['val_accuracy'][-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**What to observe.** The training loss keeps decreasing toward 0 while the validation loss flattens and then turns upward. The gap between the training and validation curves is the overfitting we now want to reduce."
   ]
  },
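  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: how oversized is the baseline?** The claim that the model has far more free parameters than it needs can be checked with a little arithmetic. The sketch below counts the weights and biases of the 784-256-256-10 dense network. `count_dense_params` is a helper name introduced here for illustration only; it is not a Keras function.\n",
    "\n",
    "```python\n",
    "def count_dense_params(layer_sizes):\n",
    "    \"\"\"Count weights + biases of a fully connected network.\"\"\"\n",
    "    total = 0\n",
    "    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):\n",
    "        total += n_in * n_out + n_out   # weight matrix + bias vector\n",
    "    return total\n",
    "\n",
    "n_params = count_dense_params([784, 256, 256, 10])\n",
    "print(n_params)          # 269322\n",
    "print(n_params / 2000)   # roughly 135 parameters per training image\n",
    "```\n",
    "\n",
    "The total should agree with the number reported by `baseline.summary()`."
   ]
  },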
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 2. Regularization\n",
    "\n",
    "Three widely used tools reduce overfitting and help the network generalize, so that its performance on unseen (test) data stays close to its performance on the training sample:\n",
    "\n",
    "- **Dropout** &mdash; randomly switches off a fraction of neurons at each training step ([explanation](https://towardsdatascience.com/dropout-in-neural-networks-47a162d621d9))\n",
    "- **L1 / L2 regularization** (the L2 form is often called *weight decay*) &mdash; penalizes large weights ([explanation](https://medium.com/@alejandro.itoaramendia/l1-and-l2-regularization-part-1-a-complete-guide-51cf45bb4ade))\n",
    "- **Batch normalization** &mdash; normalizes the activations inside the network ([explanation](https://medium.com/@ghoshanurag66/batch-normalization-math-and-implementation-fe06293f7443))\n",
    "\n",
    "### 2.1 Example: dropout\n",
    "\n",
    "**Example (provided &mdash; just run it).** Same architecture as the baseline, but with a `Dropout` layer after each hidden layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_dropout():\n",
    "    \"\"\"Dense classifier with dropout after each hidden layer.\"\"\"\n",
    "    model = Sequential([\n",
    "        Input(shape=(784,)),\n",
    "        Dense(256, activation='relu'),\n",
    "        Dropout(0.4),\n",
    "        Dense(256, activation='relu'),\n",
    "        Dropout(0.4),\n",
    "        Dense(10,  activation='softmax'),\n",
    "    ])\n",
    "    model.compile(optimizer=keras.optimizers.Adam(1e-3),\n",
    "                  loss='categorical_crossentropy', metrics=['accuracy'])\n",
    "    return model\n",
    "\n",
    "dropout_model = build_dropout()\n",
    "hist_dropout = dropout_model.fit(x_small, y_small,\n",
    "                                 validation_data=(x_test, y_test_oh),\n",
    "                                 batch_size=64, epochs=60, verbose=0)\n",
    "plot_history(hist_dropout, title='With dropout:')\n",
    "print('Final validation accuracy:', hist_dropout.history['val_accuracy'][-1])"
   ]
  },
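  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: what `Dropout(0.4)` does numerically.** The following is a minimal NumPy sketch of *inverted dropout*, the scheme described in the Keras documentation: during training a random 40% of the activations are set to zero and the survivors are scaled by `1 / (1 - rate)`, while at inference the layer leaves its input unchanged. The function name `dropout_forward` is made up for this illustration and this is not the Keras implementation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def dropout_forward(activations, rate, rng):\n",
    "    \"\"\"Inverted dropout: zero a random fraction `rate`, rescale the rest.\"\"\"\n",
    "    keep_prob = 1.0 - rate\n",
    "    mask = rng.random(activations.shape) < keep_prob   # True = neuron kept\n",
    "    return activations * mask / keep_prob\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "a = np.ones((1000, 256), dtype='float32')\n",
    "a_drop = dropout_forward(a, rate=0.4, rng=rng)\n",
    "\n",
    "print((a_drop == 0).mean())   # fraction switched off, close to 0.4\n",
    "print(a_drop.mean())          # close to 1.0: the expected activation is preserved\n",
    "```\n",
    "\n",
    "Because a different random mask is drawn at every training step, no single neuron can be relied upon, which is what discourages memorization."
   ]
  },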
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**What to observe.** The training and validation curves stay much closer together, and the validation loss no longer turns sharply upward.\n",
    "\n",
    "### 2.2 Your turn: weight decay + batch normalization\n",
    "\n",
    "**Your turn.** Complete `build_regularized()` below so that it combines **two** new ingredients on top of the baseline:\n",
    "\n",
    "- **L2 weight decay** on each hidden `Dense` layer, via the keyword `kernel_regularizer=regularizers.l2(1e-4)`;\n",
    "- a **`BatchNormalization()`** layer inserted *between* the dense layer and its ReLU activation (so use a separate `Activation('relu')` layer instead of `activation='relu'`).\n",
    "\n",
    "Then train it on the same subset and compare the curves with the baseline and the dropout model."
   ]
  },
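  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint: what the L2 term adds.** According to the Keras documentation, `regularizers.l2(1e-4)` adds `1e-4 * sum(w**2)`, computed over the kernel of the layer it is attached to, to the training loss. The NumPy sketch below illustrates this with random weights and a made-up data loss standing in for the cross-entropy; `l2_penalty` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def l2_penalty(weights, lam):\n",
    "    \"\"\"L2 penalty: lam * sum of squared weights.\"\"\"\n",
    "    return lam * np.sum(np.square(weights))\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "W = rng.normal(scale=0.05, size=(784, 256))   # stand-in for one Dense kernel\n",
    "\n",
    "data_loss = 0.30                              # made-up cross-entropy value\n",
    "total_loss = data_loss + l2_penalty(W, lam=1e-4)\n",
    "print(total_loss)\n",
    "\n",
    "# The gradient of the penalty is 2 * lam * W: every update pulls each weight toward 0\n",
    "grad_penalty = 2 * 1e-4 * W\n",
    "```\n",
    "\n",
    "Large weights are penalized quadratically, so the optimizer prefers solutions that spread the work over many small weights."
   ]
  },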
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_regularized():\n",
    "    \"\"\"Dense classifier with L2 weight decay + batch normalization.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    keras.Model\n",
    "        A compiled model ready to be trained.\n",
    "    \"\"\"\n",
    "    model = Sequential()\n",
    "    model.add(Input(shape=(784,)))\n",
    "    # ===================== TODO =====================\n",
    "    # For each of the TWO hidden layers (256 units), add in this order:\n",
    "    #   1) Dense(256, kernel_regularizer=regularizers.l2(1e-4))   # no activation here\n",
    "    #   2) BatchNormalization()\n",
    "    #   3) Activation('relu')\n",
    "    # Then add the output layer: Dense(10, activation='softmax').\n",
    "    raise NotImplementedError\n",
    "    # ================================================\n",
    "    model.compile(optimizer=keras.optimizers.Adam(1e-3),\n",
    "                  loss='categorical_crossentropy', metrics=['accuracy'])\n",
    "    return model\n",
    "\n",
    "reg_model = build_regularized()\n",
    "hist_reg = reg_model.fit(x_small, y_small,\n",
    "                         validation_data=(x_test, y_test_oh),\n",
    "                         batch_size=64, epochs=60, verbose=0)\n",
    "plot_history(hist_reg, title='L2 + batch norm:')\n",
    "print('Final validation accuracy:', hist_reg.history['val_accuracy'][-1])"
   ]
  },
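  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: what `BatchNormalization()` computes.** A minimal NumPy sketch of the training-mode forward pass, assuming the Keras default `epsilon=1e-3`: each unit is standardized over the batch, then rescaled by a learnable `gamma` and shifted by a learnable `beta`. At inference Keras uses moving averages of the mean and variance instead of the batch statistics, which this sketch does not reproduce. The name `batch_norm_forward` is hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def batch_norm_forward(z, gamma, beta, eps=1e-3):\n",
    "    \"\"\"Training-mode batch normalization of pre-activations z, shape (batch, units).\"\"\"\n",
    "    mean = z.mean(axis=0)                     # per-unit mean over the batch\n",
    "    var = z.var(axis=0)                       # per-unit variance over the batch\n",
    "    z_hat = (z - mean) / np.sqrt(var + eps)   # zero mean, (almost) unit variance\n",
    "    return gamma * z_hat + beta               # learnable scale and shift\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "z = rng.normal(loc=5.0, scale=3.0, size=(64, 256))   # a batch of 64 pre-activations\n",
    "out = batch_norm_forward(z, gamma=np.ones(256), beta=np.zeros(256))\n",
    "\n",
    "print(out.mean(axis=0)[:3])   # close to 0\n",
    "print(out.std(axis=0)[:3])    # close to 1\n",
    "h = np.maximum(out, 0.0)      # ReLU applied after the normalization, as in the exercise\n",
    "```\n",
    "\n",
    "This is why the exercise asks for a separate `Activation('relu')` layer: the normalization acts on the pre-activations, before the non-linearity."
   ]
  },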
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 3. Skip connections (Keras functional API)\n",
    "\n",
    "**Skip connections** (or *residual* connections) add the input of a block directly to its output: `output = activation(x + block(x))`. They were introduced to mitigate the *vanishing gradient* problem and allow much deeper networks to train.\n",
    "\n",
    "- Theory: [skip connections](https://theaisummer.com/skip-connections/)\n",
    "- Practice: [Keras functional API guide](https://keras.io/guides/functional_api/)\n",
    "\n",
    "Skip connections cannot be expressed with the simple `Sequential` API because the data flow is no longer a straight line. We use the **functional API**, where each layer is *called* on a tensor and we explicitly wire the graph, then wrap it with `Model(inputs, outputs)`.\n",
    "\n",
    "### 3.1 Example: one residual block\n",
    "\n",
    "**Example (provided &mdash; just run it).** Note how `Add()([x, h])` merges the skip path `x` with the block output `h` (both must have the same shape: here 128 units)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def residual_block(x, units):\n",
    "    \"\"\"A simple residual block: two Dense layers + a skip connection.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    x : tensor\n",
    "        Input tensor of shape (..., units).\n",
    "    units : int\n",
    "        Width of the block (must match the last dimension of x).\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    tensor\n",
    "        Output tensor of shape (..., units).\n",
    "    \"\"\"\n",
    "    h = Dense(units, activation='relu')(x)\n",
    "    h = Dense(units)(h)            # no activation before the merge\n",
    "    out = Add()([x, h])            # skip connection: x + block(x)\n",
    "    out = Activation('relu')(out)\n",
    "    return out\n",
    "\n",
    "inputs = Input(shape=(784,))\n",
    "x = Dense(128, activation='relu')(inputs)   # project to 128 units\n",
    "x = residual_block(x, 128)                  # one residual block\n",
    "outputs = Dense(10, activation='softmax')(x)\n",
    "\n",
    "resnet1 = Model(inputs, outputs)\n",
    "resnet1.compile(optimizer=keras.optimizers.Adam(1e-3),\n",
    "                loss='categorical_crossentropy', metrics=['accuracy'])\n",
    "resnet1.summary()\n",
    "\n",
    "hist_res1 = resnet1.fit(x_small, y_small,\n",
    "                        validation_data=(x_test, y_test_oh),\n",
    "                        batch_size=64, epochs=30, verbose=0)\n",
    "plot_history(hist_res1, title='1 residual block:')"
   ]
  },
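  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: the residual block in plain NumPy.** The sketch below mirrors `residual_block` with explicit matrices (`residual_block_np` is a name made up for this illustration). It shows two things: the addition only works when the block output has the same shape as `x`, and when the block weights are zero the whole block reduces to the identity, so a residual block can easily fall back to doing nothing.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def relu(v):\n",
    "    return np.maximum(v, 0.0)\n",
    "\n",
    "def residual_block_np(x, W1, b1, W2, b2):\n",
    "    \"\"\"NumPy analogue of residual_block: relu(x + Dense(relu(Dense(x)))).\"\"\"\n",
    "    h = relu(x @ W1 + b1)\n",
    "    h = h @ W2 + b2        # no activation before the merge\n",
    "    return relu(x + h)     # requires h.shape == x.shape\n",
    "\n",
    "units = 128\n",
    "rng = np.random.default_rng(0)\n",
    "x = relu(rng.normal(size=(4, units)))   # non-negative, like the output of a ReLU layer\n",
    "\n",
    "# With all block weights at zero, the block passes x through unchanged\n",
    "zeros_W, zeros_b = np.zeros((units, units)), np.zeros(units)\n",
    "out = residual_block_np(x, zeros_W, zeros_b, zeros_W, zeros_b)\n",
    "print(np.allclose(out, x))   # True\n",
    "```"
   ]
  },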
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Your turn: stack a second residual block\n",
    "\n",
    "**Your turn.** Using the `residual_block` function above, build a deeper network with **two** residual blocks stacked one after the other (still 128 units each), then train it.\n",
    "\n",
    "- Start from `Input(shape=(784,))`.\n",
    "- Project to 128 units with a `Dense(128, activation='relu')` layer.\n",
    "- Apply `residual_block(..., 128)` **twice**.\n",
    "- Finish with `Dense(10, activation='softmax')` and wrap everything in a `Model`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ===================== TODO =====================\n",
    "# Build a functional model 'resnet2' with TWO residual blocks, then compile it\n",
    "# with Adam(1e-3) and 'categorical_crossentropy' (metrics=['accuracy']).\n",
    "#\n",
    "#   inputs  = Input(shape=(784,))\n",
    "#   x       = Dense(128, activation='relu')(inputs)\n",
    "#   x       = residual_block(x, 128)\n",
    "#   x       = residual_block(x, 128)\n",
    "#   outputs = Dense(10, activation='softmax')(x)\n",
    "#   resnet2 = Model(inputs, outputs)\n",
    "#   resnet2.compile(...)\n",
    "raise NotImplementedError\n",
    "# ================================================\n",
    "\n",
    "resnet2.summary()\n",
    "hist_res2 = resnet2.fit(x_small, y_small,\n",
    "                        validation_data=(x_test, y_test_oh),\n",
    "                        batch_size=64, epochs=30, verbose=0)\n",
    "plot_history(hist_res2, title='2 residual blocks:')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 4. Convolutional neural network (CNN)\n",
    "\n",
    "CNNs are the standard architecture for image recognition. Instead of flattening the image, they slide small learnable **filters** (kernels) across it to extract local features, preserving the 2-D spatial structure.\n",
    "\n",
    "<p style=\"text-align:center\">\n",
    "    <img alt=\"ConvNet\" width=\"750px\" src=\"https://editor.analyticsvidhya.com/uploads/94787Convolutional-Neural-Network.jpeg\" hspace=\"10px\" vspace=\"0px\">\n",
    "</p>\n",
    "\n",
    "A convolution layer runs a kernel (an N&times;N matrix of weights, *learned* during training) over the image to produce a new feature map that is passed to the next layer.\n",
    "\n",
    "### 4.1 Example: a small CNN on MNIST\n",
    "\n",
    "**Example (provided &mdash; just run it).** We train on a 12000-image subset for 3 epochs to keep it fast; even so, the test accuracy is typically higher than that of the dense network from the first notebook. Use a GPU runtime if it feels slow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a subset so the CNN trains quickly during the session\n",
    "N_cnn = 12000\n",
    "xc_train, yc_train = x_train_cnn[:N_cnn], y_train_oh[:N_cnn]\n",
    "\n",
    "cnn = Sequential([\n",
    "    Input(shape=(28, 28, 1)),\n",
    "    Conv2D(32, kernel_size=(3, 3), activation='relu'),\n",
    "    MaxPooling2D(pool_size=(2, 2)),\n",
    "    Conv2D(64, kernel_size=(3, 3), activation='relu'),\n",
    "    MaxPooling2D(pool_size=(2, 2)),\n",
    "    Flatten(),\n",
    "    Dropout(0.5),\n",
    "    Dense(10, activation='softmax'),\n",
    "])\n",
    "cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])\n",
    "cnn.summary()\n",
    "\n",
    "cnn.fit(xc_train, yc_train, batch_size=128, epochs=3, validation_split=0.1)\n",
    "score = cnn.evaluate(x_test_cnn, y_test_oh, verbose=0)\n",
    "print(f'Test loss: {score[0]:.4f}  |  test accuracy: {score[1]:.4f}')"
   ]
  },
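  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: the sliding kernel written out.** The operation performed by a `Conv2D` layer on a single-channel image can be sketched with two loops in NumPy. This is a simplified illustration (one kernel, no bias, no activation, stride 1, no padding), and the vertical-edge kernel is chosen by hand, whereas in a CNN these 9 weights are learned. `conv2d_valid` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def conv2d_valid(image, kernel):\n",
    "    \"\"\"Slide `kernel` over `image` (no padding, stride 1) and sum the products.\"\"\"\n",
    "    kh, kw = kernel.shape\n",
    "    out_h = image.shape[0] - kh + 1\n",
    "    out_w = image.shape[1] - kw + 1\n",
    "    out = np.zeros((out_h, out_w))\n",
    "    for i in range(out_h):\n",
    "        for j in range(out_w):\n",
    "            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)\n",
    "    return out\n",
    "\n",
    "image = np.zeros((28, 28))\n",
    "image[:, 14:] = 1.0                      # dark left half, bright right half\n",
    "\n",
    "kernel = np.array([[-1.0, 0.0, 1.0],     # hand-written vertical-edge detector\n",
    "                   [-1.0, 0.0, 1.0],\n",
    "                   [-1.0, 0.0, 1.0]])\n",
    "\n",
    "feature_map = conv2d_valid(image, kernel)\n",
    "print(feature_map.shape)       # (26, 26)\n",
    "print(feature_map[0, 10:16])   # [0. 0. 3. 3. 0. 0.]\n",
    "```\n",
    "\n",
    "The feature map is non-zero only where the 3&times;3 window straddles the edge, which is the sense in which a kernel extracts a *local* feature."
   ]
  },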
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 Your turn: deepen the CNN\n",
    "\n",
    "**Your turn.** Build a second CNN (`cnn2`) with a bit more capacity and compare its test accuracy with the one above. For example:\n",
    "\n",
    "- add a **third** `Conv2D` block (e.g. 128 filters) &mdash; *or* increase the number of filters in the existing layers;\n",
    "- optionally insert a `Dense(64, activation='relu')` layer before the output.\n",
    "\n",
    "Keep the same input shape `(28, 28, 1)` and the softmax output of 10 units. Watch out: with the default `padding='valid'`, each 3&times;3 `Conv2D` removes 2 pixels from each spatial dimension and every `MaxPooling2D((2, 2))` halves it (rounding down), so the feature map shrinks quickly: 28 &rarr; 26 &rarr; 13 &rarr; 11 &rarr; 5 in the example above. A layer can only be applied while the feature map is at least as large as its kernel or pooling window."
   ]
  },
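  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint: tracking the spatial size.** How fast the feature map shrinks can be traced with a few lines of arithmetic before building the model. The sketch assumes 3&times;3 convolutions with the default `padding='valid'` and 2&times;2 pooling, as in the example CNN; the helper names are made up for this illustration.\n",
    "\n",
    "```python\n",
    "def conv_valid_size(size, kernel=3):\n",
    "    \"\"\"Output size of a stride-1 convolution with padding='valid'.\"\"\"\n",
    "    return size - kernel + 1\n",
    "\n",
    "def pool_size(size, pool=2):\n",
    "    \"\"\"Output size of non-overlapping max pooling (rounded down).\"\"\"\n",
    "    return size // pool\n",
    "\n",
    "size = 28\n",
    "trace = [size]\n",
    "for _ in range(3):               # three Conv2D(3x3) + MaxPooling2D(2x2) blocks\n",
    "    size = conv_valid_size(size)\n",
    "    trace.append(size)\n",
    "    size = pool_size(size)\n",
    "    trace.append(size)\n",
    "\n",
    "print(trace)   # [28, 26, 13, 11, 5, 3, 1]\n",
    "```\n",
    "\n",
    "After a third block the feature map is down to 1&times;1, so a fourth 3&times;3 convolution would no longer fit. Passing `padding='same'` to `Conv2D` removes the shrinkage caused by the convolutions."
   ]
  },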
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ===================== TODO =====================\n",
    "# Build, compile, and train a deeper CNN called 'cnn2', then evaluate it on the test set.\n",
    "# Reuse xc_train / yc_train and x_test_cnn / y_test_oh.\n",
    "raise NotImplementedError\n",
    "# ================================================\n",
    "\n",
    "score2 = cnn2.evaluate(x_test_cnn, y_test_oh, verbose=0)\n",
    "print(f'cnn2 test accuracy: {score2[1]:.4f}  (baseline CNN: {score[1]:.4f})')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 5. Bonus &mdash; Autoencoder for image deblurring\n",
    "\n",
    "> *This section is a bonus and is **not** part of the 30-minute budget &mdash; tackle it if you finish early.*\n",
    "\n",
    "An **autoencoder** compresses (encodes) an input into a lower-dimensional representation and then reconstructs (decodes) a target image from it. A classic use case is *denoising*; here we will instead **deblur** images.\n",
    "\n",
    "<p style=\"text-align:center\">\n",
    "    <img alt=\"Autoencoder\" width=\"750px\" src=\"https://www.researchgate.net/profile/Xifeng-Guo/publication/320658590/figure/fig1/AS:614154637418504@1523437284408/The-structure-of-proposed-Convolutional-AutoEncoders-CAE-for-MNIST-In-the-middle-there.png\" hspace=\"10px\" vspace=\"0px\">\n",
    "</p>\n",
    "\n",
    "We follow the structure of the official [Keras denoising autoencoder example](https://keras.io/examples/vision/autoencoder/), but replace the *noise* corruption with a *blur* corruption: each (N&times;N) block of pixels is replaced by its mean value (pixelation).\n",
    "\n",
    "### 5.1 Example: the blur corruption\n",
    "\n",
    "**Example (provided &mdash; just run it).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_blur(images, block=4):\n",
    "    \"\"\"Block-average blur: replace each (block x block) patch by its mean value.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    images : ndarray, shape (n, H, W, 1)\n",
    "        Input images with H and W divisible by `block` (28 is divisible by 4).\n",
    "    block : int\n",
    "        Side length of the averaging block.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    ndarray, same shape as `images`\n",
    "        The blurred images.\n",
    "    \"\"\"\n",
    "    imgs = np.asarray(images, dtype='float32')\n",
    "    n, H, W, C = imgs.shape\n",
    "    r = imgs.reshape(n, H // block, block, W // block, block, C)\n",
    "    m = r.mean(axis=(2, 4), keepdims=True)\n",
    "    return np.broadcast_to(m, r.shape).reshape(n, H, W, C).copy()\n",
    "\n",
    "x_train_blur = add_blur(x_train_cnn, block=4)\n",
    "x_test_blur  = add_blur(x_test_cnn,  block=4)\n",
    "\n",
    "# Visual check: top row = blurred (input), bottom row = sharp (target)\n",
    "fig, axes = plt.subplots(2, 6, figsize=(10, 3.5))\n",
    "for k in range(6):\n",
    "    axes[0, k].imshow(x_test_blur[k, :, :, 0], cmap='gray'); axes[0, k].axis('off')\n",
    "    axes[1, k].imshow(x_test_cnn[k, :, :, 0],  cmap='gray'); axes[1, k].axis('off')\n",
    "axes[0, 0].set_title('blurred', loc='left')\n",
    "axes[1, 0].set_title('sharp',   loc='left')\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
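  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Side note: the reshape trick on a tiny image.** The block averaging in `add_blur` is easier to follow on a 4&times;4 array with 2&times;2 blocks. The sketch below applies the same reshape-then-mean idea to a single 2-D image; `block_average` is a simplified stand-in written for this illustration, without the batch and channel axes.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def block_average(image, block):\n",
    "    \"\"\"Replace each (block x block) patch of a 2-D image by its mean.\"\"\"\n",
    "    H, W = image.shape\n",
    "    patches = image.reshape(H // block, block, W // block, block)\n",
    "    means = patches.mean(axis=(1, 3), keepdims=True)   # one mean per patch\n",
    "    return np.broadcast_to(means, patches.shape).reshape(H, W).copy()\n",
    "\n",
    "image = np.arange(16, dtype='float32').reshape(4, 4)\n",
    "print(image)\n",
    "print(block_average(image, block=2))\n",
    "# [[ 2.5  2.5  4.5  4.5]\n",
    "#  [ 2.5  2.5  4.5  4.5]\n",
    "#  [10.5 10.5 12.5 12.5]\n",
    "#  [10.5 10.5 12.5 12.5]]\n",
    "```\n",
    "\n",
    "The top-left patch contains 0, 1, 4 and 5, whose mean is 2.5; every pixel of that patch is replaced by this value."
   ]
  },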
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2 Your turn: build and train the deblurring autoencoder\n",
    "\n",
    "**Your turn.** Build a small convolutional autoencoder that maps a **blurred** image to its **sharp** version.\n",
    "\n",
    "- Input shape `(28, 28, 1)`.\n",
    "- **Encoder**: a couple of `Conv2D(..., activation='relu', padding='same')` layers, each followed by `MaxPooling2D((2, 2), padding='same')` to downsample.\n",
    "- **Decoder**: `Conv2DTranspose(..., strides=2, activation='relu', padding='same')` layers to upsample back to 28&times;28, then a final `Conv2D(1, (3, 3), activation='sigmoid', padding='same')` so the output pixels lie in [0, 1].\n",
    "- Compile with `optimizer='adam'` and `loss='binary_crossentropy'` (or `'mse'`).\n",
    "- Train with **input** `x_train_blur` and **target** `x_train_cnn` (the sharp images), using `validation_data=(x_test_blur, x_test_cnn)`. A few epochs are enough to see an effect."
   ]
  },
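  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint: making the decoder land on 28&times;28.** With `padding='same'`, a 2&times;2 `MaxPooling2D` divides the spatial size by 2 (rounding up) and a `Conv2DTranspose` with `strides=2` multiplies it by 2. The arithmetic sketch below, with helper names made up for this illustration, checks that two downsampling steps followed by two upsampling steps return to the input size.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def same_pool(size, pool=2):\n",
    "    \"\"\"MaxPooling2D(pool, padding='same'): size divided by pool, rounded up.\"\"\"\n",
    "    return math.ceil(size / pool)\n",
    "\n",
    "def same_transpose(size, stride=2):\n",
    "    \"\"\"Conv2DTranspose(strides=stride, padding='same'): size multiplied by stride.\"\"\"\n",
    "    return size * stride\n",
    "\n",
    "size = 28\n",
    "sizes = [size]\n",
    "for _ in range(2):        # encoder: two Conv2D('same') + MaxPooling2D blocks\n",
    "    size = same_pool(size)\n",
    "    sizes.append(size)\n",
    "for _ in range(2):        # decoder: two Conv2DTranspose(strides=2) layers\n",
    "    size = same_transpose(size)\n",
    "    sizes.append(size)\n",
    "\n",
    "print(sizes)   # [28, 14, 7, 14, 28]\n",
    "```\n",
    "\n",
    "Three pooling steps would not round-trip: 7 pools to 4, and three upsampling steps then give 32 instead of 28, so the output could not be compared with the 28&times;28 target."
   ]
  },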
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ===================== TODO =====================\n",
    "# 1) Build a convolutional autoencoder 'autoencoder' (encoder + decoder as described above).\n",
    "# 2) Compile it (optimizer='adam', loss='binary_crossentropy').\n",
    "# 3) Train it: input = x_train_blur, target = x_train_cnn (the sharp images).\n",
    "#    Tip: a subset (e.g. the first 12000 images) and ~5 epochs keep it fast.\n",
    "raise NotImplementedError\n",
    "# ================================================"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Your turn.** Once trained, run the autoencoder on a few blurred test images and display three rows stacked one above the other: **blurred input**, **autoencoder output**, **sharp target**. Reuse the plotting pattern from the Example cell in 5.1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ===================== TODO =====================\n",
    "# Predict on x_test_blur[:6], then plot 3 rows:\n",
    "#   row 0: x_test_blur[k]      (input)\n",
    "#   row 1: decoded[k]          (autoencoder output)\n",
    "#   row 2: x_test_cnn[k]       (target)\n",
    "raise NotImplementedError\n",
    "# ================================================"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}