AISSAI Anomaly Detection Workshop

Europe/Paris
Clermont-Ferrand

Description

AISSAI Anomaly Detection Workshop   
4-7 March 2024


The first AISSAI Anomaly Detection Workshop will be held from the 4th to the 7th March 2024 in Clermont-Ferrand, France.
 

Anomaly detection is a challenging problem which can greatly benefit from the use of machine learning (ML) methods, both unsupervised and semi-supervised. ML algorithms must be able to process complex, massive data sets and search for anomalies under extreme conditions (very low signal-to-noise ratio, real-time data, etc.). The range of applications for anomaly detection methods is vast, and advances made in one scientific field can frequently be transferred to other disciplines, to the benefit of both.
 

This event brings together scientists from a range of scientific fields including computer science, statistics, particle physics and astrophysics, as well as cross-cutting areas such as the development of anomaly detection algorithms, medical image analysis, accelerator physics, and others. The goal of this workshop is to start a dialogue between different experts, fostering a collaborative environment where experiences, knowledge, and methodologies can be shared.
 

The workshop will explore topics related to, but not limited to: 

  • Novel machine learning algorithms for anomaly detection
  • Statistical techniques for detecting rare events
  • Anomaly detection for discovery of new phenomena in physics
  • Unveiling anomalies in astrophysical observations
  • Real-time anomaly detection in data streams and data acquisition
  • Integration of domain knowledge with machine learning approaches
  • Transverse applications including detectors, medicine, biology
     

Introductory presentations by leading experts of each topic will be followed by contributed presentations during the workshop.

 

Abstract submissions are welcome until 29 January 2024 (extended).


The workshop will take place in Clermont-Ferrand at the Casino de Royat (see map).

 

Registration is free (but mandatory) and will be limited to 75 on-site participants. Talks are expected to be given in person and remote participation is not possible.

 

Social activities are also scheduled during the week.

 

Important dates:

 

16 October – First announcement

6 November – Second announcement, open registration and abstract submission

29 January – Deadline for abstract submission

5 February – Deadline for registration

12 February – Third announcement, program release

4-7 March – Workshop


This event is part of the IN2P3 AISSAI semester, held during 2023/2024.

 

A companion workshop on Uncertainties in Fundamental Physics will take place in Paris from 27 November to 2 December 2023.

Participants
67
    • 09:30 09:55
      Remarks: Opening
      • 09:30
        Welcome remarks, presentation AISSAI 25m
        Speaker: Julien Donini (UBP/LPC/IN2P3)
    • 09:55 10:40
      Invited: General review
      • 09:55
        A Statistician's View on Model-Independent Searches of New Physics at the Large Hadron Collider 45m

        A central goal in experimental high energy physics is to detect new signals that appear as deviations from known Standard Model physics in high-dimensional particle physics data. To do this, one seeks to determine whether there is a statistically significant difference between the distribution of Standard Model background samples and the distribution of the experimental observations, which are a mixture of the background and a potential new signal. Traditionally, this is done by assuming access to reliable samples from both the Standard Model background and the hypothesized signal distribution. In recent years, model-independent searches, where this assumption is relaxed, have gained widespread attention within HEP in order to increase the sensitivity of LHC experiments to unexpected new physics signals. In this talk, I will give an overview of the role that model-independent searches play at the LHC with a specific focus on the statistical and methodological challenges involved in these searches. I will focus on cases where these searches are performed in a high-dimensional space with the help of machine learning classifiers and draw connections with the wider literature on anomaly detection and two-sample testing.

        Speaker: Mikael Kuusela (Carnegie Mellon University)
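
The two-sample testing view mentioned in the abstract above can be illustrated with a minimal sketch. This is not the speaker's method: it assumes a single one-dimensional feature, uses the absolute difference of sample means as an illustrative test statistic, and calibrates it by permutation. The function name, parameters and toy data are hypothetical.

```python
import numpy as np

def permutation_two_sample_test(background, observed, n_permutations=2000, seed=0):
    """Return (statistic, p_value) for H0: both samples share one distribution.

    Assumption: the statistic is the absolute difference of sample means,
    chosen only for illustration; real searches use richer statistics.
    """
    rng = np.random.default_rng(seed)
    background = np.asarray(background, dtype=float)
    observed = np.asarray(observed, dtype=float)
    stat = abs(observed.mean() - background.mean())

    pooled = np.concatenate([background, observed])
    n_bkg = background.size
    exceed = 0
    for _ in range(n_permutations):
        # Under H0 the labels are exchangeable, so shuffle and re-split.
        shuffled = rng.permutation(pooled)
        perm_stat = abs(shuffled[n_bkg:].mean() - shuffled[:n_bkg].mean())
        if perm_stat >= stat:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return stat, p_value

# Toy data: a background sample, a null "observation", and a shifted one
# standing in for data that contains a signal.
rng = np.random.default_rng(1)
bkg = rng.normal(0.0, 1.0, size=500)
obs_null = rng.normal(0.0, 1.0, size=500)
obs_shift = rng.normal(0.5, 1.0, size=500)
_, p_null = permutation_two_sample_test(bkg, obs_null)
_, p_shift = permutation_two_sample_test(bkg, obs_shift)
```

In high dimensions the hand-picked statistic would typically be replaced by the output of a trained classifier, as the abstract indicates; the permutation calibration stays the same.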
    • 10:40 11:10
      Coffee break 30m
    • 11:10 12:25
      Contributed
      • 11:10
        RanBox: Anomaly Detection in the Copula Space 25m

        We describe RanBox, a tool for the discovery of overdensities in a standardized multi-dimensional space. The search is performed by reducing the feature space to the unit hypercube (copula), and searching for an excess of events within small multidimensional intervals, using a background prediction from the multi-dimensional sidebands around them. The algorithm is shown to compare favourably to supervised classifiers and to be applicable to a wide variety of situations.

        Speaker: Tommaso Dorigo (INFN Sezione di Padova)
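
The two ingredients named in the RanBox abstract above, the map to the unit hypercube and the sideband-based background estimate, can be sketched as follows. This is a simplified illustration and not the RanBox implementation: the function names are hypothetical, the background is assumed to be locally flat in copula space, and the box is placed by hand on an injected cluster, whereas a real search scans many boxes.

```python
import numpy as np

def to_copula_space(x):
    """Map each column to (0, 1) via its empirical CDF (rank transform)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0)  # ranks 0..n-1 per column
    return (ranks + 0.5) / n

def count_in_box(u, lower, upper):
    """Number of rows of u inside the axis-aligned box [lower, upper)."""
    inside = np.all((u >= lower) & (u < upper), axis=1)
    return int(inside.sum())

def sideband_excess(u, lower, upper, margin=0.1):
    """Observed count in a box and a background estimate from a surrounding shell.

    Assumption: the background density is locally flat, so the shell count
    scaled by the volume ratio predicts the count inside the box.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    outer_lo = np.clip(lower - margin, 0.0, 1.0)
    outer_hi = np.clip(upper + margin, 0.0, 1.0)
    n_box = count_in_box(u, lower, upper)
    n_shell = count_in_box(u, outer_lo, outer_hi) - n_box
    v_box = np.prod(upper - lower)
    v_shell = np.prod(outer_hi - outer_lo) - v_box
    expected = n_shell * v_box / v_shell
    return n_box, expected

# Toy data: smooth background plus a narrow injected cluster.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 1.0, size=(2000, 2))
signal = rng.normal(1.0, 0.05, size=(100, 2))
u = to_copula_space(np.vstack([background, signal]))

# For illustration only, centre the box on the injected cluster.
center = np.median(u[2000:], axis=0)
n_box, expected = sideband_excess(u, center - 0.05, center + 0.05, margin=0.1)
```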
      • 11:35
        Statistically learning dispersed new physics at the LHC 25m
        Speaker: Sabine Kraml (LPSC Grenoble)
      • 12:00
        Summary of the AISSAI workshop on AI and the Uncertainty Challenge in Fundamental Physics 25m

        The AISSAI workshop on AI and the Uncertainty Challenge in Fundamental Physics (https://indico.in2p3.fr/event/30589/) took place in Paris and Orsay from 27 November to 1 December 2023. The following themes were covered: Uncertainty Quantification, Explainable AI, Simulation-Based Inference, Data-frugal approaches, Data-centric AI, Benchmark datasets and challenges, Unfolding (or de-biasing, de-blurring), Controlling uncertainties in generative models, and Architectures (Adversarial, Bayesian, ...). In addition, a hackathon took place as a prototype of the Fair Universe competition, which is to run as a NeurIPS competition in 2024. This talk will attempt to summarise the most salient points on recent developments shown in the different themes around uncertainty.

        Speaker: David Rousseau (IJCLab, CNRS/IN2P3, Université Paris-Saclay)
    • 12:25 12:40
      Group picture 15m
    • 12:40 14:00
      Lunch 1h 20m
    • 14:00 14:45
      Invited: Astronomy
      • 14:00
        Mysterious Lights: anomaly detection in astronomy 45m

        The discovery of unusual objects drives all scientific fields, and astronomy is no exception, given its diverse range of astrophysical phenomena. In the era of large sky surveys and machine learning, researchers design automated pipelines to sift through data and identify objects that could enhance our understanding of the Universe.

        In this talk, I review the challenges and solutions the astronomy community faces in anomaly detection. I explore various domains, including gravitational waves and particle astrophysics, with a focus on electromagnetic radiation, spanning from radio to gamma rays. Electromagnetic data is represented in various forms, each requiring unique algorithms for analysis. I discuss anomaly detection for images, time-series, and spectral data. Lastly, I speculate about future astronomical surveys and how anomaly detection will aid the scientific community in uncovering rare and unusual celestial events.

        Speaker: Konstantin Malanchev (University of Illinois at Urbana-Champaign)
    • 14:45 15:10
      Contributed
      • 14:45
        Automatic detection of hostless transients in the FINK broker 25m

        Almost all astronomical transients are hosted by a galaxy. Hostless transients are rare and have been associated with events that probe the extremes of physical mechanisms. Early identification based on their hostless characteristics would allow rapid follow up and consequently better datasets for modelling and interpretation. Most apparently hostless events are not in fact hostless but their hosts are fainter than the limiting magnitude of the discovery survey. Thus, the detection of hostless events could also be used as an indirect mechanism to discover low surface brightness galaxies. Only a few true hostless events have been found to date resulting in an illustrative application for automatic anomaly detection techniques. We developed a pipeline that implements fairly simple anomaly detection methods built upon the experience of domain knowledge experts. We aim at detecting possible hostless transients using large surveys' alerts. We plan on implementing this pipeline into the FINK broker to be able to provide hostless candidates in real time. In this talk I will describe the science case and highlight features which should be kept in mind when dealing with real astronomical alerts. I will also show examples of a few hostless candidates and discuss how they can be used as templates to search for similar sources.

        Speaker: Priscila Pessi (Oskar Klein Centre)
    • 15:10 15:35
      Lightning talks
      • 15:10
        Enhancing Monojet searches with ML 4m

        Dark Matter particles could potentially be detected at the Large Hadron Collider (LHC) using the monojet channel, where at least one high pT jet recoils against missing transverse momentum. However, these searches pose a challenge as they require distinguishing subtle differences among similar jets. One way to improve this is by using Machine Learning (ML) methods to analyze correlations between jet constituents. I plan to share a proof-of-concept analysis employing a graph neural network to distinguish between the Standard Model background and signal from neutralino Dark Matter. I will provide the preliminary results obtained from evaluating this approach on MC simulated data.

        Speaker: Rafal MASELEK (LPSC (Grenoble))
      • 15:14
        Accelerating the search for mass bumps using the Data-Directed Paradigm 4m

        The Data-Directed paradigm (DDP) is a new physics search strategy for efficiently detecting anomalies in a large number of spectra with smoothly-falling SM backgrounds. Unlike the traditional analysis strategy, DDP avoids the need for a simulated or functional-form based background estimate by directly predicting the statistical significance using a convolutional neural network trained to regress the log-likelihood-based significance. In this way, a trained network is used to identify mass bumps directly on data. By saving a considerable amount of time, this approach has the potential to expand the discovery reach by checking many unexplored regions. The method has shown good performance when finding various beyond standard model particles in simulation data. A description of the method and recent developments will be presented.

        Speaker: Ms Bruna Pascual (Université de Montréal)
      • 15:18
        Integrating Noisy Label Learning and Confidence Estimation 4m

        Training a neural network is challenging when the training dataset is contaminated by labelling errors, which are commonly referred to as label noise. This challenge often coexists with the challenge of predicting confidence, allowing one to flag low-confidence predictions for the main task. Existing techniques tackle one of the two challenges but not both, neglecting their interdependency. We establish a relationship between these challenges and propose a novel unified framework named Unsupervised Confidence Approximation (UCA) to address them concurrently. UCA trains a neural network simultaneously for its main task (e.g., image segmentation) and for confidence prediction, from noisy label datasets. Importantly, UCA can be trained without confidence labels and is thus amenable to unsupervised training in this respect. UCA is generic as it can be used with any neural architecture designed for the main task. We evaluate UCA experimentally using the general CIFAR-10N dataset and the medical image datasets CheXpert and Gleason-2019. Incorporating UCA into existing networks enhances performance in both aspects of noisy label training and selective prediction. UCA-equipped networks are on par with the state-of-the-art in noisy label training when used in regular, full coverage mode. However, they have a risk-management facility, showing flawless risk-coverage curves with substantial performance gain over existing selective prediction methods.

        Speaker: Dr Navid Rabbani (DIA2M, DRCI, CHU Clermont-Ferrand)
      • 15:22
        A deep learning approach for video-based traffic anomaly detection 4m

        Traffic safety systems, especially in critical infrastructure such as road tunnels, have garnered the interest of researchers for many years. While traffic managers have used traffic surveillance cameras for some time now, recent advances in computer vision and understanding have enabled more sophisticated automatic incident detection systems. Most state-of-the-art systems consist of modern vehicle detection technologies combined with heuristic-based incident detection systems. Our work introduces an end-to-end anomaly detection method, free from manually set rules, making it more adaptable and robust. The main building block consists of a novel deep learning architecture designed to extract traffic-flow-relevant information from consecutive frames. Because vehicle movements in highway driving situations are highly predictable, so is the distribution of near-future traffic information. In an anomalous traffic situation, the discrepancy between the predicted and the actual next-frame representation exceeds the prediction error of normal traffic behavior. Compared to conventional systems, our proposed method showcases enhanced energy efficiency, potentially reducing operational costs, and exhibits superior performance in detecting a broader spectrum of anomalies.

        Speaker: Mr Tom Schumann (RWTH Aachen University – Institute for Highway Engineering)
      • 15:26
        Anomaly Detection for Fink 4m
        Speaker: Maria Pruzhinskaya (LPC)
      • 15:30
        SEDAF: Prototype of a Real-Time Explainable Anomaly Detection System on Multivariate Data Stream 4m

        Anomaly detection refers to the identification of rare events that differ significantly from the normal trend observed in the data distribution. When the number of variables to analyze is large, it can be difficult to understand the detected anomaly without explanation. In this work, we present the prototype of an explainable and real-time anomaly detection system, based on measurements from a multivariate datastream which can be assimilated to an infinite multivariate time series. The built system is composed of a set of anomaly detection methods combining deep neural networks and decision trees as well as an agnostic explainability method. In an unsupervised learning context, we also show how explainability provides insights to validate the system.

        Speaker: JIECHIEU KAMENI Florentin Flambeau (CNRS-LIMOS)
    • 15:35 16:05
      Poster-dedicated coffee break 30m
    • 16:05 17:45
      Contributed
      • 16:05
        DASMA: Towards Real-time and Explainable Anomaly Detection on Data Stream 25m

        DASMA is a research project that aims at designing and building a system for real-time anomaly detection and explanation. The novelty is the ability of the system to process a multivariate and numerical datastream in order to provide real-time explanations of the anomalies detected, by highlighting the variables mainly responsible for each anomaly. The prototype described in this work consists of a set of four anomaly detection algorithms based on decision trees and deep neural networks and a score attribution explanation method based on KernelSHAP. A windowing technique is used to slide over the datastream and update the model continuously. The built system leverages the InfluxData ecosystem consisting of InfluxDB, Kapacitor and Chronograf to respectively store, process and visualize the datastream in the form of a multivariate time series. The experiments conducted and validated by the domain experts have shown that the system is promising for real-time monitoring applications insofar as the user can visualize, on the same dashboard, the anomalies and the explanations which provide insights to understand the anomalies detected. However, there are still a lot of challenges to tackle, including continuous learning, adaptive thresholding, managing concept drift and so forth.

        Speaker: Dr JIECHIEU KAMENI Florentin Flambeau (CNRS-LIMOS)
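
The windowing idea in the DASMA abstract above, sliding over a multivariate stream while continuously updating the reference, can be sketched with the standard library alone. This is a deliberately simplified stand-in and not the DASMA system: a rolling z-score replaces the tree-based and neural detectors, and reporting the most deviant variable replaces the KernelSHAP attribution. The class name, window size and threshold are hypothetical.

```python
from collections import deque
import math
import random

class SlidingWindowDetector:
    """Flag points far from the recent window mean, variable by variable.

    Assumption: a rolling z-score stands in for the detectors and the
    KernelSHAP-based explanation described in the abstract.
    """

    def __init__(self, window_size=50, threshold=4.0):
        self.window = deque(maxlen=window_size)  # oldest points drop out automatically
        self.threshold = threshold

    def update(self, point):
        """Score one multivariate point, then add it to the window.

        Returns (is_anomaly, index_of_most_deviant_variable, max_abs_z).
        """
        result = (False, None, 0.0)
        if len(self.window) >= 10:  # wait for a minimal history
            z_scores = []
            for j, value in enumerate(point):
                column = [row[j] for row in self.window]
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                std = math.sqrt(var) or 1e-12  # guard against zero spread
                z_scores.append(abs(value - mean) / std)
            top = max(range(len(z_scores)), key=z_scores.__getitem__)
            result = (z_scores[top] > self.threshold, top, z_scores[top])
        self.window.append(tuple(point))
        return result

# Toy stream of three variables with one injected anomaly.
rng = random.Random(0)
detector = SlidingWindowDetector(window_size=50, threshold=4.0)
flags = []
for t in range(100):
    point = [rng.gauss(0.0, 1.0) for _ in range(3)]
    if t == 80:
        point[1] += 15.0  # injected anomaly in variable 1
    flags.append(detector.update(point))
```

Because every point is appended to the window after scoring, an anomaly also contaminates the reference for the next `window_size` steps; handling that, along with concept drift and adaptive thresholds, is among the open challenges the abstract lists.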
      • 16:30
        Realtime Anomaly Detection with the CMS Level-1 Global Trigger Test Crate 25m

        We present the preparation, deployment, and testing of an autoencoder trained for unbiased detection of new physics signatures in the CMS experiment Global Trigger test crate FPGAs during LHC Run 3. The Global Trigger makes the final decision on whether to read out or discard the data from each of the LHC collisions, which occur at a rate of 40 MHz, within a 50 ns latency. The neural network makes a prediction for each event within these constraints, which can be used to select anomalous events for further analysis. The implementation occupies a small percentage of the resources of the system Virtex 7 FPGA in order to function in parallel to the existing logic. The GT test crate is a copy of the main GT system, receiving the same input data, but whose output is not used to trigger the readout of CMS, providing a platform for thorough testing of new trigger algorithms on live data, but without interrupting data taking. We describe the methodology to achieve ultra-low-latency anomaly detection, and present the integration of the DNN into the GT test crate, as well as the monitoring, testing, and validation of the algorithm during proton collisions.

        Speaker: Artur Lobanov (Universität Hamburg)
      • 16:55
        Learning new physics with a (kernel) machine 25m

        The New Physics Learning Machine is a methodology to perform a model-independent and multivariate likelihood ratio test powered by machine learning (arXiv:2305.14137). I will present its implementation based on kernel methods, which is extremely efficient while maintaining high flexibility (arXiv:2204.02317). After outlining the general framework, I will discuss recent results on model selection for improved chances of detection and present applications to model-independent searches of new physics, online data quality monitoring (arXiv:2303.05413), and the evaluation of generative models.

        Speaker: Marco Letizia (MaLGa Center, University of Genoa and INFN)
    • 19:00 21:00
      Welcome cocktail 2h
    • 09:30 10:15
      Invited: Interdisciplinarity
      • 09:30
        Blind Anomaly Detection in Industrial Time-series 45m

        In this talk, the specificities of anomaly detection in industrial time-series are investigated and the key related, and often quite hidden, concepts are introduced through illustrative examples. More importantly, the challenges associated with characterizing the normality of non-cyclic, state/context-dependent time-series are underlined and the role of so-called dynamic invariants in addressing these challenges is invoked. Finally, general comments drawn from real-life use cases are given regarding the metrics needed to evaluate and rank the related algorithms.

        Speaker: Dr Mazen Alamir (CNRS, Université de Grenoble Alpes, Grenoble INP)
    • 10:15 10:40
      Contributed
      • 10:15
        Anomaly detection for complex equipment monitoring 25m

        Anomaly detection is a key issue in many fields of application. In condition-based maintenance, we navigate between two key issues: assessing the health status and identifying the fault (the diagnosis itself). The first can be answered by an "unsupervised" approach, which seeks to detect changes in the behavior of complex industrial equipment, in order to detect faults or measurement problems as early as possible. The second, on the other hand, implies supervised methods that are rarely applicable to complex equipment due to a lack of labeled data. The aim of this paper is to present a methodology for setting up a more suitable monitoring system than the simple detection of threshold exceedances usually used in maintenance. Due to the multiplicity of equipment to be monitored, the automatic anomaly detection process needs to prioritize the equipment to be analyzed and provide key indications to help experts with diagnosis. The main constraint is that the number of false alarms must not overload the control room, while at the same time not tolerating any missed detections. The method we have developed consists in learning and running a behavioral model based on an analysis of the distances between a new measurement and measurements from a reference period, which makes it close to k-NN type methodologies. This methodology, which meets the needs defined above, incorporates a process for automatically determining the end of the reference period, a definition of the adaptive distance metric, and the ability to reduce false alarms over time by taking continuous feedback from human analysts into account.

        Speaker: Jean-Michel Becu (ACOEM)
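
The distance-to-reference-period idea in the abstract above can be sketched as a plain k-NN anomaly score. This is only an illustration under stated assumptions, not the ACOEM method: it uses a fixed Euclidean metric instead of the adaptive one, takes the reference period as given, and ignores analyst feedback. The function names, `k` and the quantile are hypothetical.

```python
import numpy as np

def knn_anomaly_score(reference, new_points, k=5):
    """Mean Euclidean distance from each new point to its k nearest reference points."""
    reference = np.asarray(reference, dtype=float)
    new_points = np.atleast_2d(np.asarray(new_points, dtype=float))
    # Pairwise distances, shape (n_new, n_reference).
    diffs = new_points[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

def calibrate_threshold(reference, k=5, quantile=0.99):
    """Alarm threshold from leave-one-out scores of the reference period itself."""
    reference = np.asarray(reference, dtype=float)
    diffs = reference[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Column 0 of the sorted distances is the zero self-distance; skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return float(np.quantile(nearest.mean(axis=1), quantile))

# Toy reference period of "healthy" measurements with three sensors.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(300, 3))
threshold = calibrate_threshold(reference)

# One typical measurement and one far from anything seen in the reference.
scores = knn_anomaly_score(reference, [[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
alarms = scores > threshold
```

The quantile used for calibration directly trades false alarms against missed detections, which is the constraint the abstract highlights.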
    • 10:40 11:10
      Coffee break 30m
    • 11:10 12:25
      Contributed
      • 11:10
        Searching for changing-state AGNs in massive datasets with anomaly detection 25m

        The classic classification scheme for Active Galactic Nuclei (AGNs) was challenged by the discovery of the changing-state AGNs (CSAGNs). The physical mechanism behind this phenomenon is still a matter of open debate and the samples are too small and of serendipitous nature to provide robust answers. In this talk I will present an anomaly detection (AD) technique designed to identify AGN light curves with anomalous behaviors in massive datasets. The main aim of this technique is to identify CSAGN at different stages of the transition, but it can also be used for more general purposes. To test this algorithm, we used light curves from the Zwicky Transient Facility data releases (ZTF DR), containing a sample of 230,458 AGNs of different classes. The ZTF DR light curves were modeled with a Variational Recurrent Autoencoder (VRAE) architecture, that allowed us to obtain a set of attributes from the VRAE latent space that describes the general behaviour of our sample. These attributes were then used as features for an Isolation Forest (IF) algorithm. We used the VRAE reconstruction errors and the IF anomaly score to select a sample of 8810 anomalies. These anomalies are dominated by bogus candidates, but we were able to identify promising CSAGN candidates.

        Speaker: Paula Sanchez Saez (European Southern Observatory (ESO))
      • 11:35
        Leveraging Robust Machine Learning Methods for Targeted Anomaly Searches 25m

        Anomalies are usually framed as “rare objects”, lying in a low-density region of the feature space. However, finding them in practice under this broad definition can come with limitations: density estimation can be hard to perform reliably for high dimensional, noisy or complex (non-rectangular) data. Additionally, not all low-density points are interesting anomalies. In many cases, we are interested in a specific region of the feature space, looking for data points that diverge, in some precise aspects, from our expectations (e.g. if we have reliable models) or from otherwise similar data points. We can take advantage of robust, supervised ML methods for such anomaly searches without needing supervised anomalous examples. I will present results for the search of mid-infrared excess in FGK stars with a fully data-driven pipeline using Random Forests. To identify outlier candidates, we use a combination of the prediction errors and statistics using prediction errors of similar neighbouring points. This bypasses the need for accurate stellar models and fitting and provides a higher detection sensitivity, crucial in the mid-IR. This allows us to scale our search to an unprecedented data set of 4.9 million stars. Leveraging ML this way for targeted AD can be especially interesting for scaling to large datasets when good, cheap-to-compute models are unavailable. It is important to note that our approach detects outliers from the data perspective: if IR excesses were very common in our sample (e.g. young stars) and could be predicted from the input features, the model would learn this and our pipeline would not flag anomalies.

        Speaker: Dr Gabriella Contardo (SISSA)
      • 12:00
        Looking for unique objects with excess UV emission in modern large-scale sky surveys 25m

        Selection of extreme objects in the data from large-scale sky surveys is a powerful tool for the detection of new classes of astrophysical objects or rare stages of their evolution. The cross-matching of catalogues and analysis of the color indices of their objects is a usual approach for this problem which has already provided a lot of interesting results. However, the analysis of objects that are found in only one of the surveys, and absent in all others, should also attract close attention, as it may lead to the discovery of both transients and objects with extreme color values. Here we report on our study aimed at the detection of objects with significant UV excess in their spectra by cross-matching the GALEX all-sky catalogue with the data from optical large-scale experiments, especially from the Dark Energy Survey, and analyzing the ones visible in GALEX only, or having extreme UV to optical colors. We describe the methodology for such an investigation, discuss the obstacles and artefacts that may mimic such extreme objects, and present the results of the study covering a significant part of the Southern sky.

        Speaker: Ms Aleksandra Avdeeva (Institute of Astronomy RAS)
    • 12:25 14:00
      Lunch 1h 35m
    • 14:00 14:50
      Contributed
      • 14:00
        Anomaly Detection algorithms applied to the Quality Control of detector components 25m

        With the rapid development of advanced Machine Learning techniques, many new and efficient Anomaly Detection algorithms have been released. In the field of experimental High Energy Physics, interest in such Anomaly Detection algorithms is growing. In the production of new detectors, one of the critical aspects is to ensure the functioning of each component through extensive Quality Control procedures. Our proposal is to use advanced Anomaly Detection algorithms to improve the efficiency and reliability of such procedures. In particular, we focus on the Visual Inspection of detector components using Computer Vision algorithms. We established a strategy combining both unsupervised and supervised techniques in order to detect all kinds of defects in a given component. This strategy is demonstrated in the context of the upgrade of the ATLAS detector for the High Luminosity phase of the LHC. We use the components of the new ATLAS Inner Tracker (ITk) produced in Japan as a first test case. In the future, we believe that such techniques can be generalized to the construction of many new experiments in High Energy Physics as well as in other fields.

        Speaker: Dr Louis Vaslin (QUP/KEK)
      • 14:25
        Identification of anomalous epochs in Astronomical Time Series through Transfer Learning 25m

        We present a novel method for detecting outliers in astronomical time series based on the combination of a deep neural network and a k-nearest neighbor algorithm. We use an EfficientNet network pre-trained on ImageNet as a feature extractor, and then perform a k-nearest neighbor search in the resulting feature space to measure the distance from the first neighbor for each image. If the distance is above the one obtained for a stacked image, we flag the image as a potential outlier. We apply our method to the VST time series, which are obtained from the VLT Survey Telescope (VST), a 2.6-meter optical telescope located at Paranal Observatory in Chile. We show that our method can effectively identify and remove artifacts from the VST time series, and improve the quality and reliability of the data. This method can be very useful in view of the Vera C. Rubin Legacy Survey of Space and Time. We also discuss the advantages and limitations of our method, and suggest possible directions for future work.

        Speaker: Stefano Cavuoti (INAF - Astronomical Observatory of Capodimonte Napoli)
    • 14:50 15:35
      Tutorial: Generative Anomaly Detection - Part 1
      • 14:50
        Lecture and Tutorial on Generative Anomaly Detection 45m

        Recent progress in generative models allows the unsupervised learning and interpolation of high-dimensional distributions. This, for example, makes possible approaches for anomaly detection based on weak supervision by first interpolating into a region with potential signal present, and then training a classifier to find such a potential signal. This lecture and tutorial will introduce the necessary ingredients based on hands-on examples and discuss current challenges and progress.

        Speaker: Gregor Kasieczka (Universität Hamburg)
    • 15:35 16:05
      Coffee break 30m
    • 16:05 17:20
      Tutorial: Generative Anomaly Detection - Part 2
      • 16:05
        Lecture and Tutorial on Generative Anomaly Detection 1h 15m

        Speaker: Gregor Kasieczka (Universität Hamburg)
    • 09:30 10:40
      Tutorial: Active Anomaly Detection - Part 1
    • 10:40 11:10
      Coffee break 30m
    • 11:10 12:00
      Tutorial: Active Anomaly Detection - Part 2
    • 12:00 12:25
      Contributed
      • 12:00
        Searching for rare objects with narrow-band photometry from S-PLUS 25m

        In this talk I will present the Southern Photometric Local Universe Survey (S-PLUS) and its fourth data release (DR4) and briefly show our past and ongoing machine learning projects using this data. S-PLUS will cover ~9300 square degrees of the southern sky with an 80-cm telescope (T80-South) located at the Cerro Tololo Inter-American Observatory. The observations are taken in 12 bands: 5 Sloan-like broad bands and 7 narrow bands centered on stellar features. The additional information of the narrow-band photometry increases the performance of machine learning algorithms for a variety of tasks, such as object classification (Nakazono et al. 2021) and photometric redshift estimation (Nakazono & Valença, submitted; Lima et al. 2021). At this moment, S-PLUS is the survey that has released data for the largest area of the southern sky (~3000 square degrees in DR4) with the highest number of narrow bands. I will discuss the possibilities of detecting short-period variables (such as white dwarfs) and our effort in detecting high-redshift quasars using anomaly detection techniques. Finally, I will mention the possibility of using the T80-South facilities for observing Targets of Opportunity to obtain fast photometric SEDs.

        Speaker: Ms Lilianne Nakazono (University of Sao Paulo)
    • 12:25 14:00
      Lunch 1h 35m
    • 14:00 14:45
      Invited: Particle Physics
      • 14:00
        Anomaly detection in particle physics 45m

        Anomaly detection is mostly considered in searches for either outlier events or accumulations of events disagreeing with a previously known distribution. The emergence of advanced machine learning (ML) techniques opens many new opportunities for detecting anomalies in collider experiments as well. I will review traditional as well as state-of-the-art ML-based anomaly detection concepts and algorithms, along with scientific results obtained by applying them to collider data.

        Orateur: Dr Shikma Bressler (Weizmann Institute of Science)
    • 14:45 15:35
      Contributed
      • 14:45
        DarkMachines: data challenge and anomaly scores 25m
        Speaker: Pietro Vischia (Universidad de Oviedo and Instituto de Ciencias y Tecnologías Espaciales de Asturias (ICTEA))
      • 15:10
        Model-agnostic search for dijet resonances with anomalous jet substructure with the CMS detector 25m

        We present a model-agnostic search for new physics in the dijet final state using five different novel machine-learning techniques. Other than the requirement of a narrow dijet resonance, minimal additional assumptions are placed on the signal hypothesis. Signal regions are obtained using multivariate machine learning methods to select jets with anomalous substructure. A collection of complementary methodologies (based on unsupervised, weakly-supervised and semi-supervised paradigms) is used in order to maximize the sensitivity to unknown New Physics signatures.

        Speaker: Louis Moureaux (Universität Hamburg)
    • 15:35 16:05
      Coffee break 30m
    • 16:05 16:55
      Contributed: TBD
      • 16:05
        Advanced anomaly detection algorithms to search for semivisible jets in the CMS experiment at the CERN LHC 25m

        Semivisible jets are an intriguing signature predicted to arise at hadron colliders when the Standard Model (SM) of particle physics is extended with a new, hidden sector governed by a confining interaction. Made of a mixture of SM particles and undetectable bound states of new particles, semivisible jets present a unique radiation pattern. Exploiting the resulting differences in jet substructure compared to SM jets is key in the search for strongly-coupled dark sectors. Advanced anomaly detection tools provide a powerful way to achieve this discrimination and, compared to supervised machine learning strategies, have the advantage of not relying on a finite set of signal model hypotheses to train on. In this talk, we present the application of normalized autoencoders to the task of separating semivisible jets from SM backgrounds. We show how this architecture drastically improves the performance in detecting signal events compared to a standard autoencoder. Finally, we demonstrate a failure mode in the training of normalized autoencoders, and propose a novel procedure to optimize their performance in a fully signal-agnostic fashion using the Wasserstein distance.

        Speaker: Dr Roberto Seidita (ETH Zürich)
      • 16:30
        Anomaly Detection at the LHC with GAN-AE algorithm 25m
        Speaker: Louis Vaslin (LPC Clermont)
    • 17:30 18:00
      Transport to city center 30m
    • 18:00 19:30
      Guided tour of Clermont-Ferrand 1h 30m
    • 19:30 22:30
      Conference dinner 3h
    • 09:30 10:15
      Invited: Medical Imaging
      • 09:30
        A review of unsupervised anomaly detection models for neuroimaging applications 45m

        Statistical machine learning models have become state-of-the-art methods in almost all medical imaging applications, including the segmentation of organs or structures of interest and the detection of pathological patterns. Among data-driven methods, fully supervised models remain the most common and the best-performing ones. However, gathering large amounts of expert-annotated data to train such models is a time- and resource-intensive process, which limits the size of available databases. Unsupervised Anomaly Detection, also referred to as outlier detection, has been proposed as an alternative to deep supervised learning for medical image analysis when the studied pathology is rare or has heterogeneous patterns, or when obtaining labels from radiologists is very challenging. This presentation will review the state of the art in this field, focusing on neuroimaging applications.

        Speaker: Dr Carole Lartizien (CREATIS laboratory)
    • 10:15 10:40
      Contributed
      • 10:15
        Real-time Anomaly Detection in Injection Molding: Leveraging Autoencoder Models To Define The Future Of Quality Control 25m

        Injection molding, especially in medical device manufacturing, faces significant costs associated with manual quality control, largely due to regulatory requirements. Standard approaches using Design of Experiments combined with manual control are limited by performance, high cost and delayed detection. Other approaches, such as Statistical Process Control, are limited by an extensive need for labeled data and poor adaptability to the natural variations of manufacturing floors, and they struggle with the complexity of injection molding data, leading to poor detection or high false alarm rates. To address these issues, this research adopts unsupervised autoencoder models for cost-efficient, novel, real-time anomaly detection in injection molding. By leveraging a real-time cloud pipeline to analyze the model's reconstruction error, the system effectively distinguishes normal from abnormal components. The model successfully detected 7 out of 9 generated fault scenarios, achieving detection of 91% of visual and 100% of dimensional defects. This method, which overcomes the class imbalance challenge of rare event problems, can then be used to cost-effectively label abnormal components and tune the system to the required performance, laying the foundation for automated quality assurance and parametric component release.

        Speaker: Florian Josselin (SHL Medical)
    • 10:40 11:10
      Coffee break 30m
    • 11:10 12:25
      Contributed
      • 11:10
        M-dwarf flares in the Zwicky Transient Facility data and what we can learn from it 25m

        M-dwarf stars make up the vast majority of stars in the Milky Way galaxy. As low-mass, fully convective stars, they exhibit frequent flaring events caused by powerful magnetic reconnection processes in their atmospheres. The study of M-dwarf flares gives key insights into stellar magnetism, high-energy phenomena, and the impacts on potentially habitable planets orbiting these stars. In this work we applied an Active Anomaly Discovery (AAD) algorithm to search for M-dwarf flares in the Zwicky Transient Facility data releases. AAD is an active machine learning technique which sequentially uses expert feedback to fine-tune an initially standard unsupervised algorithm to a particular definition of a scientifically interesting anomaly. Since the algorithm can adapt to the expert’s opinion, it can be used for a targeted search for objects of a certain type. Therefore, in this analysis, a human expert considered only M-dwarf flare candidates as anomalies; all other objects proposed by the algorithm were rejected by the expert as nominals. We compared the flares found with AAD with those detected by a parametric fit search. The two methods led to the discovery of 126 M-dwarf flares. An additional astrophysical analysis was also performed, measuring the flare energies and determining the spectral (sub)classes of the dwarf stars.

        Speaker: Anastasiia Voloshina (SNAD team)
      • 11:35
        Signatures to help interpretability of anomalies 25m

        Machine learning is often viewed as a black box when it comes to understanding its output, be it a decision or a score. Automatic anomaly detection is no exception to this rule, and quite often the human expert is left to independently analyze the data in order to understand why it is tagged as an anomaly. Worse, the expert may end up scrutinizing over and over the same kind of rare phenomena which all share a high anomaly score, while missing anomalies of interest. In this presentation, I’ll introduce the idea of an anomaly signature, whose aim is to help the interpretability of an anomaly score by highlighting which features contributed to the decision. Similar in spirit to feature importance for classification, anomaly signatures can also be used to improve the selection of features used to define anomalies. I’ll present applications to the search for anomalies in astrophysics within the framework of the SNAD team.

        Speaker: Emmanuel Gangler (LPC)
      • 12:00
        Multiview Symbolic Regression. How to learn laws from examples 25m

        Feature extraction is one of the crucial stages in applying machine learning to real scientific cases. Time series pose an additional challenge for feature extraction because their sampling is potentially non-homogeneous. One of the standard methods to tackle this requires a parametric equation that is versatile enough to describe all the diversity of the dataset. This function is then used to fit each time series, and the best parameter values are used as features. The quality of the parametric function determines how much of the data’s behavior is encoded in the features and thus how suitable they are for machine learning applications. In the case of anomaly detection in particular, it is essential to use the best possible function, since small details can make a significant difference between a normal and an anomalous event. In this work we propose a solution to automatically discover optimal parametric functions for a given problem using an adaptation of Symbolic Regression. It is capable of recovering a common parametric equation hidden behind multiple datasets generated using different parameter values. We call this approach Multiview Symbolic Regression (MvSR). I will highlight the potential of MvSR for feature extraction by demonstrating its efficiency on a variety of real scientific datasets. The resulting parametric equations are able to correctly describe the examples from which they were built as well as other, unseen similar examples. Applying MvSR to a science case will unlock its optimal parameterization for future anomaly detection pipelines, thus improving the chances of future great discoveries.

        Speaker: Etienne Russeil
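
        The fit-then-use-parameters-as-features step described in the Multiview Symbolic Regression abstract can be illustrated with a minimal sketch. It is not the MvSR method itself: a straight line stands in for the parametric function that MvSR would discover, and the function names and the two example series are hypothetical. The point is only that unevenly sampled series of different lengths map to feature vectors of the same size.

```python
# Illustrative sketch only: a straight line stands in for the parametric
# function; MvSR would instead discover a richer functional form.

def fit_line(times, values):
    """Least-squares fit of values = slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov_tv / var_t
    intercept = mean_v - slope * mean_t
    return slope, intercept


def extract_features(series):
    """Map each (times, values) series to its best-fit parameters."""
    return [fit_line(times, values) for times, values in series]


# Two hypothetical, unevenly sampled series of different lengths.
series = [
    ([0.0, 0.7, 1.9, 4.2], [1.0, 2.4, 4.8, 9.4]),  # values = 2 t + 1
    ([0.3, 1.1, 5.0], [2.85, 2.45, 0.5]),          # values = -0.5 t + 3
]
features = extract_features(series)  # one (slope, intercept) pair per series
```

        The resulting fixed-length feature vectors could then be passed to any anomaly detection algorithm; how well they separate normal from anomalous events depends on how well the chosen function describes the data.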
    • 12:25 14:00
      Lunch 1h 35m
    • 14:00 15:15
      Contributed
      • 14:00
        Exploiting the discovery potential of the LHC data using the Data Directed Paradigm 25m

        No stone can be left unturned in the search for new physics beyond the standard model (BSM). Since no indication of new physics has been found yet, and the resources at hand are limited, we must devise novel avenues for discovery. We propose a Data-Directed Paradigm (DDP), whose principal objective is to direct dedicated analysis efforts towards regions of data which hold the highest potential for discoveries leading to BSM physics. The DDP is a different search paradigm, in complete contrast to, but complementary to, the currently dominant theory-driven blind analysis search paradigm. It could reach discoveries that are currently blocked by the waste of resources involved in the blind analysis dogma. With an investment of hundreds of person-years, the blind analysis paradigm has set impressive bounds on BSM scenarios. However, it has also limited the number of searches conducted, leaving much of the data's potential unexplored. One representative example is the search for di-lepton resonances, where searches targeting exclusive regions of the data (di-lepton+X) are hardly conducted. By focusing on the data, the DDP makes it possible to identify rapidly whether the data in a given region exhibit significant deviations from a well-established property of the Standard Model (SM). Thus, ideally, an unlimited number of final states can be tested, considerably expanding our discovery reach. We discuss DDP implementations for two SM properties. The first is the fact that, in the absence of resonances, most invariant mass distributions are smoothly falling. Following the di-lepton example, we propose identifying which of the many di-lepton+X selections is more likely to hide a resonance. The second property is the flavour symmetry of the SM: the fact that, in the absence of BSM physics, the LHC data should be approximately symmetric under the replacement of prompt electrons with prompt muons.

        Speaker: Shikma Bressler (Weizmann Institute of Science)
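
        The electron-muon symmetry property mentioned in the abstract above can be illustrated with a rough sketch. This is not the DDP implementation: it assumes independent Poisson counts, ignores the different detection efficiencies for electrons and muons that a real analysis would have to correct for, and uses hypothetical region names, counts and threshold.

```python
import math


def asymmetry_z(n_electron, n_muon):
    """Approximate z-score for the difference of two Poisson counts."""
    total = n_electron + n_muon
    if total == 0:
        return 0.0
    return (n_electron - n_muon) / math.sqrt(total)


def flag_regions(counts, threshold=3.0):
    """Return the regions whose electron/muon asymmetry exceeds the threshold."""
    return [
        name
        for name, (n_e, n_mu) in counts.items()
        if abs(asymmetry_z(n_e, n_mu)) > threshold
    ]


# Hypothetical event counts per selection: (electron channel, muon channel).
counts = {
    "region_A": (1000, 990),
    "region_B": (400, 250),
    "region_C": (12, 15),
}
flagged = flag_regions(counts)  # regions worth a dedicated follow-up analysis
```

        With these invented numbers only `region_B` exceeds the threshold; the other two regions are compatible with the symmetry within statistical fluctuations.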
      • 14:25
        Accelerating the search for mass bumps using the Data-Directed Paradigm 25m

        The Data-Directed Paradigm (DDP) is a new physics search strategy for efficiently detecting anomalies in a large number of spectra with smoothly falling SM backgrounds. Unlike the traditional analysis strategy, the DDP avoids the need for a simulated or functional-form-based background estimate by directly predicting the statistical significance using a convolutional neural network trained to regress the log-likelihood-based significance. In this way, a trained network is used to identify mass bumps directly in data. By saving a considerable amount of time, this approach has the potential to expand the discovery reach by checking many unexplored regions. The method has shown good performance in finding various beyond-the-Standard-Model particles in simulated data. A description of the method and recent developments will be presented.

        Speaker: Ms Eva Mayer (Université Clermont Auvergne)
      • 14:50
        Anomaly detection for data quality monitoring of the CMS detector 25m

        Ensuring the quality of data in large HEP experiments like CMS at the LHC is of primary importance for obtaining solid physics results. Well-established Data Quality Monitoring (DQM) and Data Certification (DC) procedures at CMS presently rely on the visual inspection of a set of reference histograms providing a concise overview of the detector status and performance. Besides the required person-power, the main limitation of such procedures is the coarse time granularity, which can hide transient issues. In this contribution we will discuss recent developments of automated DQM and DC workflows using autoencoders, where models specifically designed for different CMS sub-detectors can spot detector anomalies with high accuracy and fine time granularity.

        Speaker: Dr Federica Maria Simone (Università and INFN, Bari, Italy)
    • 15:15 15:40
      Remarks: Closing