IWAPP - Innovative Workflows in Astro- & Particle Physics

Europe/Paris
Online

https://fau.zoom.us/j/92555464859?pwd=RHlHUERjeVRRMWJHVnJoSk1BZ0VHUT09
Elena Cuoco
Description

Innovative workflows - the agenda

The use of machine and deep learning techniques has become increasingly common in scientific communities. The large amount of data to analyze and the need for real-time analysis are driving the search for cutting-edge techniques and new data analysis paradigms.

In particular, deep learning for event-based, image-based and signal-based applications in two and higher dimensions has been intensively pursued by expert groups within the ESFRIs and will be deployed to the EOSC via ESCAPE.

The objective of this workshop is to bring together the scientific communities of Astrophysics, Astroparticle Physics and Particle Physics that are leading the development of Innovative Workflows within their domains, and to explore common approaches for innovative workflows.

Interactive format and contributions

The workshop aims to increase exchange and discussions between the participants. In addition to the presentations in the main sessions, all registrants are therefore invited to share their work in open interactive sessions.

Technical support and uploads

For any questions on technical support, please contact escape.iwapp@gmail.com.

If you would like to upload material to your contribution but cannot access it, you can also drop your files (named name_contribution.format, e.g. Jones_presentation.pdf) in this folder.

Participants
  • Achim Stahl
  • Adrián Ayala-Gómez
  • Alberto Iess
  • Alessandro Lonardo
  • Alex Clarke
  • Antoine LEMASSON
  • Antonio D'Isanto
  • Argyro Sasli
  • Arsenii Gavrikov
  • Axel Donath
  • Barbara Patricelli
  • Bernardino Spisso
  • CANİP SEVİNÇ
  • Caterina Doglioni
  • Claire Adam
  • Cristiano Bozza
  • Daniel Nieto Castaño
  • Davit Janezashvili
  • Dmytro Kresan
  • Elena Cuoco
  • Fabian Schussler
  • Filip Morawski
  • Filippo Quarenghi
  • ge ou
  • Gernot Maier
  • Giorgi Kistauri
  • Giuseppe Fiameni
  • Giuseppe Lo Presti
  • Gogita Papalashvili
  • Guillaume Baulieu
  • Hendrik Heinl
  • Inigo Slijepcevic
  • Isabel Cordero-Carrión
  • Javier Moldon
  • Javier Pascual Granado
  • Jayesh WAGH
  • jigar bhanderi
  • Johannes Schumann
  • Jose Agustin Lozano Torres
  • José Ramón Rodón Ortiz
  • Julián Garrido-Sánchez
  • Jutta Schnabel
  • Kai Polsterer
  • Kay Graf
  • Laura Darriba
  • Luca Antiga
  • Luca Pontisso
  • Maisam M. Dadkan
  • Manuel Parra
  • Marco Cavaglia
  • Maria Angeles Mendoza Perez
  • Marie Paturel
  • Mark Allen
  • Mathieu Servillat
  • Matias Tueros
  • Matteo Turisini
  • Mauricio Santillana
  • Maurizio Pierini
  • Maxim Gonchar
  • Maxime Fays
  • Micah Bowles
  • Mieke Bouwhuis
  • Muhammad Aleem Sarwar
  • Nazaret Bello González
  • Nikola Lopac
  • Nima Sedaghat
  • Olivier Stezowski
  • Osman Tayfun Bişkin
  • Panagiotis Iosif
  • Pierre Chanial
  • Rasa Muller
  • Raviraditya Singh
  • Reda Attallah
  • Rezo Shanidze
  • Simona Maria Stellacci
  • Stefan Reck
  • Susana Sánchez Expósito
  • Tamas Gal
  • Thomas Eberl
  • Thomas Vuillaume
  • Thorsten Glüsenkamp
  • Tjark Miener
  • V.N. Pandey
  • Ville Tenhunen
    • 10:00 12:45
      Introduction: Overview presentations

      Introduction and overview presentations on innovative workflows

      Session chair: Elena Cuoco
      • 10:00
        Introduction & Overview 10m
        Speaker: Elena Cuoco
      • 10:10
        OSSR & ESCAPE 10m
        Speaker: Kay Graf (ECAP - University of Erlangen)
      • 10:20
        Technical introduction 10m
        Speaker: Jutta Schnabel (FAU Erlangen (ECAP))
      • 10:30
        AI at scale 1h

        Deep learning has been the most significant breakthrough of the past 10 years in the field of pattern recognition and machine learning. It has substantially improved the effectiveness of prediction models in many research topics and application fields, ranging from computer vision, natural language processing and embodied AI to more traditional fields of pattern recognition. This paradigm shift has radically changed the research methodology towards a data-oriented approach, in which learning involves all steps of the prediction pipeline, from feature extraction to classification. While research efforts have concentrated on the design of effective feature extraction and prediction architectures, computation has moved from CPU-only approaches to the dominant use of GPUs and massively parallel devices, empowered by large-scale and high-dimensional datasets.

        The goal of this talk is to present recent advancements in AI and techniques for training deep neural networks on multi-GPU technology to shorten the training time required for data-intensive applications.

        Giuseppe Fiameni is a Solution Architect at NVIDIA, where he oversees the NVIDIA AI Technology Centre in Italy, a collaboration among NVIDIA, CINI and CINECA to accelerate academic research in the field of Artificial Intelligence through collaborative projects. He has worked as an HPC specialist at CINECA, the largest HPC facility in Italy, for more than 14 years, providing support for large-scale data analytics workloads.

        Speaker: Giuseppe Fiameni (NVIDIA)
      • 11:30
        Coffee break 15m https://gather.town/app/olLhUruftJ617h4z/IWAPP (Gather)
      • 11:45
        Deep Learning for real-time data processing at the LHC 1h
        Speaker: Maurizio Pierini (CERN)
    • 15:00 17:00
      Hands-On: Bring your own container - open interactive session
      Session chair: Dr Kay Graf (ECAP - University of Erlangen)
      • 15:00
        Astro-COLIBRI: a new platform for time-domain astronomy 2h colab space (Gather)

        https://gather.town/app/olLhUruftJ617h4z/IWAPP

        Astro-COLIBRI: The coincidence library for real-time inquiry for multi-messenger astrophysics

        Flares of known astronomical sources and new transient phenomena occur on different timescales, from sub-seconds to several days or weeks. The discovery potential of both serendipitous observations and multi-messenger and multi-wavelength follow-up observations could be maximized with a tool that quickly provides an overview of both persistent sources and transient events in the relevant phase space. Here we present the COincidence LIBrary for Real-time Inquiry (Astro-COLIBRI), a novel and comprehensive tool for this task.

        Astro-COLIBRI's architecture comprises a RESTful API, a real-time database, a cloud-based alert system and a website, as well as apps for iOS and Android as clients for users. The structure of Astro-COLIBRI is optimized for performance and reliability and exploits concepts such as multi-index database queries, a global content delivery network (CDN), and direct data streams from the database to the clients to allow for a seamless user experience. Astro-COLIBRI evaluates incoming VOEvent messages of astronomical observations in real time, filters them by user-specified criteria and puts them into their multi-wavelength (MWL) and multi-messenger (MM) context. The clients provide a graphical representation with an easy-to-grasp summary of the relevant data to allow for the fast identification of interesting phenomena, and provide an assessment of observing conditions at a large selection of observatories around the world.

        The platform is currently in its beta phase. We'll present the current features and outline future improvements.

        Speaker: Fabian Schussler (CEA/Irfu)
    • 09:30 12:45
      Workflows: Specific approaches

      Specific workflow solutions

      Session chair: Cristiano Bozza
      • 09:30
        MEGAVIS - Real-time spectra analysis and visualization with autoencoders 30m

        The data explosion in astronomy requires the development of new techniques on both the infrastructure and the analysis side. In particular, the increase in data complexity demands a parallel effort to deliver efficient and standardized solutions for accessing and managing data, tools and software. This is the main purpose of ESCAPE. In this talk I will give an overview of the work carried out within the project, presenting MEGAVIS, a prototype based on machine learning, which aims to start building a new paradigm for data access and search, based not on explicit criteria but implicitly on similarities. The prototype is based on dimensionality reduction models, and in particular on an autoencoder. The main features and capabilities of the software will be illustrated, along with possibilities for future development.

        Speaker: Dr Antonio D'Isanto (HITS gGmbH)
      • 10:00
        APEIRON 30m

        APEIRON is a project funded by INFN Scientific Committee 5 that aims to design and develop a framework to study, prototype and deploy AI-based real-time processing apparatuses, boosting particle identification capabilities in trigger systems or performing efficient online data reduction for triggerless ones.
        It involves the definition of the general architecture of a heterogeneous distributed execution platform along with its software stack and a set of relevant use cases to validate it.
        NA62 at CERN is a fixed-target experiment on ultra-rare kaon decays and represents the main use case for APEIRON.
        High particle rates and prompt online data selections are the pillars of its experimental strategy, given that the physics signal of interest is ten orders of magnitude less frequent than the background.
        To upgrade the trigger and data acquisition system of NA62, we are building shallow neural networks that extract high-level features from a Ring Imaging Cherenkov (RICH) detector, and we are testing them for real-time inference on FPGA using high-level synthesis (HLS).
        Fully connected and convolutional architectures have been explored, yielding different performance in terms of classification accuracy, computational latency and digital resource utilization on the target FPGA device.
        In this short talk we briefly introduce the models, present the status of the project and sketch the perspectives of our work.

        Speakers: Luca Pontisso, Matteo Turisini (INFN)
      • 10:30
        Real-time analysis in high energy physics and beyond 30m

        The Large Hadron Collider collides protons up to 30 million times a second, and provides its experiments with an enormous amount of data. The trigger systems of each experiment quickly analyse each of those collision events and decide whether to retain it for further analysis, on a timescale of the order of milliseconds. In this seminar, I will present an overview of the tools and real-time analysis techniques employed within these trigger systems, focusing on the ATLAS experiment but also outlining elements of the strategies of the CMS and LHCb experiments. I can also discuss how those techniques connect to physics cases that use novel methods to make the most of LHC data, with a sensitivity that would not be achievable with standard techniques. I would be happy to discuss connections beyond high energy physics.

        Speaker: Caterina Doglioni (Lund University)
      • 11:00
        Coffee break 15m Gather

        https://gather.town/app/olLhUruftJ617h4z/IWAPP
      • 11:15
        Some benefits of using normalizing flows for PDF modelling 30m
        Speaker: Thorsten Glüsenkamp
      • 11:45
        A machine learning workflow for reproducible data science 1h

        The advent of new machine learning approaches to astrophysics and particle physics comes with the challenge of identifying a set of tools that can support machine learning-driven workflows for data analysis in this domain. The list of available tools is growing by the day, and it can be challenging to identify a good starter set. In this talk we will explore a few building blocks of a reproducible data pipeline for data versioning, model development, experiment tracking and model serving, such as Hangar, PyTorch Lightning, MLflow and RedisAI. We will provide an introductory overview of each of these tools and their respective advantages. Finally, we will present how we are leveraging them in the context of the ESCAPE project.

        Speaker: Luca Antiga
    • 15:00 16:30
      Hands-On: Plenary
      Session chair: Kay Graf (ECAP - University of Erlangen)
    • 09:30 12:30
      Workflows: Solutions in Research Infrastructures

      Specific workflow solutions

      Session chair: Thomas Eberl (ECAP - Uni Erlangen)
      • 09:30
        Machine Learning Algorithms in LIGO-Virgo 30m

        Machine learning techniques are increasingly popular in gravitational-wave physics. I will introduce some of the current applications in the field, from detector noise characterisation to data analysis techniques and highlight areas of active development with strong innovative potential for current-generation detectors.

        Speaker: Maxime Fays
      • 10:00
        Anomaly Detection in Gravitational Waves data using Convolutional AutoEncoders 20m
        Speaker: Filip Morawski (Nicolaus Copernicus Astronomical Center of the Polish Academy of Sciences)
      • 10:20
        Science data challenges for the SKA 30m

        I will talk about some of the work we have been doing at the Square Kilometre Array (SKA) creating data challenges that help prepare users for when the SKA comes online. The focus of these is to give users data similar to that expected from the telescope, and ask them to implement workflows that produce science outputs. The first data challenge involved identifying and classifying sources in images, and we developed an example containerised solution promoting best practices in machine learning, software development and reproducibility. I'll highlight some parts of this, and discuss some of the machine learning methods that were most useful.

        Speaker: Alex Clarke
      • 10:50
        Coffee break 20m https://gather.town/app/olLhUruftJ617h4z/IWAPP (Gather)
      • 11:10
        Machine Learning Workflows in KM3NeT 30m

        KM3NeT consists of two water-Cherenkov neutrino detectors currently under construction in the Mediterranean Sea: the low-energy site ORCA in France and the high-energy site ARCA in Italy. ORCA's goal is the determination of the neutrino mass hierarchy by measuring the energy- and zenith-angle-resolved oscillation probabilities of atmospheric neutrinos traversing the Earth. ARCA will use its wide coverage of the observable sky to look for high-energy astrophysical neutrino sources. Machine learning algorithms play an important role in analysing the signatures induced by particles traversing the detectors.

        This talk will give a detailed explanation of the types of data in KM3NeT, the challenges we faced when analysing it, and the various machine learning based solutions and workflows that were developed over the years. The topics range from maximum-likelihood-based reconstruction algorithms accompanied by shallow machine learning techniques like Random Forests, up to the use of deep artificial neural networks and graph-based networks.

        Speaker: Stefan Reck (ECAP - University of Erlangen)
      • 11:40
        Incorporating deep learning into the analysis of the Cherenkov Telescope Array 30m
        Speaker: Tjark Miener
      • 12:10
        GNA: data flow approach for the neutrino oscillation experiments 20m

        GNA is a framework dedicated to building and fitting large-scale models related to neutrino oscillation physics. The core is written in C++ and operated from within Python. Following the data flow paradigm, the model is built as a directed acyclic graph. Each node of the graph represents a function that operates on vectorized data and depends on a few parameters. Any part of the graph is evaluated lazily. A library of transformations, implementing various functions, including vectorized integration and interpolation, is precompiled. While the approach is conceptually similar to the one used in the ML field (TensorFlow/zfit, PyTorch), one of the key differences is the requirement to build non-uniform models containing a large number of elements. The framework enables the user to build incomplete and independent parts of the model and combine them into a single graph by describing its structure as a mathematical expression. The description of the framework as well as practical examples from the JUNO experiment will be presented.
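
As a rough illustration of the lazy data-flow idea described in this abstract, the following Python sketch builds a small directed acyclic graph in which a node is only computed when its value is requested, and only recomputed when a parameter upstream of it changes. The class names `Node` and `Parameter` and the toy functions are hypothetical and are not the GNA API; plain Python lists stand in for vectorized arrays.

```python
"""Toy lazy data-flow graph. Illustrative only; this is NOT the GNA API."""


class Node:
    """A graph node wrapping a function of the outputs of its input nodes."""

    def __init__(self, func, inputs=()):
        self.func = func
        self.inputs = list(inputs)
        self.dependants = []      # nodes that consume this node's output
        self._cache = None        # last computed output
        self._valid = False       # False means the cache is out of date
        self.evaluations = 0      # how many times func has actually run
        for node in self.inputs:
            node.dependants.append(self)

    def value(self):
        """Return the output, computing it only if the cache is stale."""
        if not self._valid:
            args = [node.value() for node in self.inputs]
            self._cache = self.func(*args)
            self._valid = True
            self.evaluations += 1
        return self._cache

    def taint(self):
        """Mark this node and everything downstream of it as out of date."""
        self._valid = False
        for node in self.dependants:
            node.taint()


class Parameter(Node):
    """A leaf node holding a value that the user may change."""

    def __init__(self, value):
        super().__init__(func=None)
        self._cache = value
        self._valid = True

    def set(self, value):
        """Store a new value and invalidate all nodes that depend on it."""
        self._cache = value
        for node in self.dependants:
            node.taint()


# Hypothetical toy model: two independent branches combined into one output.
energies = Parameter([1.0, 2.0, 3.0])
norm = Parameter(2.0)
shift = Parameter(0.5)

scaled = Node(lambda e, n: [n * x for x in e], [energies, norm])
shifted = Node(lambda e, s: [x + s for x in e], [energies, shift])
total = Node(lambda a, b: [x + y for x, y in zip(a, b)], [scaled, shifted])

print(total.value())   # first request triggers evaluation of all three nodes
shift.set(1.0)         # taints only `shifted` and `total`
print(total.value())   # `scaled` is served from its cache
print(scaled.evaluations, shifted.evaluations, total.evaluations)
```

In this sketch, changing `shift` leaves the `scaled` branch untouched. That is the kind of saving a lazily evaluated graph can offer when a fit repeatedly varies a few parameters of a large model.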

        Speaker: Maxim Gonchar (JINR)
    • 14:00 16:00
      Workflows: Open interactive session

      Specific workflow solutions

      • 14:00
        ALFA / FairRoot Framework 2h
        Speaker: Dmytro Kresan (GSI)
      • 14:00
        The OSSR 2h

        This contribution demonstrates how you can share your own work during the open poster session, by sharing a webpage or poster pointing to your work. Please provide links or uploads through the form "Open session contribution".

        Speaker: Jutta Schnabel (FAU Erlangen (ECAP))
    • 17:00 18:00
      Evening talk: Finding common ground

      Invited talk

      • 17:00
        Machine Learning in Healthcare 45m
        Speaker: Mauricio Santillana (Boston Children's Hospital / Harvard Medical School)
    • 09:30 10:00
      Forum
      Session chair: Thomas Vuillaume (LAPP, CNRS)
      • 09:30
        KM3NeT Tier-2 Computing at Tbilisi State University 7m
        Speaker: Gogita Papalashvili
      • 09:37
        New approaches for multi-messenger real time analysis 7m
        Speaker: Barbara Patricelli (EGO)
      • 09:44
        Classifying galaxies, quasars and stars in optical surveys with machine learning 7m

        We implemented a random forest and a dimension reduction algorithm called UMAP to classify and analyse 111 million sources from the Sloan Digital Sky Survey (SDSS) based on their photometry. These sources did not have the spectroscopic observations traditionally required for classification, but our method still enabled us to find 2 million new active supermassive black holes.

        Speaker: Alex Clarke
      • 09:51
        Distributed data processing using FairMQ framework. 7m
        Speaker: Dmytro Kresan (GSI)
    • 10:00 12:00
      Common approaches: Presentations

      Discussions and talks on combined workflows

      Session chair: Dr Thomas Vuillaume (LAPP, CNRS)
    • 15:00 16:45
      Common approaches: Discussion groups

      Discussions and talks on combined workflows

      Session chair: Jutta Schnabel (FAU Erlangen (ECAP))
      • 15:00
        Discussion 1: Data reduction and data formats 45m

        With the exponential increase in available data, new data analysis paradigms are required. The traditional single-source science approaches used for today's data archives do not scale to recent data challenges. Machine learning is one of the key tools for providing assistance in dealing with this data avalanche. This brings forth new topics to be dealt with:
        - pre-processing and compressed representations of data
        - from data formats to data interfaces
        - metadata, provenance, versioning, snapshots and others

        Speaker: Kai Polsterer (HITS gGmbH)
      • 15:00
        Discussion 2: Use of alternative hardware 45m

        With new technologies emerging (e.g., deep learning), scientific computing environments are becoming more and more heterogeneous. Several parallel computing devices (FPGAs, GPUs, etc.) can be exploited to accelerate traditional algorithms and include deep learning components in the processing workflows. A wide set of dedicated computing devices (TPUs, IPUs, ...) are targeting specific use cases, e.g., convolutional neural networks, recurrent networks, graph networks. Neuromorphic computing opens the possibility of using spiking neural networks for signal processing. In the longer term, quantum computing might offer interesting alternatives for solving large combinatorial problems. With private companies dictating the direction followed by innovation, big scientific collaborations might have to adapt their data processing to follow this trend, which could offer specific advantages.

        Speaker: Maurizio Pierini (CERN)
      • 15:00
        Discussion 3: Machine & deep learning techniques 45m

        The adoption of deep learning and machine learning techniques in the scientific community has become increasingly widespread. Sharing knowledge and code on approaches and architectures has already become common practice, but this could be improved by creating common resources that require limited effort but may have a high impact on the field, by both enabling new science and ensuring prior work can be reproduced. This can span a range of possibilities, starting from well-maintained collections of recent work and related code, to the collaboration on specific modelling or infrastructural building blocks; from the identification of common themes, such as multi-task learning, simulated-to-real domain adaptation, modelling distributions from data, uncertainty and drift detection, all the way to finding common ground towards multi-messenger astronomy.

        Speaker: Luca Antiga
      • 15:00
        Discussion 4: Workflow management for machine learning 45m

        With Machine Learning techniques being applied at larger and larger scales in high-energy and astroparticle physics, workflow management is gaining attention as one area where careful design and implementation are crucial to the success of a solution. "Workflow management" is indeed a broad term in itself, with different meanings. Presentations on specific approaches at IWAPP have explored various aspects, ranging from more hardware-oriented studies to mathematical techniques and project lifecycle management; at the same time, it has been shown that Machine Learning, while originally well grounded in offline data processing, is moving closer and closer to instruments, possibly providing (quasi) real-time decision and data acquisition steering capabilities with online reconstructions using trigger objects rather than raw data. At the other end of the application spectrum, Machine Learning is being used to provide classification criteria for libraries of datasets. The discussion is expected to focus on tools and best practices to ensure that the solutions are flexible and future-proof.

        Speaker: Cristiano Bozza
      • 15:00
        Discussion 5: Real time analysis and triggering 45m

        The amount of data to be processed by scientific experiments is ever-increasing, and the time-to-insight from recording to benefitting from the data is becoming a more relevant benchmark than for previous instruments.
        In this discussion we will try to understand if there are common use cases and what solutions are implemented by the experiments of the participants to meet this challenge, with a focus on the data analysis that can be done as close as possible to the instrument.

        Speaker: Caterina Doglioni (Lund University)
      • 15:45
        Coffee break 15m https://gather.town/app/olLhUruftJ617h4z/IWAPP (Gather)
      • 16:00
        Discussion 1: Data reduction and data formats 45m
        Speaker: Kai Polsterer (HITS gGmbH)
      • 16:00
        Discussion 2: Use of alternative hardware 45m
        Speaker: Maurizio Pierini (CERN)
      • 16:00
        Discussion 3: Machine & deep learning techniques 45m
        Speaker: Luca Antiga
      • 16:00
        Discussion 4: Workflow management for machine learning 45m
        Speaker: Cristiano Bozza
      • 16:00
        Discussion 5: Real time analysis and triggering 45m
        Speaker: Caterina Doglioni (Lund University)
    • 10:00 12:30
      Common approaches: Summary & Outlook

      Discussions and talks on combined workflows

      Session chair: Elena Cuoco
      • 10:00
        AI and ML landscape and ideas for platforms and services by EGI 30m
        Speaker: Ville Tenhunen (EGI)
      • 10:30
        Reports from the discussion groups 50m
        • Topic 1: Data reduction and data formats 10m
          Speaker: Kai Polsterer (HITS gGmbH)
        • Topic 2: Use of alternative hardware 10m
          Speaker: Maurizio Pierini (CERN)
        • Topic 3: Machine and deep learning techniques 10m
          Speaker: Luca Antiga
        • Topic 4: Workflow management for machine learning 10m
          Speaker: Cristiano Bozza
        • Topic 5: Real time analysis and triggering 10m
          Speaker: Caterina Doglioni (Lund University)
      • 11:20
        Panel discussion 1h
      • 12:20
        Summary 10m
        Speaker: Elena Cuoco