IWAPP - Innovative Workflows in Astro- & Particle Physics

Name: IWAPP - Innovative Workflows in Astro- & Particle Physics
Start: 2021-03-08T10:00:00+01:00
End: 2021-03-12T13:00:00+01:00
Location: Online

8 March 2021, 10:00 → 12 March 2021, 13:00 Europe/Paris

Online

https://fau.zoom.us/j/92555464859?pwd=RHlHUERjeVRRMWJHVnJoSk1BZ0VHUT09

Elena Cuoco

Description

Innovative workflows - the agenda

The use of machine and deep learning techniques has become increasingly common in scientific communities. The large amount of data to analyze and the need for real-time analysis are driving the search for cutting-edge techniques and new data analysis paradigms.

In particular, deep learning for event-based, image-based and signal-based applications in two and higher dimensions has been followed intensively by expert groups within the ESFRIs and will be deployed to the EOSC via ESCAPE.

The objective of this workshop is to bring together the scientific communities of Astrophysics, Astroparticle Physics and Particle Physics that are leading the development of Innovative Workflows within their domains, and to explore common approaches for innovative workflows.

Interactive format and contributions

The workshop aims to increase exchange and discussions between the participants. In addition to the presentations in the main sessions, all registrants are therefore invited to share their work in open interactive sessions.

Technical support and uploads

For any questions on technical support, please contact escape.iwapp@gmail.com.

If you would like to upload material for your contribution but cannot access it, you can also drop your files (named name_contribution.format, e.g. Jones_presentation.pdf) in this folder.

Contact

elena.cuoco@ego-gw.it

cbozza@unisa.it

jutta.schnabel@fau.de

escape.iwapp@gmail.com

Participants

84

Mon 8 March
- Introduction: Overview presentations
  
  Introduction and overview presentations on innovative workflows
  
  Session chair: Elena Cuoco
  - 1
    
    Introduction & Overview
    
    Speaker: Elena Cuoco
    
    IWAPP_intro.pdf
    
    Recording @Youtube
  - 2
    
    OSSR & ESCAPE
    
    Speaker: Kay Graf (ECAP - University of Erlangen)
    
    ESCAPE_OSSR_IWAPP_202003.pdf
    
    Recording @Youtube
  - 3
    
    Technical introduction
    
    Speaker: Jutta Schnabel (FAU Erlangen (ECAP))
    
    IWAPP tech introduction.pdf
  - 4
    
    AI at scale
    
    Deep Learning has been the most significant breakthrough of the past 10 years in the field of pattern recognition and machine learning. It has significantly improved the effectiveness of prediction models across many research topics and application fields, ranging from computer vision, natural language processing and embodied AI to more traditional fields of pattern recognition. This paradigm shift has radically changed the research methodology towards a data-oriented approach, in which learning involves all steps of the prediction pipeline from feature extraction to classification. While research efforts have concentrated on the design of effective feature extraction and prediction architectures, computation has moved from CPU-only approaches to the dominant use of GPUs and massively parallel devices, empowered by large-scale and high-dimensional datasets.
    
    The goal of this talk is to present recent advancements in AI and techniques for training deep neural networks on multi-GPU technology to shorten the training time required for data-intensive applications.
    
    Giuseppe Fiameni is a Solution Architect at NVIDIA, where he oversees the NVIDIA AI Technology Centre in Italy, a collaboration among NVIDIA, CINI and CINECA to accelerate academic research in the field of Artificial Intelligence through collaboration projects. He has been working as an HPC specialist at CINECA, the largest HPC facility in Italy, for more than 14 years, providing support for large-scale data analytics workloads.
    
    Speaker: Giuseppe Fiameni (NVIDIA)
  - 11:30
    
    Coffee break (Gather): https://gather.town/app/olLhUruftJ617h4z/IWAPP
  - 5
    
    Deep Learning for real-time data processing at the LHC
    
    Speaker: Maurizio Pierini (CERN)
    
    IWAPP_Feb2021.pdf
    
    Recording @Youtube
- Hands-On: Bring your own container - open interactive session
  
  Session chair: Dr Kay Graf (ECAP - University of Erlangen)
  - 6
    
    Astro-COLIBRI: a new platform for time-domain astronomy
    
    Location: colab space (Gather), https://gather.town/app/olLhUruftJ617h4z/IWAPP
    
    Astro-COLIBRI: The coincidence library for real-time inquiry for multi-messenger astrophysics
    
    Flares of known astronomical sources and new transient phenomena occur on different timescales, from sub-seconds to several days or weeks. The discovery potential of both serendipitous observations and multi-messenger and multi-wavelength follow-up observations could be maximized with a tool that quickly provides an overview of both persistent sources and transient events in the relevant phase space. Here we present the COincidence LIBrary for Real-time Inquiry (Astro-COLIBRI), a novel and comprehensive tool for this task.
    
    Astro-COLIBRI's architecture comprises a RESTful API, a real-time database, a cloud-based alert system and a website, as well as apps for iOS and Android as clients for users. The structure of Astro-COLIBRI is optimized for performance and reliability and exploits concepts such as multi-index database queries, a global content delivery network (CDN), and direct data streams from the database to the clients to allow for a seamless user experience. Astro-COLIBRI evaluates incoming VOEvent messages of astronomical observations in real time, filters them by user-specified criteria and puts them into their multi-wavelength (MWL) and multi-messenger (MM) context. The clients provide a graphical representation with an easy-to-grasp summary of the relevant data to allow for the fast identification of interesting phenomena, and provide an assessment of observing conditions at a large selection of observatories around the world.
    
    The platform is currently in its beta phase. We'll present the current features and outline future improvements.
    
    Speaker: Fabian Schussler (CEA/Irfu)
    
    Astro-COLIBRI_IWAPP_2021-03-08_Schussler.pdf
    
    Astro-COLIBRI platform
    
    Schussler_Astro-COLIBRI.pdf
  - 7
    
    Virtual Observatory
    
    Speaker: Hendrik Heinl (CDS/ObAS)
    
    ESCAPE VO school February 2021
    
    Virtual Observatory Text Treasures by GAVO
    
    VO Tutorial Collection of EuroVO
Tue 9 March
- Workflows: Specific approaches
  
  Specific workflow solutions
  
  Session chair: Cristiano Bozza
  - 8
    
    MEGAVIS - Real-time spectra analysis and visualization with autoencoders
    
    The data explosion in astronomy requires the development of new techniques on both the infrastructure and the analysis side. In particular, the increase in data complexity demands a parallel effort to deliver efficient and standardized solutions for accessing and managing data, tools and software. This is the main purpose of ESCAPE. In this talk I will give an overview of the work carried out within the project, presenting MEGAVIS, a machine-learning-based prototype that aims to start building a new paradigm for data access and search, based not on explicit criteria but implicitly on similarities. The prototype is based on dimensionality reduction models, in particular an autoencoder. The main features and capabilities of the software will be illustrated, along with possibilities for future development.
    
    Speaker: Dr Antonio D'Isanto (HITS gGmbH)
    
    iwapp_talk_disanto.pdf
    
    iwapp_talk_disanto.pptx
    
    Recording @Youtube
  - 9
    
    APEIRON
    
    APEIRON is an INFN Scientific Committee 5 funded project aimed at designing and developing a framework to study, prototype and deploy AI-based real-time processing apparatuses boosting particle identification capabilities in trigger systems or performing efficient online data reduction for triggerless ones.
    It involves the definition of the general architecture of a heterogeneous distributed execution platform along with its software stack and a set of relevant use cases to validate it.
    NA62 at CERN is a fixed target experiment on ultra-rare kaon decays and represents the main use case for APEIRON.
    High particle rates and prompt online data selections are the pillars of its experimental strategy given that the physics signal of interest is ten orders of magnitude less frequent than the background.
    To upgrade the trigger and data acquisition system of NA62 we are building shallow neural networks that extract high-level features from a Ring Imaging Cherenkov (RICH) detector, and we are testing them for real-time inference on FPGA using high-level synthesis (HLS).
    Fully connected and convolutional architectures have been explored, yielding different performance in terms of classification accuracy, computational latency and digital resource utilization on the target FPGA device.
    In this short talk we briefly introduce the models, present the status of the project and sketch the perspectives of our work.
    
    Speakers: Luca Pontisso, Matteo Turisini (INFN)
    
    210305_iwapp_turisini.pdf
    
    210309_IWAPP_Pontisso.pdf
    
    Recording @Youtube
  - 10
    
    Real-time analysis in high energy physics and beyond
    
    The Large Hadron Collider collides protons up to 30 million times a second, and provides its experiments with an enormous amount of data. The trigger systems of each experiment quickly analyse each of those collision events from the LHC and decide whether to retain it for further analysis, on a timescale of the order of milliseconds. In this seminar, I will present/discuss an overview of the tools and real-time analysis techniques employed within these trigger systems, focusing on the ATLAS experiment but also outlining elements of the strategies of the CMS and LHCb experiments. I can also present/discuss connections of those techniques to physics cases that use novel techniques to make the most of LHC data with a sensitivity that would not be achievable with standard techniques. I would be happy to discuss connections beyond high energy physics.
    
    Speaker: Caterina Doglioni (Lund University)
    
    20210309_Doglioni_IWAPP.pdf
    
    Recording @Youtube
  - 11:00
    
    Coffee break (Gather): https://gather.town/app/olLhUruftJ617h4z/IWAPP
  - 11
    
    Some benefits of using normalizing flows for PDF modelling
    
    Speaker: Thorsten Glüßenkamp
  - 12
    
    A machine learning workflow for reproducible data science
    
    The advent of new machine learning approaches to astrophysics and particle physics comes with the challenge
    of identifying a set of tools that can support machine learning-driven workflows for data analysis in this domain.
    The list of available tools is growing by the day, and it can be challenging to identify a good starter set of tools.
    In this talk we will explore a few building blocks of a reproducible data pipeline for data versioning, model development,
    experiment tracking and model serving, such as Hangar, PyTorch Lightning, MLflow and RedisAI.
    We will provide an introductory overview of each of these tools and their respective advantages. Finally, we
    will present how we are leveraging these in the context of the ESCAPE project.
    
    Speaker: Luca Antiga
    
    Recording @Youtube
- Hands-On: Plenary
  
  Session chair: Kay Graf (ECAP - University of Erlangen)
  - 13
    
    Hands-on data-versioning for scientific pipelines with Hangar
    
    Speakers: Alberto Iess (INFN Roma Tor Vergata), Filippo Quarenghi (Orobix)
    
    Hands-On resources @GoogleDrive
    
    Hangar Web Tutorial
    
    Recording @Youtube
  - 14
    
    Using normalizing flows
    
    see gather.town for materials
    
    Speaker: Thorsten Glüßenkamp
Wed 10 March
- Workflows: Solutions in Research Infrastructures
  
  Specific workflow solutions
  
  Session chair: Thomas Eberl (ECAP - Uni Erlangen)
  - 15
    
    Machine Learning Algorithms in LIGO-Virgo
    
    Machine learning techniques are increasingly popular in gravitational-wave physics. I will introduce some of the current applications in the field, from detector noise characterisation to data analysis techniques and highlight areas of active development with strong innovative potential for current-generation detectors.
    
    Speaker: Maxime Fays
    
    Recording @Youtube
  - 16
    
    Anomaly Detection in Gravitational Waves data using Convolutional AutoEncoders
    
    Speaker: Filip Morawski (Nicolaus Copernicus Astronomical Center of the Polish Academy of Sciences)
    
    iwapp_anomalies.pdf
    
    Recording @Youtube
  - 17
    
    Science data challenges for the SKA
    
    I will talk about some of the work we have been doing at the Square Kilometre Array (SKA) creating data challenges that help prepare users for when the SKA comes online. The focus of these is to give users data similar to that expected from the telescope, and ask them to implement workflows that produce science outputs. The first data challenge involved identifying and classifying sources in images, and we developed an example containerised solution promoting best practices in machine learning, software development and reproducibility. I'll highlight some parts of this, and discuss some of the machine learning methods that were most useful.
    
    Speaker: Alex Clarke
    
    Recording @Youtube
    
    SKA Data Challenges - AlexClarke.pdf
  - 10:50
    
    Coffee break (Gather): https://gather.town/app/olLhUruftJ617h4z/IWAPP
  - 18
    
    Machine Learning Workflows in KM3NeT
    
    KM3NeT consists of two water-Cherenkov neutrino detectors currently under construction in the Mediterranean Sea: the low-energy site ORCA in France and the high-energy site ARCA in Italy. ORCA's goal is the determination of the neutrino mass hierarchy by measuring the energy- and zenith-angle-resolved oscillation probabilities of atmospheric neutrinos traversing the Earth. ARCA will use its wide coverage of the observable sky to look for high-energy astrophysical neutrino sources. Machine Learning algorithms play an important role in analysing the signatures induced by particles traversing the detectors.
    
    This talk will give a detailed explanation of the types of data in KM3NeT, the challenges we faced when analysing it, and the various machine learning based solutions and workflows that were developed over the years. The topics range from maximum-likelihood-based reconstruction algorithms accompanied by shallow machine learning techniques like Random Forests, up to the use of deep artificial neural networks and graph-based networks.
    
    Speaker: Stefan Reck (ECAP - University of Erlangen)
    
    ML_workflows_km3net.pdf
    
    Recording @Youtube
  - 19
    
    Incorporating deep learning into the analysis of the Cherenkov Telescope Array
    
    Speaker: Tjark Miener
    
    Recording @Youtube
    
    TjarkMiener_presentation.pdf
  - 20
    
    GNA: data flow approach for the neutrino oscillation experiments
    
    GNA is a framework dedicated to building and fitting large-scale models related to neutrino oscillation physics. The core is written in C++ and operated from within Python. Following the data flow paradigm, the model is built as a directed acyclic graph. Each node of the graph represents a function that operates on vectorized data and depends on a few parameters. Any part of the graph is evaluated lazily. A library of transformations, implementing various functions, including vectorized integration and interpolation, is precompiled. While the approach is conceptually similar to the one used in the ML field (TensorFlow/zfit, PyTorch), one of the key differences is a requirement to build non-uniform models containing a large number of elements. The framework enables the user to build incomplete and independent parts of the model and combine them into a single graph by describing its structure as a mathematical expression. The description of the framework as well as practical examples from the JUNO experiment will be presented.
    
    Speaker: Maxim Gonchar (JINR)
    
    gonchar-2021-03-iwapp-gna-dataflow-v1-1-1.pdf
    
    Recording @Youtube
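
The data flow approach described in the GNA abstract above (a directed acyclic graph whose nodes are functions of vectorized data, evaluated lazily) can be illustrated with a minimal Python sketch. This is not GNA's API: the class names `Node` and `Parameter`, the methods `taint`, `set` and `value`, and the toy model are all hypothetical and chosen only for illustration.

```python
import numpy as np


class Node:
    """A graph node: a function applied to the outputs of its input nodes."""

    def __init__(self, func, inputs=()):
        self.func = func
        self.inputs = list(inputs)
        self.dependants = []   # nodes that consume this node's output
        self.cache = None      # last computed output, None if out of date
        self.evaluations = 0   # counter used only to illustrate laziness
        for node in self.inputs:
            node.dependants.append(self)

    def taint(self):
        """Invalidate this node and everything downstream of it."""
        self.cache = None
        for node in self.dependants:
            node.taint()

    def value(self):
        """Compute the output only if it is requested and out of date."""
        if self.cache is None:
            self.cache = self.func(*(node.value() for node in self.inputs))
            self.evaluations += 1
        return self.cache


class Parameter(Node):
    """A leaf node holding a value that a fit could vary (hypothetical)."""

    def __init__(self, initial):
        super().__init__(func=None)
        self.cache = np.asarray(initial, dtype=float)

    def set(self, new_value):
        self.taint()  # downstream results are now stale
        self.cache = np.asarray(new_value, dtype=float)

    def value(self):
        return self.cache


# Toy graph: two branches share the same energy grid.
energy = Parameter([1.0, 2.0, 3.0])
norm = Parameter(2.0)
scaled = Node(lambda e, n: n * e, [energy, norm])
shifted = Node(lambda e: e + 1.0, [energy])   # does not depend on norm
total = Node(lambda s: s.sum(), [scaled])

first = float(total.value())   # evaluates scaled, then total
total.value()                  # served from the cache, nothing recomputed
shifted.value()
norm.set(3.0)                  # invalidates scaled and total only
second = float(total.value())
shifted.value()                # still cached
```

In this sketch, changing `norm` re-evaluates only the nodes that depend on it, while the `shifted` branch keeps its cached result. That is the property that makes lazy evaluation attractive when a fit varies a few parameters of a large model.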
- Workflows: Open interactive session
  
  Specific workflow solutions
  - 21
    
    ALFA / FairRoot Framework
    
    Speaker: Dmytro Kresan (GSI)
    
    Kresan_FairRoot_ALFA.pdf
  - 22
    
    The OSSR
    
    This contribution demonstrates how you can share your own work during the open poster session, by sharing a webpage or poster pointing to your work. Please provide links or uploads through the form "Open session contribution".
    
    Speaker: Jutta Schnabel (FAU Erlangen (ECAP))
    
    OSSR portal
    
    OSSR poster
- Evening talk: Finding common ground
  
  Invited talk
  - 23
    
    Machine Learning in Healthcare
    
    Speaker: Mauricio Santillana (Boston Children's Hospital / Harvard Medical School)
    
    Recording @Youtube
Thu 11 March
- Forum
  
  Session chair: Thomas Vuillaume (LAPP, CNRS)
  
  Recording @Youtube
  - 24
    
    KM3NeT Tier-2 Computing at Tbilisi State University
    
    Speaker: Gogita Papalashvili
    
    FlashTalk_IWAPP_KM3NeT_Gogita.pdf
  - 25
    
    New approaches for multi-messenger real time analysis
    
    Speaker: Barbara Patricelli (EGO)
    
    Barbara_IWAPP_2021.pdf
  - 26
    
    Classifying galaxies, quasars and stars in optical surveys with machine learning
    
    We implemented a random forest and a dimension reduction algorithm called UMAP to classify and analyse 111 million sources from the Sloan Digital Sky Survey (SDSS) based on their photometry. These sources did not have the spectroscopic observations traditionally required for classification, but our method still enabled us to find 2 million new active supermassive black holes.
    
    Speaker: Alex Clarke
    
    AlexClarke-RAS-Poster-2020.pdf
  - 27
    
    Distributed data processing using the FairMQ framework
    
    Speaker: Dmytro Kresan (GSI)
    
    Kresan_Presentation_IWAPP.pdf
- Common approaches: Presentations
  
  Discussions and talks on combined workflows
  
  Session chair: Dr Thomas Vuillaume (LAPP, CNRS)
  
  Recording @Youtube
  - 28
    
    The test science projects as starting point for common efforts
    
    Speakers: Caterina Doglioni (Lund University), Elena Cuoco
    
    IWAPP_TSP1_DM_Doglioni.pdf
    
    IWAPP_TSP2-ExtremeUniverse_Cuoco.pdf
  - 10:20
    
    Coffee break (Gather)
  - 29
    
    Introduction to discussion sessions
    
    IWAPP discussion session intro.pdf
  - 30
    Discussion topics overview
    
    a) Topic 1: Data reduction and data formats
    
    Speaker: Kai Polsterer (HITS gGmbH)
    
    Main contribution
    
    b) Topic 2: Use of alternative hardware
    
    Speaker: Maurizio Pierini (CERN)
    
    Main contribution
    
    c) Topic 3: Machine and deep learning techniques
    
    Speaker: Luca Antiga
    
    Main contribution
    
    d) Topic 4: Workflow management for machine learning
    
    Speaker: Cristiano Bozza
    
    Main contribution
    
    e) Topic 5: Real time analysis and triggering
    
    Speaker: Caterina Doglioni (Lund University)
    
    Main contribution
- Common approaches: Discussion groups
  
  Discussions and talks on combined workflows
  
  Session chair: Jutta Schnabel (FAU Erlangen (ECAP))
  
  Recording @Youtube
  - 31
    
    Discussion 1: Data reduction and data formats
    
    With the exponential increase in available data, new data analysis paradigms are required. The traditional single-source science approaches used for today's data archives do not scale to recent data challenges. Machine learning is one of the key tools for dealing with this data avalanche. This raises new topics to be addressed:
    - pre-processing and compressed representations of data
    - from data formats to data interface
    - metadata, provenance, versioning, snapshots and others
    
    Speaker: Kai Polsterer (HITS gGmbH)
    
    Discussion document
  - 32
    
    Discussion 2: Use of alternative hardware
    
    With new technologies emerging (e.g., deep learning), scientific computing environments are becoming more and more heterogeneous. Several parallel computing devices (FPGAs, GPUs, etc.) can be exploited to accelerate traditional algorithms and include deep learning components in the processing workflows. A wide set of dedicated computing devices (TPUs, IPUs, ...) are targeting specific use cases, e.g., convolutional neural networks, recurrent networks, graph networks. Neuromorphic computing opens the possibility to use spiking neural networks for signal processing. In the longer term, quantum computing might offer interesting alternatives to solve large combinatorial problems. With private companies dictating the direction followed by innovation, big scientific collaborations might have to adapt their data processing to follow this trend, which could offer specific advantages.
    
    Speaker: Maurizio Pierini (CERN)
    
    Discussion document
    
    IWAPP_Discussion.pdf
  - 33
    
    Discussion 3: Machine & deep learning techniques
    
    The adoption of deep learning and machine learning techniques in the scientific community has become increasingly widespread. Sharing knowledge and code on approaches and architectures has already become common practice, but this could be improved by creating common resources that require limited effort but may have a high impact on the field, by both enabling new science and ensuring prior work can be reproduced. This can span a range of possibilities, starting from well-maintained collections of recent work and related code, to the collaboration on specific modelling or infrastructural building blocks; from the identification of common themes, such as multi-task learning, simulated-to-real domain adaptation, modelling distributions from data, uncertainty and drift detection, all the way to finding common ground towards multi-messenger astronomy.
    
    Speaker: Luca Antiga
    
    Discussion document
  - 34
    
    Discussion 4: Workflow management for machine learning
    
    With Machine Learning techniques being applied at larger and larger scales in high-energy and astroparticle physics, workflow management is gaining attention as one area where careful design and implementation are crucial to the success of a solution. "Workflow management" is itself a broad term with different meanings. Presentations on specific approaches at IWAPP have explored various aspects, ranging from more hardware-oriented studies to mathematical techniques and project lifecycle management; at the same time, it has been shown that Machine Learning, while originally well grounded in offline data processing, is moving closer and closer to instruments, possibly providing (quasi) real-time decision and data acquisition steering capabilities with online reconstructions using trigger objects rather than raw data. At the other end of the application spectrum, Machine Learning is being used to provide classification criteria for libraries of datasets. The discussion is expected to focus on tools and best practices to ensure that the solutions are flexible and future-proof.
    
    Speaker: Cristiano Bozza
    
    Discussion document
  - 35
    
    Discussion 5: Real time analysis and triggering
    
    The amount of data to be processed by scientific experiments is ever-increasing, and the time-to-insight from recording to benefitting from the data is becoming a more relevant benchmark than for previous instruments.
    In this discussion we will try to understand if there are common use cases and what solutions are implemented by the experiments of the participants to meet this challenge, with a focus on the data analysis that can be done as close as possible to the instrument.
    
    Speaker: Caterina Doglioni (Lund University)
    
    Discussion document
  - 15:45
    
    Coffee break (Gather): https://gather.town/app/olLhUruftJ617h4z/IWAPP
  - 36
    
    Discussion 1: Data reduction and data formats
    
    Speaker: Kai Polsterer (HITS gGmbH)
    
    Discussion document
    
    Main contribution
  - 37
    
    Discussion 2: Use of alternative hardware
    
    Speaker: Maurizio Pierini (CERN)
    
    Discussion document
    
    Main contribution
  - 38
    
    Discussion 3: Machine & deep learning techniques
    
    Speaker: Luca Antiga
    
    Discussion document
    
    Main contribution
  - 39
    
    Discussion 4: Workflow management for machine learning
    
    Speaker: Cristiano Bozza
    
    Discussion document
    
    Main contribution
  - 40
    
    Discussion 5: Real time analysis and triggering
    
    Speaker: Caterina Doglioni (Lund University)
    
    Discussion document
    
    Main contribution
Fri 12 March
- Common approaches: Summary & Outlook
  
  Discussions and talks on combined workflows
  
  Session chair: Elena Cuoco
  
  Recording @Youtube
  - 41
    
    AI and ML landscape and ideas for platforms and services by EGI
    
    Speaker: Ville Tenhunen (EGI)
    
    EGI_AI_and_ML_2021-03-12.pdf
  - 42
    Reports from the discussion groups
    
    a) Topic 1: Data reduction and data formats
    
    Speaker: Kai Polsterer (HITS gGmbH)
    
    Main contribution
    
    b) Topic 2: Use of alternative hardware
    
    Speaker: Maurizio Pierini (CERN)
    
    Main contribution
    
    c) Topic 3: Machine and deep learning techniques
    
    Speaker: Luca Antiga
    
    Main contribution
    
    d) Topic 4: Workflow management for machine learning
    
    Speaker: Cristiano Bozza
    
    Main contribution
    
    e) Topic 5: Real time analysis and triggering
    
    Speaker: Caterina Doglioni (Lund University)
    
    Main contribution
  - 43
    
    Panel discussion
    
    Whiteboard for idea collection
  - 44
    
    Summary
    
    Speaker: Elena Cuoco
