AISSAI - Heterogeneous Data and Large Representation Models in Science

Europe/Paris
Le Village, Auditorium (Toulouse, France)

31 Allées Jules Guesde, 31000 TOULOUSE
Description

The third workshop of the AISSAI semester on "Artificial Intelligence for the two infinites" will be held in Toulouse, France from September 30th to October 3rd, 2024. This workshop will be devoted to Heterogeneous Data and Large Representation Models in Science. 


The workshop has come to an end. Thanks for being with us and making this event lively, collaborative and friendly!

Thank you all for your participation!

More is to come: pictures of the event, recordings of the talks, and your input to the paper we intend to write. We will keep in touch with you! 


Scientific program

In recent years, we have witnessed remarkable transformations in the AI/ML landscape. Particularly in computer vision and natural language processing, there is a notable emergence of Large Representation Models (LRMs) trained on extensive datasets, often referred to as foundation models. These LRMs can encode information at a high level of abstraction, enabling the training of models on multimodal data such as text, image, sound, video, and more. This improvement in how models represent objects significantly improves their ability to understand and make sense of the world.

In the realm of science, we expect a similar revolution, triggered by the integration of heterogeneous multimodal scientific data from various sensor systems or sources into LRMs. Scientific data are often heterogeneous and multimodal in nature, originating from various sensors in embedded systems (robotics, aerospace), from different detector subsystems or different instruments in fundamental physics, or from different signal sources in a scientific experiment in general. The models can combine representations from neural networks with symbolic representations integrating a priori knowledge of the scientific domain.

To be more specific: the expected revolution in science will likely start with the development and deployment of algorithms that solve specific problems using heterogeneous data as inputs, for example the control of a robot using data from sensors of different types (cameras, infrared proximity scanners, …), or the reconstruction of elementary particles in a detector that combines diverse technologies (silicon pixels and strips, calorimeters, …). At the same time, it should become possible to use machine learning to tackle problems where the analysis of one dataset requires multiple models to collaborate, with each model being an “expert” in one aspect of the data. An example from cosmology would be the analysis of data from the LISA gravitational wave detector. LISA will observe the superposition of signals from a large, a priori unknown number of sources of different types (galactic binaries, mergers of massive black holes, and many others). Individual models might be trained separately for each type, and then learn to collaborate on a global analysis.
In all of these initial examples, the size of the models and the resources required for training should remain modest (nowhere near, say, ChatGPT) and be readily accessible to researchers in academia.

The aim of this workshop is to bring together scientists from different fields (to list a few: computer science, cosmology, human sciences, mathematics, physics, robotics, statistics) and with different profiles (experimentalists, theorists, developers) to discuss these topics at the forefront of AI/ML research, fostering collaboration and innovation in this rapidly evolving field.

The program is planned to be a mix of high-profile guest speaker presentations and contributed talks.

This workshop will delve into a range of topics, which include but are not limited to:

  • Constructing machine learning models capable of learning from diverse data types.
  • Managing multimodal data from varied sources, or heterogeneous data from scientific instruments that combine multiple detector technologies, for ML applications.
  • Investigating contrastive embeddings tailored for heterogeneous and multimodal scientific data alongside shared embedding representations.
  • Exploring the integration of neuro-symbolic AI and multi-level representations.
  • Mathematical modeling of combined representation.
  • Exploring explainability and interpretability of Large Representation Models in the scientific context.
  • Embracing frugality and size management in Large Representation Models.
  • Possibly on a longer timescale, exploring numerical encodings for large language representations in scientific contexts. 

Practicalities

The workshop will take place at the venue Le Village, in downtown Toulouse.

Participation will be limited to 80 on-site participants.

Registration is free (but mandatory) and includes lunches and social events. Registrations are moderated and do not constitute acceptance of participation. The organizers will make the final decision on acceptance, and everyone who has registered will be notified.

Talks are expected to be given in person.


Important dates

These dates may be subject to change.

  Registration opening July 1st
  Abstract submission opening July 3rd
  Abstract submission deadline August 15th -> September 9th
    Late abstract submissions will be considered until September 15th, provided you contact us directly before September 9th
  Abstract acceptance deadline August 31st
  Program release September 7th -> September 17th
  Registration closing September 15th
    Late registration requests will be considered; please contact us directly as early as possible
  Workshop start September 30th

Organisers and Partners

 
 

 

Participants
  • Abdelazyz RKHISS
  • Adnan Ghribi
  • Alexandre Boucaud
  • Alexandro Martone
  • Alvin Chua
  • Anthony Larroque
  • Antonin Vacheret
  • Areej Fatima
  • Aurélien Theret
  • Catherine Biscarat
  • Clément Brochet
  • Corentin Allaire
  • Corentin Lapeyre
  • Corentin SEZNEC
  • Daniel Murnane
  • David Cornu
  • David Rousseau
  • david roussel
  • Duong Hung PHAM
  • Fernando Gonzalez
  • François Lanusse
  • Grégory Sainton
  • Guillerme Bernoux
  • Hannah T. Rüdisser
  • Heberth Torres
  • Inar Timiryasov
  • Iris Dumeur
  • Jackson Barr
  • Jan Stark
  • Jonathan Gair
  • Judita Mamuzic
  • Kevin De Sousa
  • Laure Raynaud
  • Luciano Drozda
  • Léopold Maytié
  • Marc Spigai
  • Mathieu Dubois
  • Maxime Pigou
  • Michel BESSERVE
  • Minh-Tuan Pham
  • Natalia Korsakova
  • Nathan Sobetsky
  • Nicola Tamanini
  • Ollie Burke
  • Paul SAVES
  • Rafal Maselek
  • Riccardo Buscicchio
  • Richard Faucheron
  • Ritish Thakur
  • Rufin VanRullen
  • Sara Akodad
  • Scott DeGraw
  • Silvia Valero Valbuena
  • Sylvain Caillou
  • Sylvain Marsat
  • Thibault XAVIER
  • Thomas Oberlin
  • Thomas Schiex
  • Vangelis Kourlitis
  • Vasco Gennari
  • Victor Sanchez
  • Yoël Zérah
  • and 25 more participants