30 September 2024 to 3 October 2024
Toulouse, France
Europe/Paris timezone

The third workshop of the AISSAI semester on "Artificial Intelligence for the two infinites" will be held in Toulouse, France from September 30th to October 3rd, 2024. This workshop will be devoted to Heterogeneous Data and Large Representation Models in Science. 


Scientific programme

In recent years, the AI/ML landscape has undergone remarkable transformations. Particularly in computer vision and natural language processing, we have seen the emergence of Large Representation Models (LRMs) trained on extensive datasets, often referred to as foundation models. LRMs can encode information at a high level of abstraction, which makes it possible to train models on multimodal data such as text, images, sound, and video. This improvement in how models represent objects significantly enhances their ability to understand and make sense of the world.

In science, we expect a similar revolution, triggered by the integration of heterogeneous, multimodal scientific data into LRMs. Such data typically originate from various sensors in embedded systems (robotics, aerospace), from different detector subsystems or instruments in fundamental physics, or, more generally, from different signal sources in a scientific experiment. These models can combine representations learned by neural networks with symbolic representations that encode a priori knowledge of the scientific domain.

More specifically, the expected revolution in science will likely begin with the development and deployment of algorithms that solve specific problems using heterogeneous data as inputs: for example, controlling a robot with data from sensors of different types (cameras, infrared proximity scanners, …), or reconstructing elementary particles in a detector that combines diverse technologies (silicon pixels and strips, calorimeters, …). At the same time, it should become possible to use machine learning for problems in which the analysis of a single dataset requires multiple models to collaborate, each acting as an “expert” on one aspect of the data. An example from cosmology is the analysis of data from the LISA gravitational-wave detector. LISA will observe the superposition of signals from a large, a priori unknown number of sources of different types (galactic binaries, mergers of massive black holes, and many others). Individual models could be trained separately for each source type and then learn to collaborate on a global analysis.
In all of these initial examples, the size of the models and the computing resources required to train them should remain modest (far from the scale of, say, ChatGPT) and readily accessible to academic researchers.

The aim of this workshop is to bring together scientists from different fields (including computer science, cosmology, human sciences, mathematics, physics, robotics, and statistics) and with different profiles (experimentalists, theorists, developers) to discuss these topics at the forefront of AI/ML research and to foster collaboration and innovation in this rapidly evolving field.

The program will combine presentations by high-profile invited speakers with contributed talks.

This workshop will delve into a range of topics, which include but are not limited to:

  • Constructing machine learning models capable of learning from diverse data types.
  • Managing multimodal data from varied sources, or heterogeneous data from scientific instruments that combine multiple detector technologies, for ML applications.
  • Investigating contrastive embeddings tailored to heterogeneous and multimodal scientific data, alongside shared embedding representations.
  • Exploring the integration of neuro-symbolic AI and multi-level representations.
  • Mathematical modeling of combined representations.
  • Exploring explainability and interpretability of Large Representation Models in the scientific context.
  • Embracing frugality and size management in Large Representation Models.
  • On a possibly longer timescale, exploring numerical encodings for large language representations in scientific contexts.

Practicalities

The workshop will take place at Le Village, in downtown Toulouse.

Participation will be limited to 80 on-site participants.

Registration is free but mandatory. Registrations are moderated, and registering does not by itself guarantee participation: the organizers will make the final decision on acceptance and will notify each registrant.

Talks are expected to be given in person.


Important dates

These dates may be subject to change.

  Registration opening July 1st
  Abstract submission opening July 3rd
  Abstract submission deadline August 15th
  Abstract acceptance deadline August 31st
  Program release September 7th
  Registration closing September 15th
  Workshop start September 30th

Venue: Le Village, Auditorium, 31 Allées Jules Guesde, 31000 Toulouse
The call for abstracts is open: abstracts can be submitted for review. Applications for this event are currently open.