AISSAI - Heterogeneous Data and Large Representation Models in Science

Name: AISSAI - Heterogeneous Data and Large Representation Models in Science
Start: 2024-09-30T12:30:00+02:00
End: 2024-10-03T14:25:00+02:00
Location: Toulouse, France

Contact

aissai_heterogeneous-data_LOC@l2it.in2p3.fr

The third workshop of the AISSAI semester on "Artificial Intelligence for the two infinites" will be held in Toulouse, France, from September 30th to October 3rd, 2024. This workshop will be devoted to Heterogeneous Data and Large Representation Models in Science.

The workshop has come to an end. Thank you for being with us and for making this event lively, collaborative and friendly!

Thank you all for your participation!

More is to come: pictures of the event, recordings of the talks, and your input to the paper we intend to write. We will keep in touch with you!

Scientific program

In recent years, we have witnessed remarkable transformations in the AI/ML landscape. Particularly in computer vision and natural language processing, Large Representation Models (LRMs) trained on extensive datasets, often referred to as foundation models, have emerged. These LRMs can encode information at a high level of abstraction, which makes it possible to train models on multimodal data such as text, images, sound and video. This improvement in how models represent objects significantly enhances their ability to understand and make sense of the world.

In science, we expect a similar revolution, triggered by the integration of heterogeneous, multimodal scientific data into LRMs. Scientific data are often heterogeneous and multimodal by nature. They may come from the various sensors of embedded systems (robotics, aerospace), from different detector subsystems or instruments in fundamental physics, or, more generally, from different signal sources in a scientific experiment. Such models can combine representations learned by neural networks with symbolic representations that integrate a priori knowledge of the scientific domain.

More specifically, the expected revolution in science will likely start with the development and deployment of algorithms that solve specific problems using heterogeneous data as inputs. One example is controlling a robot using data from sensors of different types (cameras, infrared proximity scanners, …). Another is reconstructing elementary particles in a detector that combines diverse technologies (silicon pixels and strips, calorimeters, …). At the same time, it should become possible to use machine learning for problems where the analysis of a single dataset requires multiple models to collaborate, each model being an “expert” in one aspect of the data. An example from cosmology is the analysis of data from the LISA gravitational wave detector. LISA will observe the superposition of signals from a large, a priori unknown number of sources of different types (galactic binaries, mergers of massive black holes, and many others). Individual models could be trained separately for each source type and then learn to collaborate on a global analysis.
In all of these initial examples, the size of the models and the resources required for training should remain modest (nowhere near, say, ChatGPT) and be readily accessible to researchers in academia.

The aim of this workshop is to bring together scientists from different fields (including computer science, cosmology, human sciences, mathematics, physics, robotics and statistics) and with different profiles (experimentalists, theorists, developers). Together they will discuss these topics at the forefront of AI/ML research and foster collaboration and innovation in this rapidly evolving field.

The program is planned to be a mix of talks by high-profile invited speakers and contributed talks.

This workshop will delve into a range of topics, which include but are not limited to:

Constructing machine learning models capable of learning from diverse data types.
Managing multimodal data from varied sources, or heterogeneous data from scientific instruments that combine multiple detector technologies, for ML applications.
Investigating contrastive embeddings tailored for heterogeneous and multi-modal scientific data alongside shared embedding representations.
Exploring the integration of neuro-symbolic AI and multi-level representations.
Mathematical modeling of combined representations.
Exploring explainability and interpretability of Large Representation Models in the scientific context.
Embracing frugality and size management in Large Representation Models.
Possibly on a longer timescale, exploring numerical encodings for large language representations in scientific contexts.

Practicalities

The workshop will take place at the venue Le Village, in downtown Toulouse.

Participation will be limited to 80 on-site participants.

Registration is free (but mandatory) and includes lunches and social events. Registrations are moderated and do not constitute acceptance of participation. The organizers will make the final decision on participation, and every registrant will be notified.

Talks are expected to be given in person.

Important dates

These dates may be subject to change.

	Registration opening	July 1st
	Abstract submission opening	July 3rd
	Abstract submission deadline	September 9th (extended from August 15th)
		Late abstract submissions will be considered until September 15th, provided you contact us directly before September 9th
	Abstract acceptance deadline	August 31st
	Program release	September 17th (postponed from September 7th)
	Registration closing	September 15th
		Late registration requests will be considered; please contact us directly as early as possible
	Workshop start	September 30th

Organisers and Partners