30 September 2024 to 3 October 2024
Toulouse, France
Europe/Paris timezone

Semi-supervised multimodal representation learning through a global workspace

1 Oct 2024, 14:15
30m
Le Village, Auditorium (Toulouse, France)
31 Allées Jules Guesde, 31000 TOULOUSE
Oral presentation

Speaker

Léopold Maytié (Université Toulouse III - Paul Sabatier, Toulouse, France & Artificial and Natural Intelligence Toulouse Institute (ANITI))

Description

Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image generation). However, current approaches mainly rely on brute-force supervised training over large multimodal datasets. In contrast, humans (and other animals) can learn useful multimodal representations from only sparse experience with matched cross-modal data. Here we evaluate the capabilities of a neural network architecture inspired by the cognitive notion of a "Global Workspace": a shared representation for two (or more) input modalities. Each modality is processed by a specialized system (pretrained on unimodal data, and subsequently frozen). The corresponding latent representations are then encoded to and decoded from a single shared workspace. Importantly, this architecture is amenable to self-supervised training via cycle-consistency: encoding-decoding sequences should approximate the identity function. For various pairings of vision-language modalities and across two datasets of varying complexity, we show that such an architecture can be trained to align and translate between two modalities with very little need for matched data (from 4 to 7 times less than a fully supervised approach). The global workspace representation can be used advantageously for downstream classification and cross-modal retrieval tasks and for robust transfer learning. Ablation studies reveal that both the shared workspace and the self-supervised cycle-consistency training are critical to the system's performance.
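The cycle-consistency idea described above (encoding a unimodal latent into the workspace and decoding it back should approximate the identity) can be illustrated with a minimal NumPy sketch. The dimensions, the linear encoders/decoders, and the function names below are hypothetical stand-ins for the learned networks, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: each frozen unimodal system yields a latent
# vector; the shared global workspace (GW) is a smaller common space.
d_vision, d_text, d_gw = 8, 6, 4

# Linear maps stand in for the trainable encoders/decoders around the GW.
enc_v = rng.normal(size=(d_gw, d_vision))
dec_v = np.linalg.pinv(enc_v)  # decoder approximating the encoder's inverse
enc_t = rng.normal(size=(d_gw, d_text))
dec_t = np.linalg.pinv(enc_t)

def demi_cycle_loss(z, enc, dec):
    """Encode a unimodal latent into the GW, decode it back, and measure
    how far the round trip is from the identity (mean squared error)."""
    z_rec = dec @ (enc @ z)
    return float(np.mean((z - z_rec) ** 2))

def full_cycle_loss(z_v):
    """Vision -> GW -> text latent -> GW -> vision: the full translation
    cycle should also return close to the starting point."""
    z_back = dec_v @ enc_t @ (dec_t @ (enc_v @ z_v))
    return float(np.mean((z_v - z_back) ** 2))

z_v = rng.normal(size=d_vision)
print(demi_cycle_loss(z_v, enc_v, dec_v), full_cycle_loss(z_v))
```

In training, such cycle losses require no matched cross-modal pairs, which is what lets the architecture get by with far less supervised data; only the (much smaller) alignment objective needs paired examples.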

Contribution length: Middle

Primary authors

Benjamin Devillers (Centre de Recherche Cerveau et Cognition (CerCo), CNRS)
Léopold Maytié (Université Toulouse III - Paul Sabatier, Toulouse, France & Artificial and Natural Intelligence Toulouse Institute (ANITI))
Rufin VanRullen (Centre de Recherche Cerveau et Cognition (CerCo), Artificial and Natural Intelligence Toulouse Institute (ANITI))

Presentation materials