Seminars, thesis defenses

Etienne Russeil: Feature engineering and machine learning for 21st century astronomy

by Etienne Russeil

Europe/Paris
Amphi Recherche (LPCA)


Description

Astronomical transients are among the most energetic phenomena in the Universe. To unveil their secrets, increasingly powerful telescopes have been built to perform large-scale sky surveys. The upcoming Vera C. Rubin Observatory represents the state of the art of a new generation of such surveys. It is expected to detect around 10 million candidate transients each night, producing a light curve for each of them. Given this unprecedented volume of data, the use of machine learning methods to analyze them automatically is unavoidable. However, a model can only be as good as the data it learns from. Hence, one of the most crucial steps of this process is the extraction of features from light curves. Ideally, the extracted features should encode the object's behavior as completely as possible while remaining interpretable, so that they can be used by domain experts. The goal of this thesis is to enable meaningful feature extraction from high-dimensional light curves. This manuscript presents a series of methods built to enhance state-of-the-art feature extraction procedures, allowing for both physically motivated modelling and data-driven description. In this context, the Rainbow framework was developed to enable simultaneous multi-wavelength light curve fitting based on physical assumptions, resulting in a more informative parameter space. This method allows the expert to select the best-suited parametric models for specific science cases. To guide this choice, I propose Multi-view Symbolic Regression (MvSR), a data-driven method that automatically constructs a parametric representation from a set of examples, allowing the expert to build tailored analytical functions. The applicability of MvSR extends beyond astronomy: it has been successfully applied in a variety of scientific fields.
Finally, both frameworks are combined within an adaptive classification pipeline, which takes into account the particular characteristics of each light curve to choose the most appropriate parametric description. The pipeline demonstrates that the extracted features are highly informative, enabling the separation of similar transient classes even for poorly sampled light curves.

Overall, the proposed methods contribute to an interdisciplinary vision of modern astronomy, where domain experts and data scientists collaborate to construct efficient tools that benefit the community.

 
