Speaker
Mr Alvaro Moran (Hugging Face)
Description
In this talk, we'll explore cutting-edge techniques to optimize both training and inference in PyTorch, enabling faster, more efficient model execution. We'll dive into the power of PyTorch's torch.compile, which accelerates workflows by fusing operations and generating optimized code, reducing runtime overhead. Additionally, we'll cover custom kernels written with tools like Triton, Pallas, and CUDA, which allow fine-grained control over GPU and TPU execution for performance-critical tasks. Finally, we'll give an overview of methods such as mixed precision, memory optimization strategies, and distributed training, all aimed at achieving optimal performance for large-scale machine learning models.
Contribution length
Short