Avishkar Saha
Research Engineer @ MireloAI, developing foundation models for video, music & sound.

I'm a Research Engineer at MireloAI, where I design and train foundation models for video, music, and sound. I completed my PhD in Computer Vision at CVSSP, University of Surrey, supervised by Richard Bowden, Chris Russell and Oscar Mendez. I also interned at Amazon AWS' Causal Representation Learning Lab in Tübingen.
My work at MireloAI spans the complete machine learning pipeline, from large-scale data collection and curation through model architecture design to distributed training of multimodal foundation models. I'm particularly interested in building models that can understand and generate content across multiple modalities.
My interests include, but are not limited to, Geometric Deep Learning (e.g. graph neural networks), Data-Efficient Machine Learning, and CUDA kernel development.
news
Date | News |
---|---|
Aug 22, 2025 | Our video-to-sound model, Mirelo SFX, is now available on fal.ai! Check out fal.ai's announcement and our blog post for more info. |
Jan 15, 2025 | Published a tutorial on writing custom CUDA kernels for a 2x faster LayerNorm |
Feb 16, 2024 | Completed my PhD defense at the University of Surrey |
Nov 1, 2023 | Started as a Research Engineer at MireloAI |
Jul 14, 2023 | Our paper "Learning Adaptive Neighborhoods for Graph Neural Networks" got into ICCV 2023 |
Jun 1, 2023 | Completed a 6-month internship at Amazon AWS' Causal Representation Learning Lab in Tübingen |
Sep 12, 2022 | Tutorial at 3DV 2022, Prague: BEV mapping and addressing its shortcomings [Slides] |
Jul 15, 2022 | Invited talk at Amazon, Tübingen: Sparse representations for scene understanding [Slides] |
Jun 1, 2022 | Our paper "'The Pedestrian next to the Lamppost' Adaptive Object Graphs for Better Instantaneous Mapping" got into CVPR 2022 |
May 23, 2022 | Our paper "Translating Images into Maps" won the Outstanding Paper Award at ICRA 2022 |
Jan 20, 2022 | Invited talk at Wayve, London: Image-to-bird's-eye-view mapping [Slides] |
Mar 15, 2021 | Our paper "Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation" got into ICRA 2021 |
selected publications
- Learning Adaptive Neighborhoods for Graph Neural Networks. In 2023 International Conference on Computer Vision (ICCV), 2023
- "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
- Translating Images into Maps. In 2022 IEEE International Conference on Robotics and Automation (ICRA), 2022
- Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation. In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021