Avishkar Saha

I'm a Research Engineer at MireloAI, where I design and train foundation models for video, music, and sound. I completed my PhD in Computer Vision at CVSSP, University of Surrey, supervised by Richard Bowden, Chris Russell and Oscar Mendez. I also interned at Amazon AWS' Causal Representation Learning Lab in Tübingen.

My work at MireloAI spans the complete machine learning pipeline, from large-scale data collection and curation through model architecture design to distributed training of multimodal foundation models. I'm particularly interested in building models that can understand and generate content across multiple modalities.

Google Scholar / GitHub / LinkedIn / CV

news

Sep 20, 2025	Mirelo SFX now powers audio generation for all video generation models on fal.ai
Aug 22, 2025	Our video-to-sound model, Mirelo SFX, is now available on fal.ai! Check out fal.ai's announcement and our blog post for more info.
Jan 15, 2025	Published a tutorial on writing custom CUDA kernels for 2x faster LayerNorm
Feb 16, 2024	Completed my PhD defense at the University of Surrey
Nov 1, 2023	Started as a Research Engineer at MireloAI
Jul 14, 2023	Our paper "Learning Adaptive Neighborhoods for Graph Neural Networks" got into ICCV 2023
Jun 1, 2023	Completed a 6-month internship at Amazon AWS' Causal Representation Learning Lab in Tübingen
Jun 1, 2022	Our paper "'The Pedestrian next to the Lamppost' Adaptive Object Graphs for Better Instantaneous Mapping" got into CVPR 2022
May 23, 2022	Our paper "Translating Images into Maps" won the Outstanding Paper Award at ICRA 2022
Mar 15, 2021	Our paper "Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation" got into ICRA 2021

invited talks & lectures

Sep 12, 2022	Tutorial at 3DV 2022, Prague: BEV mapping and addressing its shortcomings [Slides]
Jul 15, 2022	Invited talk at Amazon, Tübingen: Sparse representations for scene understanding [Slides]
Jan 20, 2022	Invited talk at Wayve, London: Image-to-bird's-eye-view mapping [Slides]

selected publications

Learning Adaptive Neighborhoods for Graph Neural Networks

Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping

Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Translating Images into Maps

Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden

In 2022 IEEE International Conference on Robotics and Automation (ICRA), 2022
Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation

Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden

In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021