Avishkar Saha

Research Engineer @ MireloAI, developing foundation models for video, music & sound.

I'm a Research Engineer at MireloAI, where I design and train foundation models for video, music, and sound. I completed my PhD in Computer Vision at CVSSP, University of Surrey, supervised by Richard Bowden, Chris Russell and Oscar Mendez. I also interned at Amazon AWS' Causal Representation Learning Lab in Tübingen.

My work at MireloAI covers the complete machine learning pipeline, from large-scale data collection and curation through model architecture design to distributed training of multimodal foundation models. I'm particularly interested in building models that can understand and generate content across multiple modalities.

Google Scholar / GitHub / LinkedIn / CV

news

Sep 20, 2025 MireloSFX now powers audio generation for all video gen models on fal.ai
Aug 22, 2025 Our video-to-sound model, Mirelo SFX, is now available on fal.ai! Check out fal.ai's announcement and our blog post for more info.
Jan 15, 2025 Published tutorial on writing custom CUDA kernels for 2x faster LayerNorm
Feb 16, 2024 Completed my PhD defense at the University of Surrey
Nov 1, 2023 Started Research Engineer position at MireloAI
Jul 14, 2023 Our paper "Learning Adaptive Neighborhoods for Graph Neural Networks" got into ICCV 2023
Jun 1, 2023 Completed 6-month internship at Amazon AWS' Causal Representation Learning Lab in Tübingen
Jun 1, 2022 Our paper "'The Pedestrian next to the Lamppost' Adaptive Object Graphs for Better Instantaneous Mapping" got into CVPR 2022
May 23, 2022 Our paper "Translating Images into Maps" won the Outstanding Paper Award at ICRA 2022
Mar 15, 2021 Our paper "Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation" got into ICRA 2021

invited talks & lectures

Sep 12, 2022 Tutorial at 3DV 2022, Prague: BEV mapping and addressing its shortcomings [Slides]
Jul 15, 2022 Invited talk at Amazon, Tübingen: Sparse representations for scene understanding [Slides]
Jan 20, 2022 Invited talk at Wayve, London: Image-to-birds eye view mapping [Slides]

selected publications

  1. Learning Adaptive Neighborhoods for Graph Neural Networks
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2023 International Conference on Computer Vision (ICCV), 2023
  2. "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
  3. Translating Images into Maps
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2022 International Conference on Robotics and Automation (ICRA), 2022
  4. Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021