Avishkar Saha

Research Engineer @ MireloAI, developing foundation models for video, music & sound.

I'm a Research Engineer at MireloAI, where I design and train foundation models for video, music, and sound. I completed my PhD in Computer Vision at CVSSP, University of Surrey, supervised by Richard Bowden, Chris Russell and Oscar Mendez. I also interned at Amazon AWS' Causal Representation Learning Lab in Tübingen.

My work at MireloAI spans the complete machine learning pipeline: large-scale data collection and curation, model architecture design, and distributed training of multimodal foundation models. I'm particularly interested in building models that can understand and generate content across multiple modalities.

My interests include, but are not limited to: Geometric Deep Learning (i.e. Graphs), Data Efficient Machine Learning, and CUDA kernel development.

Google Scholar / GitHub / LinkedIn / CV

news

Aug 22, 2025 Our video-to-sound model, Mirelo SFX, is now available on fal.ai! Check out fal.ai's announcement and our blog post for more info.
Jan 15, 2025 Published tutorial on writing custom CUDA kernels for 2x faster LayerNorm
Feb 16, 2024 Completed my PhD defense at the University of Surrey
Nov 1, 2023 Started Research Engineer position at MireloAI
Jul 14, 2023 Our paper "Learning Adaptive Neighborhoods for Graph Neural Networks" got into ICCV 2023
Jun 1, 2023 Completed 6-month internship at Amazon AWS' Causal Representation Learning Lab in Tübingen
Sep 12, 2022 Tutorial at 3DV 2022, Prague: BEV mapping and addressing its shortcomings [Slides]
Jul 15, 2022 Invited talk at Amazon, Tübingen: Sparse representations for scene understanding [Slides]
Jun 1, 2022 Our paper "'The Pedestrian next to the Lamppost' Adaptive Object Graphs for Better Instantaneous Mapping" got into CVPR 2022
May 23, 2022 Our paper "Translating Images into Maps" won the Outstanding Paper Award at ICRA 2022
Jan 20, 2022 Invited talk at Wayve, London: Image-to-birds eye view mapping [Slides]
Mar 15, 2021 Our paper "Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation" got into ICRA 2021

selected publications

  1. Learning Adaptive Neighborhoods for Graph Neural Networks
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2023 International Conference on Computer Vision (ICCV), 2023
  2. "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
  3. Translating Images into Maps
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2022 International Conference on Robotics and Automation (ICRA), 2022
  4. Enabling spatio-temporal aggregation in birds-eye-view vehicle estimation
    Avishkar Saha, Oscar Mendez, Chris Russell, and Richard Bowden
    In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021