Yanwei Li

Assistant Professor @ SAI, SJTU


About


I'm Yanwei Li (李彦玮), currently a Tenure-track Assistant Professor at the School of Artificial Intelligence (SAI), Shanghai Jiao Tong University. Before that, I was a Senior Research Scientist working on multi-modal models at ByteDance Seed, USA. I received my Ph.D. degree from the Chinese University of Hong Kong (CUHK) in 2024, supervised by Prof. Jiaya Jia. After graduating, I received the WAIC Yunfan Award (Rising Star) for my research contributions.

My recent research focuses on Multi-modal Models and Generative AI for downstream tasks, including image/video generation, world models, and embodied AI for robotics. Highlights include Seed 2.0, Seed 1.8, Seed 1.5-VL, LLaVA-OneVision, Mini-Gemini, LLaMA-VID, and LISA. See Google Scholar for a full list.

I'm always open to research collaborations. Please feel free to reach out if you are interested in working together.

To Prospective Students

We are actively looking for self-motivated students and researchers to join our group. Our research focuses on next-generation Multi-modal Models, World Modeling, and Embodied AI for Robotics. We collaborate extensively with leading research institutions and industry partners, and provide students with abundant computing resources and hands-on opportunities. The following positions are available:

Ph.D. / Master's: Openings are available each year through the SJTU graduate admissions process. Please email me your CV and a brief research statement at least 3–6 months before the application deadline. Prior collaboration on a project is strongly encouraged; it helps us understand each other's working style and significantly strengthens your application.

Research Interns: We welcome interns with strong programming skills and a genuine interest in research. Internships can be conducted remotely or on-site. Please email your CV along with a short description of your background and interests.

To apply, please email me directly with the subject line [Position] Your Name - Institution - Grade.

News

  • 2026/02 Released Seed 2.0, a worldwide top-ranked VLM.
  • 2025/12 Released Seed 1.8, a worldwide top-ranked VLM.
  • 2025/05 Serving as Area Chair for NeurIPS 2025.
  • 2025/05 Released Seed 1.5-VL, a top VLM for image / multi-image / video.
  • 2025/01 Received the WAIC Yunfan Award (Rising Star) 2025.
  • 2024/08 Released LLaVA-OneVision, a top VLM for image / multi-image / video.

Experience

2026.05 – Now
Shanghai Jiao Tong University
Tenure-track Assistant Professor
2024.09 – 2026.05
ByteDance Seed, USA
Senior Research Scientist
Multimodal Foundation Models
2022.06 – 2023.04
NVIDIA Research
Research Intern
Multimodal 3D Perception
2019.01 – 2022.05
MEGVII Research
Research Intern
2D & 3D Perception

Service

Area Chair
Neural Information Processing Systems (NeurIPS), 2025
Conference Reviewer
International Conference on Learning Representations (ICLR)
International Conference on Machine Learning (ICML)
Neural Information Processing Systems (NeurIPS)
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)
AAAI Conference on Artificial Intelligence (AAAI)
Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Image Processing (TIP)
Pattern Recognition (PR)

Activity

Talks
  • "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" — ByteDance / Tencent, 2024. [slides]
  • "LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models" — MIT / Huawei / Tencent, 2023. [slides]
  • "Representation for Multi-modality 3D Detection with Transformer" — ZhiDongXi, 2022. [slides]
  • "Towards Fully Convolutional Panoptic Segmentation" — ByteDance AI & BAAI, 2021. [slides]
  • "Dynamic Network and Semantic Segmentation" — Paper Weekly, 2020. [slides]
  • "FPN-based Network for Panoptic Segmentation" — ECCV COCO Workshop, 2018. [slides]
Teaching
  • CSCI1580: Visual Programming, TA, Fall 2022
  • ENGG5104: Image Processing and Computer Vision, TA, Spring 2022
  • CSCI1580: Visual Programming, TA, Fall 2021
  • CSCI2100B: Data Structures, TA, Spring 2021