Yanwei Li

About

I'm Yanwei Li (李彦玮), currently a Tenure-track Assistant Professor at the School of Artificial Intelligence (SAI), Shanghai Jiao Tong University. Before that, I was a Senior Research Scientist working on multi-modal models at ByteDance Seed, USA. I received my Ph.D. degree from the Chinese University of Hong Kong (CUHK) in 2024, supervised by Prof. Jiaya Jia. After graduating, I received the WAIC Yunfan Award (Rising Star) for my research contributions.

My recent research focuses on Multi-modal Models and Generative AI for downstream tasks, including image/video generation, world models, and embodied AI for robotics. Some highlights include Seed 2.0, Seed 1.8, Seed 1.5-VL, LLaVA-OneVision, Mini-Gemini, LLaMA-VID, and LISA. See Google Scholar for a full list.

I'm always open to research collaborations. Please feel free to reach out if you are interested in working together.

To Prospective Students

We are actively looking for self-motivated students and researchers to join our group. Our research focuses on next-generation Multi-modal Models, World Modeling, and Embodied AI for Robotics. We collaborate extensively with leading research institutions and industry partners, which gives students access to abundant computing resources and hands-on opportunities. The following positions are available:

Ph.D. / Master: Openings are available through the SJTU graduate admissions process each year. Please email me your CV and a brief research statement at least 3–6 months before the application deadline. Prior collaboration on a project is strongly encouraged: it helps both of us understand each other's working style and significantly strengthens your application.

Research Interns: We welcome interns with strong programming skills and a genuine interest in research. Internships can be conducted remotely or on-site. Please send your CV along with a short description of your background and interests.

To apply, please email me directly with the subject line [Position] Your Name – Institution - Grade.

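I'm Yanwei Li, currently a Tenure-track Assistant Professor at the School of Artificial Intelligence (SAI), Shanghai Jiao Tong University. Previously, I was a Senior Research Scientist at ByteDance Seed (USA), working on multi-modal foundation models; the results have been widely used in the Doubao family of multi-modal models. I received my Ph.D. from the Chinese University of Hong Kong (CUHK) in 2024, advised by Prof. Jiaya Jia. For the research contributions made during my Ph.D., I received the Yunfan Award at the 2025 World Artificial Intelligence Conference (WAIC).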

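My research focuses on multi-modal foundation models and generative AI applied to downstream tasks, covering image/video generation, world models, and embodied AI for robotics. Recent representative work includes Seed 2.0, Seed 1.8, Seed 1.5-VL, LLaVA-OneVision, Mini-Gemini, LLaMA-VID, and LISA. See Google Scholar for the complete list of publications. If you are interested in our work, feel free to get in touch at any time; I look forward to collaborating with interested researchers.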

To Students Interested in Joining

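We recruit self-motivated students and researchers to join the group on an ongoing basis. Research directions cover next-generation multi-modal foundation models, world modeling, and embodied AI for robotics. We collaborate extensively with well-known research institutions and industry partners in China and abroad, offering students abundant computing resources and hands-on opportunities. The following positions are currently open: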

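Ph.D. / Master: Admission goes through the Shanghai Jiao Tong University graduate admissions process, with openings available every year. Please email your CV, together with a brief summary of your research interests and achievements, at least 3–6 months before the application deadline. Starting a collaborative project in advance is strongly recommended: it helps us understand each other's working style and can significantly strengthen your application.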

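Research Interns: We welcome interns with solid programming skills and a genuine passion for research. Internships can be conducted online or on-site. Please send your CV along with a brief introduction to your background and research interests.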

If you are interested, please email me directly, using the subject line format: [Position] Your Name – Institution - Grade.

News

2026/02 Released Seed 2.0, a worldwide top-ranked VLM.
2025/12 Released Seed 1.8, a worldwide top-ranked VLM.
2025/05 Serving as Area Chair for NeurIPS 2025.
2025/05 Released Seed 1.5-VL, a top VLM for image / multi-image / video.
2025/01 Awarded WAIC Yunfan Award (Rising Star) 2025.
2024/08 Released LLaVA-OneVision, a top VLM for image / multi-image / video.

Experience

2026.05 – Now

Shanghai Jiao Tong University

Tenure-track Assistant Professor

2024.09 – 2026.05

ByteDance Seed, USA

Senior Research Scientist

Multimodal Foundation Models

2022.06 – 2023.04

NVIDIA Research

Research Intern

Multimodal 3D Perception

2019.01 – 2022.05

MEGVII Research

Research Intern

2D & 3D Perception

Service

Area Chair

Neural Information Processing Systems (NeurIPS), 2025

Conference Reviewer

International Conference on Learning Representations (ICLR)
International Conference on Machine Learning (ICML)
Neural Information Processing Systems (NeurIPS)
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)
AAAI Conference on Artificial Intelligence (AAAI)

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Image Processing (TIP)
Pattern Recognition (PR)

Activity

Talks

"Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" — ByteDance / Tencent, 2024. [slides]
"LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models" — MIT / Huawei / Tencent, 2023. [slides]
"Representation for Multi-modality 3D Detection with Transformer" — ZhiDongXi, 2022. [slides]
"Towards Fully Convolutional Panoptic Segmentation" — ByteDance AI & BAAI, 2021. [slides]
"Dynamic Network and Semantic Segmentation" — Paper Weekly, 2020. [slides]
"FPN-based Network for Panoptic Segmentation" — ECCV COCO Workshop, 2018. [slides]

Teaching

CSCI1580: Visual Programming, TA, Fall 2022
ENGG5104: Image Processing and Computer Vision, TA, Spring 2022
CSCI1580: Visual Programming, TA, Fall 2021
CSCI2100B: Data Structures, TA, Spring 2021