My Homepage

Research for a better world.

I'm Yanwei Li (李彦玮), currently a Research Scientist working on Foundation Models for Vision & Language at ByteDance Seed, San Jose, USA. Before that, I obtained my Ph.D. degree from The Chinese University of Hong Kong (CUHK), supervised by Prof. Jiaya Jia.

Previously, I spent a wonderful time at NVIDIA, MEGVII, and other top research groups. During these periods, I was fortunate to collaborate with several top researchers, including Prof. Anima Anandkumar (Caltech), Prof. Sanja Fidler (UofT), and Dr. Jian Sun.

My research interests mainly focus on Multi-modality Foundation Models and Generative AI. My recent work includes Seed 1.5-VL, LLaVA-OneVision, Mini-Gemini, LLaMA-VID, LISA, and Video-MME. For more work, please refer to Publication and my Google Scholar.


Experience

ByteDance Seed
Fields: Multimodal Foundation Model.
Research Scientist, 2024.09 - Present

NVIDIA Research
Fields: Multimodal Perception.
Research Intern, 2022.06 - 2023.04

MEGVII
Fields: 2D & 3D Perception.
Research Intern, 2019.01 - 2022.05


Service

Area Chair:
Neural Information Processing Systems (NeurIPS), 2025.

Conference Reviewer:
International Conference on Learning Representations (ICLR).
International Conference on Machine Learning (ICML).
Neural Information Processing Systems (NeurIPS).
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE International Conference on Computer Vision (ICCV).
European Conference on Computer Vision (ECCV).
AAAI Conference on Artificial Intelligence (AAAI).

Journal Reviewer:
IEEE Transactions on Pattern Analysis and Machine Intelligence.
International Journal of Computer Vision.
IEEE Transactions on Image Processing.
Pattern Recognition.


Activity

Academic Talk:
"LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models", MIT/Huawei/Tencent, 2023. [slides]
"Representation for Multi-modality 3D Detection with Transformer", ZhiDongXi, 2022. [slides]
"Towards Fully Convolutional Panoptic Segmentation", ByteDance AI & BAAI, 2021. [slides]
"Dynamic Network and Semantic Segmentation", Paper Weekly, 2020. [slides]
"FPN-based Network for Panoptic Segmentation", ECCV COCO Workshop, 2018. [slides]

Teaching Assistant:
CSCI1580: Visual Programming, Fall, 2022.
ENGG5104: Image Processing and Computer Vision, Spring, 2022.
CSCI1580: Visual Programming, Fall, 2021.
CSCI2100B: Data Structures, Spring, 2021.


Award

Microsoft Fellowship Nomination, 2022
Postgraduate Scholarship, 2020-2024
National Scholarship, 2019
National Scholarship, 2016