I'm Yanwei Li (李彦玮), currently a Research Scientist working on Foundation Models for Vision & Language at ByteDance Seed in San Jose, USA.
Before that, I obtained my Ph.D. degree from The Chinese University of Hong Kong (CUHK), supervised by Prof. Jiaya Jia.
Previously, I spent a wonderful time at NVIDIA, MEGVII, and other top research groups. During that time, I was fortunate to collaborate with several leading researchers, including Prof. Anima Anandkumar (Caltech), Prof. Sanja Fidler (UofT), and Dr. Jian Sun.
My research interests mainly focus on Multimodal Foundation Models and Generative AI. My recent work includes
Seed 1.5-VL,
LLaVA-OneVision,
Mini-Gemini,
LLaMA-VID,
LISA, and
Video-MME.
For more work, please refer to
Publications and my
Google Scholar.
ByteDance Seed
Fields: Multimodal Foundation Models.
Research Scientist, 2024.09 - Present
NVIDIA Research
Fields: Multimodal Perception.
Research Intern, 2022.06 - 2023.04
MEGVII
Fields: 2D & 3D Perception.
Research Intern, 2019.01 - 2022.05
Area Chair:
Neural Information Processing Systems (NeurIPS), 2025.
Academic Talk:
"LLaMA-VID:An Image is Worth 2 Tokens in Large Language Models", MIT/Huawei/Tencent, 2023. [slides]
"Representation for Multi-modality 3D Detection with Transformer", ZhiDongXi, 2022. [slides]
"Towards Fully Convolutional Panoptic Segmentation", ByteDance AI & BAAI, 2021. [slides]
"Dynamic Network and Semantic Segmentation", Paper Weekly, 2020. [slides]
"FPN-based Network for Panoptic Segmentation", ECCV COCO Workshop, 2018. [slides]
Microsoft Fellowship Nomination, 2022
Postgraduate Scholarship, 2020-2024
National Scholarship, 2019
National Scholarship, 2016