Zuhao Yang

I am a third-year Ph.D. candidate with the Visual Intelligence Lab at Nanyang Technological University (NTU), supervised by Prof. Shijian Lu. Prior to joining NTU, I obtained my B.S. degree in Computing Science from the University of Alberta. I am currently working on native multimodal foundation models at Kimi (Moonshot AI), advised by Dr. Haoning Wu and Xinyu Zhou. Previously, I worked closely with Dr. Lidong Bing at MiroMind and Dr. Song Bai at ByteDance. I also enjoy vibe building with other researchers at LMMs-Lab, a non-profit open-source organization led by Dr. Bo Li and Prof. Ziwei Liu. My research centers on the long-standing quest to build video-centric multimodal intelligence, spanning temporal grounding, agentic reasoning, long-horizon tool use, and self-evolving multi-agent systems. WeChat: 17310143309

Exciting News

2026.07 - Kimi K3 and PerceptionBench are released.
2026.06 - Kimi K2.7 Code is released. One paper is accepted by ECCV 2026.
2026.05 - ParaVT, PRISM, and WorldReasonBench are released. One paper is accepted by VLDB 2026.
2026.04 - Kimi K2.6 and Evolving Visual Generation are released.
2026.03 - MiroThinker-1.7 & H1 are released.
2026.02 - Four papers are accepted by CVPR 2026.
2025.10 - One paper is accepted by SIGGRAPH Asia 2025.
2025.08 - One paper is accepted by EMNLP 2025.
2025.06 - Two papers are accepted by ICCV 2025.
2025.05 - Two papers are accepted by ACL 2025.
2023.09 - One paper is accepted by NeurIPS 2023.

Selected Publications (Full List)

Preprint

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
Zuhao Yang, Kaichen Zhang, Sudong Wang, Keming Wu, Zhongyu Yang, Bo Li, Xiaojuan Qi, Shijian Lu, Xingxuan Li, Lidong Bing
arXiv 2026
paper / bibtex / code

Survey

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Keming Wu*†, Zuhao Yang*†, and 25 other authors
arXiv 2026
paper / bibtex / webpage
*Equal Contribution. †Project Organizer.

CVPR

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang*, Sudong Wang*, Kaichen Zhang*, Keming Wu, Sicong Leng, Yifan Zhang, Bo Li, Chengwei Qin, Shijian Lu, Xingxuan Li, Lidong Bing
CVPR 2026
paper / bibtex / code

CVPR

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Kaichen Zhang*, Keming Wu*, Zuhao Yang, Bo Li, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing
CVPR 2026
paper / bibtex / code

CVPR

SVAgent: Storyline-guided Long Video Understanding via Cross-modal Multi-agent Collaboration
Zhongyu Yang, Zuhao Yang, Shuo Zhan, Tan Yue, Wei Pang, Yingfang Yuan
CVPR 2026
paper / bibtex

CVPR

A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li*, Zuhao Yang*, Xiaoqin Zhang, Ling Shao, Shijian Lu
CVPR 2026
paper / bibtex / code

ICCV

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage

ICCV

Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage

ACL

QAEval: Mixture of Evaluators for Question‑Answering Task Evaluation
Tan Yue, Rui Mao, Xuzhao Shi, Shuo Zhan, Zuhao Yang, Dongyan Zhao
ACL 2025
paper / bibtex / code

NeurIPS

FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross‑Entropy
Zuhao Yang*, Yingfang Yuan*, Yang Xu*, Shuo Zhan, Huajun Bai, Kefan Chen
NeurIPS 2023
paper / bibtex / code

Academic Services

Conference Reviewer

CVPR 24/25/26, ECCV 24/26, ACM MM 24/25/26, NeurIPS 24/25/26, ICLR 25, AISTATS 25/26, ICML 25, ICCV 25, BMVC 26, ARR 26, AAAI 27

Journal Reviewer

TPAMI, IJCV, TMC, TMI, PR, TCSVT, JEI

PC Member

Teaching Assistant

AI6121 - Computer Vision, NTU, Fall 2025
AI6126 - Advanced Computer Vision, NTU, Spring 2026

Invited Talks

2026.05 - At Cantina, covering TimeExpert, LongVT, ParaVT, and ToDRE.
2026.05 - At SenseTime, covering Evolving Visual Generation.
2026.01 - At AAAI (slides), covering MiroMind-M1, First Try Matters, MATPO, OpenMMReasoner, and LongVT.
2025.12 - At BAAI (slides & recording), covering OpenMMReasoner and LongVT.
2025.11 - At Dr. Bosheng Ding’s reading group, covering LongVT.

Technical Blogs

Chinese Blogs

English Blogs

Patent & Awards

Method, Device, and Medium for Video Temporal Grounding with Mixture-of-Experts, US Patent, 2026
Method, Device, and Medium for Generating Transition Videos with Diffusion Model, US Patent, 2026
Automatic Question Answering Method and Device, Electronic Device, Storage Medium, CN Patent, 2022
Outstanding Graduate with Distinction, University of Alberta, 2021
Dean’s Honor Roll Award, University of Alberta, 2018 - 2020
International Student Scholarship, University of Alberta, 2017 - 2019

Zuhao Yang (杨祖豪)
