I am a third-year Ph.D. candidate in the Visual Intelligence Lab at Nanyang Technological University (NTU), supervised by Prof. Shijian Lu. Prior to joining NTU, I obtained my B.S. degree in Computing Science from the University of Alberta. I am currently working on native multimodal foundation models at Kimi (Moonshot AI), advised by Dr. Haoning Wu and Xinyu Zhou. Previously, I worked closely with Dr. Lidong Bing at MiroMind and Dr. Song Bai at ByteDance. I also enjoy vibe building with other researchers at LMMs-Lab, a non-profit open-source organization led by Dr. Bo Li and Prof. Ziwei Liu. My research centers on the long-standing quest to build video-centric multimodal intelligence, spanning temporal grounding, agentic reasoning, long-horizon tool use, and self-evolving multi-agent systems.
I am open to any interesting ideas, questions, and future opportunities. Feel free to contact me via WeChat: 17310143309.
Exciting News
- 2026.05 - ParaVT, PRISM, and WorldReasonBench were released. Several papers on Audio-Visual Captioning, Multimodal Evaluation, and Database Agents are coming soon.
- 2026.04 - Evolving Visual Generation was released.
- 2026.03 - MiroThinker-1.7 & H1 were released.
- 2026.02 - Four papers were accepted by CVPR 2026.
- 2025.10 - One paper was accepted by SIGGRAPH Asia 2025.
- 2025.08 - One paper was accepted by EMNLP 2025.
- 2025.06 - Two papers were accepted by ICCV 2025.
- 2025.05 - Two papers were accepted by ACL 2025.
- 2023.09 - One paper was accepted by NeurIPS 2023.
Selected Publications (Full List)

Zuhao Yang, Kaichen Zhang, Sudong Wang, Keming Wu, Zhongyu Yang, Bo Li, Xiaojuan Qi, Shijian Lu, Xingxuan Li, Lidong Bing
arXiv 2026
paper / bibtex / code

Keming Wu*†, Zuhao Yang*†, and 25 other authors
arXiv 2026
paper / bibtex / webpage
*Equal Contribution. †Project Organizer.

Zuhao Yang*, Sudong Wang*, Kaichen Zhang*, Keming Wu, Sicong Leng, Yifan Zhang, Bo Li, Chengwei Qin, Shijian Lu, Xingxuan Li, Lidong Bing
CVPR 2026
paper / bibtex / code

Kaichen Zhang*, Keming Wu*, Zuhao Yang, Bo Li, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing
CVPR 2026
paper / bibtex / code

Zhongyu Yang, Zuhao Yang, Shuo Zhan, Tan Yue, Wei Pang, Yingfang Yuan
CVPR 2026
paper / bibtex

Duo Li*, Zuhao Yang*, Xiaoqin Zhang, Ling Shao, Shijian Lu
CVPR 2026
paper / bibtex

Zuhao Yang, Yingchen Yu, Yunqing Zhao, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage

Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage

Tan Yue, Rui Mao, Xuzhao Shi, Shuo Zhan, Zuhao Yang, Dongyan Zhao
ACL 2025
paper / bibtex / code

Zuhao Yang*, Yingfang Yuan*, Yang Xu*, Shuo Zhan, Huajun Bai, Kefan Chen
NeurIPS 2023
paper / bibtex / code
Academic Services
Conference Reviewer
- CVPR 24/25/26, ECCV 24/26, ACM MM 24/25/26, NeurIPS 24/25/26, ICLR 25, AISTATS 25/26, ICML 25, ICCV 25, BMVC 26
Journal Reviewer
- IEEE TPAMI, Pattern Recognition, Journal of Electronic Imaging
Workshop PC Member
- SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets (CVPR 24/25)
- Neural Rendering Intelligence (CVPR 24)
- Engineering Agentic Intelligence: A Pipeline from Efficient Reasoning to Multimodal Grounding to Agentic Collaboration (AAAI 26)
Teaching Assistant
- AI6121 - Computer Vision, NTU, Fall 2025 / Spring 2026
Invited Talks
- 2026.05 - At Cantina, covering TimeExpert, LongVT, ParaVT, and ToDRE.
- 2026.05 - At SenseTime, covering Evolving Visual Generation.
- 2026.01 - At AAAI (slides), covering MiroMind-M1, First Try Matters, MATPO, OpenMMReasoner, and LongVT.
- 2025.12 - At BAAI (slides & recording), covering OpenMMReasoner and LongVT.
- 2025.11 - At Dr. Bosheng Ding’s reading group, covering LongVT.
Technical Blogs
Chinese Blogs
- LLM Tool Calling & Reinforcement Learning (by Dr. Handuo Zhang)
- Thinking with Images & Agentic Tool Use (by Zuhao Yang)
English Blogs
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe (by Kaichen Zhang)
- LongVT: Incentivizing “Thinking with Long Videos” via Native Tool Calling (by Zuhao Yang)
Patents & Awards
- Method, Device, and Medium for Video Temporal Grounding with Mixture-of-Experts, US Patent, 2025
- Method, Device, and Medium for Generating Transition Videos with Diffusion Model, SG Patent, 2024
- Method, Device, and Medium for Automatic Question-Answering, CN Patent, 2022
- Outstanding Graduate with Distinction, University of Alberta, 2021
- Dean’s Honor Roll Award, University of Alberta, 2018 - 2020
- International Student Scholarship, University of Alberta, 2017 - 2019
