I am a second-year Ph.D. student in the Visual Intelligence Lab at Nanyang Technological University (NTU), supervised by Prof. Shijian Lu. Prior to joining NTU, I obtained my B.S. degree in Computing Science from the University of Alberta. I work closely with Dr. Lidong Bing at MiroMind.ai, and I previously worked with Dr. Song Bai while he was at ByteDance. My research centers on the long-standing quest to build video-centric multimodal intelligence, spanning controllable generation, temporal reasoning, agentic tool use, and long-term memory.

I enjoy collaborating with self-motivated researchers at LMMs-Lab, a non-profit open-source organization led by Bo Li and Prof. Ziwei Liu. Our mission is to advance large multimodal models with a shared vision of Feeling the AGI. We are actively looking for like-minded individuals to contribute to the community together!

🔥 Exciting News

  • 2025.10 - Four papers were released, focusing on multimodal reasoning (OpenMMReasoner), multimodal agentic tool use (LongVT), and visual token redundancy in both MLLMs (ToDRE) and diffusion-based MLLMs.
  • 2025.10 - One paper was accepted by SIGGRAPH Asia 2025.
  • 2025.08 - One paper was accepted by EMNLP 2025.
  • 2025.06 - Two papers were accepted by ICCV 2025.
  • 2025.05 - Two papers were accepted by ACL 2025.
  • 2023.09 - One paper was accepted by NeurIPS 2023.

📝 Selected Publications (Full List)

Preprint LVT
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
Zuhao Yang*, Sudong Wang*, Kaichen Zhang*, Keming Wu, Sicong Leng, Yifan Zhang, Chengwei Qin, Bo Li, Shijian Lu, Xingxuan Li, Lidong Bing
Preprint 2025
paper / bibtex / code
Preprint OMR
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Kaichen Zhang*, Keming Wu*, Zuhao Yang, Bo Li, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing
Preprint 2025
paper / bibtex / code
Preprint dMLLM
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li*, Zuhao Yang*, Xiaoqin Zhang, Ling Shao, Shijian Lu
Preprint 2025
paper / bibtex
Preprint ToDRE
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
Duo Li*, Zuhao Yang*, Xiaoqin Zhang, Ling Shao, Shijian Lu
Preprint 2025
paper / bibtex
ICCV TimeExpert
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage
ICCV VTG
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai
ICCV 2025
paper / bibtex / webpage
ACL QAEval
QAEval: Mixture of Evaluators for Question-Answering Task Evaluation
Tan Yue, Rui Mao, Xuzhao Shi, Shuo Zhan, Zuhao Yang, Dongyan Zhao
ACL 2025
paper / bibtex / code
NeurIPS FACE
FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy
Zuhao Yang*, Yingfang Yuan*, Yang Xu*, Shuo Zhan, Huajun Bai, Kefan Chen
NeurIPS 2023
paper / bibtex / code

📖 Educational Background

  • 2024.01 - Present: Doctor of Philosophy, College of Computing and Data Science, Nanyang Technological University
  • 2022.08 - 2024.01: Master in Artificial Intelligence, College of Computing and Data Science, Nanyang Technological University
  • 2017.09 - 2021.06: Bachelor of Science in Computing Science, Department of Computing Science, University of Alberta

🧑‍⚖️ Work Experience

  • 2025.04 - Present: AI Scientist Intern, Shanda AI Research Institute & MiroMind.ai, Singapore
  • 2023.11 - 2025.03: AI Research Intern, ByteDance Inc. & TikTok, Singapore
  • 2021.05 - 2022.06: NLP Algorithm Engineer, TMI Robotics Technology, Shanghai

💻 Academic Services

Conference Reviewer

  • CVPR 24/25/26, ECCV 24, ACM MM 24, NeurIPS 24/25, ICLR 25, AISTATS 25/26, ICML 25, ICCV 25

Journal Reviewer

  • IEEE TPAMI, Pattern Recognition, Journal of Electronic Imaging

Workshop PC Member

Teaching Assistant

  • AI6121 - Computer Vision, NTU, 2025 Fall

🏆 Patents & Awards

  • Method, Device, and Medium for Video Temporal Grounding with Mixture-of-Experts, US Patent, 2025
  • Method, Device, and Medium for Generating Transition Videos with Diffusion Model, SG Patent, 2024
  • Method, Device, and Medium for Automatic Question-Answering, CN Patent, 2022
  • Outstanding Graduate, University of Alberta, 2021
  • Dean’s Honor Roll Award, University of Alberta, 2018 - 2020
  • International Student Scholarship, University of Alberta, 2017 - 2019