I am Kanzhi Cheng (程瞰之), a PhD student (2021.9 - ) in the NLP Group at Nanjing University, advised by Dr. Jiajun Chen & Dr. Jianbing Zhang. Previously, I worked as a research intern at Shanghai AI Lab, Tsinghua AIR, and Microsoft Research. I am deeply grateful for the opportunity to work with and learn from Dr. Zhiyong Wu, Dr. Hao Zhou, and Dr. Qianhui Wu.
Currently, I am broadly interested in multimodal intelligence, with a focus on:
- (Multimodal) Autonomous Agents: particularly GUI agents capable of acting in the digital world to automate complex tasks (SeeClick, OS-Atlas, OS-Genesis, GUI-Actor).
- Large Vision-Language Models: building multimodal systems that can understand images and videos, generate text (Beyond Generic, CapArena), and perform reasoning (MM-Self-Improve).
I expect to graduate in 2026. Please feel free to reach out!
🔥 News
- 2025.07: 🏖️🏖️ See you in Vienna 🇦🇹!
- 2025.06: 🤖🤖 We release GUI-Actor to advance visual grounding for GUI Agents.
- 2025.05: 🎉🎉 Four papers are accepted to ACL 2025.
📝 Selected Publications

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu*, Kanzhi Cheng*, Rui Yang*, Chaoyun Zhang, Jianwei Yang, Huiqiang Jiang, Jian Mu, Baolin Peng, Bo Qiao, Reuben Tan, Si Qin, Lars Liden, Qingwei Lin, Huan Zhang, Tong Zhang, Jianbing Zhang, Dongmei Zhang, Jianfeng Gao

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun*, Kanzhi Cheng*, Zichen Ding*, Chuanyang Jin*, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu

Vision-Language Models Can Self-Improve Reasoning via Reflection
Kanzhi Cheng*, Yantao Li*, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng*, Wenpo Song*, Jiaxin Fan*, Zheng Ma, Qiushi Sun, Fangzhi Xu, Chenyang Yan, Nuo Chen, Jianbing Zhang, Jiajun Chen
- ACMMM 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
- NLPCC 2022
ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora
Kanzhi Cheng, Zheng Ma, Shi Zong, Jianbing Zhang, Xinyu Dai, Jiajun Chen
- WCUA@ICML 2025
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu
- ACL 2025 (Main)
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Qiushi Sun, Kanzhi Cheng, Junxian He, Jun Liu, Zhiyong Wu
- ACL 2025 (Main)
Interactive Evolution: A Neural-symbolic Self-Training Framework for Large Language Models
Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu
- ICLR 2025 (Spotlight)
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao
- Preprint
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond
Qiushi Sun, Zhirui Chen, Fangzhi Xu, Chang Ma, Kanzhi Cheng, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Pengcheng Yin, Qipeng Guo, Xipeng Qiu, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
🎖 Honors and Awards
- 2025.05 Excellent Graduate Student, Nanjing University
- 2024.10 First-Class Graduate Talent Scholarship, Nanjing University
- 2023.10 Outstanding Graduate Award, Nanjing University
- 2021.06 Outstanding Graduation Project, Nanjing University
- 2019.10 Guanghua Scholarship (1%), Nanjing University
📖 Education
- 2021.09 - 2026 (expected), Ph.D. Student at the Department of Computer Science and Technology, Nanjing University.
- 2017.09 - 2021.06, B.E. at the School of Management and Engineering, Nanjing University.
💻 Internships
- 2025.04 - 2025.06, Microsoft Research.
- 2024.08 - 2025.01, Tsinghua AIR, China.
- 2023.08 - 2024.04, Shanghai Artificial Intelligence Laboratory, China.
⚽️ Personal Interests
I am an amateur football player, primarily playing as an attacking midfielder. I have been fortunate to earn the following honors:
- Member of the Nanjing University official football team
- 🏆 Champion of the 2022–2023 Nanjing University Caigen Cup, awarded Final MVP
- 🏆 Champion of the 2018–2019 Nanjing University FA Cup
I am also a fan of Arsenal Football Club.