I am Kanzhi Cheng (程瞰之), a PhD student (2021.9 - ) in the NLP Group at Nanjing University, advised by Dr. Jiajun Chen & Dr. Jianbing Zhang. Previously, I worked as a research intern at Shanghai AI Lab, Tsinghua AIR, and Microsoft Research. I am deeply grateful for the opportunity to work with and learn from Dr. Zhiyong Wu, Dr. Hao Zhou, and Dr. Qianhui Wu.

Currently, I am broadly interested in multimodal intelligence, with a focus on:

I expect to graduate in 2026. Please feel free to reach out!

🔥 News

  • 2025.11: 📚📚 Started my visit to NTU, Singapore 🇸🇬.
  • 2025.09: 🎉🎉 GUI-Actor was accepted to NeurIPS 2025.
  • 2025.07: 🏖️🏖️ See you in Vienna 🇦🇹!
  • 2025.06: 🤖🤖 We released GUI-Actor to advance visual grounding for GUI agents.
  • 2025.05: 🎉🎉 Four papers were accepted to ACL 2025.

📝 Selected Publications

ACL 2024 (Main)

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

Code · Models & Data

NeurIPS 2025 (Poster)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu*, Kanzhi Cheng*, Rui Yang*, Chaoyun Zhang, Jianwei Yang, Huiqiang Jiang, Jian Mu, Baolin Peng, Bo Qiao, Reuben Tan, Si Qin, Lars Liden, Qingwei Lin, Huan Zhang, Tong Zhang, Jianbing Zhang, Dongmei Zhang, Jianfeng Gao

Code · Project Page · Models & Data

ACL 2025 (Main)

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun*, Kanzhi Cheng*, Zichen Ding*, Chuanyang Jin*, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu

Code · Project Page · Models & Data

NAACL 2025 (Main)

Vision-Language Models Can Self-Improve Reasoning via Reflection
Kanzhi Cheng*, Yantao Li*, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu

Code

ACL 2025 (Findings)

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng*, Wenpo Song*, Jiaxin Fan*, Zheng Ma, Qiushi Sun, Fangzhi Xu, Chenyang Yan, Nuo Chen, Jianbing Zhang, Jiajun Chen

Code · Project Page

🎖 Honors and Awards

  • 2025.05 Excellent Graduate Student, Nanjing University
  • 2024.10 First-Class Graduate Talent Scholarship, Nanjing University
  • 2023.10 Outstanding Graduate Award, Nanjing University
  • 2021.06 Outstanding Graduation Project, Nanjing University
  • 2019.10 Guanghua Scholarship (1%), Nanjing University

📖 Education

  • 2021.09 - 2026 (expected), Ph.D. student at the Department of Computer Science and Technology, Nanjing University.
  • 2017.09 - 2021.06, B.E. at the School of Management and Engineering, Nanjing University.

💻 Internships

  • 2025.04 - 2025.06, Microsoft Research.
  • 2024.08 - 2025.01, Tsinghua AIR, China.
  • 2023.08 - 2024.04, Shanghai Artificial Intelligence Laboratory, China.

⚽️ Personal Interests

I am an amateur football player, primarily playing as an attacking midfielder. As a player, I've been fortunate to win the following honors:

  • Member of the Nanjing University official football team
  • 🏆 Champion of the 2022–2023 Nanjing University Caigen Cup, awarded Final MVP
  • 🏆 Champion of the 2018–2019 Nanjing University FA Cup

I am also a fan of Arsenal Football Club.
