Peng Chen

I am a third-year master's student at the Institute of Software, Chinese Academy of Sciences. I received my B.S. in Computer Science from the University of Science and Technology Beijing in 2023, where I received the Beijing Distinguished Graduate Award and the Beijing Outstanding Graduation Thesis Award.

I serve as a reviewer for international conferences including ICLR, AAAI, ICME, and ISMAR.

My research focuses on the following areas:

  • MLLM (VLM/RL): video understanding, and GUI/game/embodied agents;
  • AIGC (Diffusion/DiT): video/image generation, and unified models;
  • 3D Vision (3DGS): digital humans.

I am currently looking for campus recruitment opportunities for 2026, focusing on multimodal large language models (MLLMs).

Email  /  Github  /  Google Scholar

Research
[ICCV 2025] CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen*, Pi Bu*, Yingyao Wang, Xinyi Wang, Ziming Wang, Jie Guo, Yingxiu Zhao, Qi Zhu, Jun Song, Siran Yang, Jiamang Wang, Bo Zheng
Paper / Project / Code

We propose CombatVLA, the first efficient vision-language-action model designed for combat tasks in 3D action role-playing games. CombatVLA is a 3B-parameter model that processes visual inputs and outputs a sequence of actions (including keyboard and mouse operations) to control the game, enabling efficient decision making.

[arXiv 2025, preprint] Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
Yaqi Li*, Peng Chen*, Mingyang Han*, Pi Bu*, Haoxiang Shi, Runzhou Zhao, Yang Yao, Xuan Zhang, Jun Song
Paper / Project / Code

We propose a unified VLM named Visual-CoG, which leverages reinforcement learning (RL) with stage-aware rewards to provide immediate guidance throughout the image generation process, significantly improving performance on complex text-to-image tasks.

[ACM MM 2025] MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians
Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu
Paper / Project / Code

We use 2DGS to maintain the surface geometry and employ 3DGS for color correction in areas where the rendering quality of 2DGS is insufficient, reconstructing a realistically and geometrically accurate 3D head avatar.

[TVCG 2026] RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation
Peng Chen, Xiaobao Wei, Yi Yang, Naiming Yao, Hui Chen, Tian Feng
Paper / Project / Code

RSATalker achieves realistic talking head generation for multi-turn conversation. It perceives the social relationship between the speaker and the listener, enabling it to express facial movements more accurately.

[ICCV Highlight 2025] GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting
Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, Feng Tian
Paper / Project / Code

We propose GazeGaussian, a high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately.

[AAAI 2025] GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu, Hui Chen, Feng Tian
Paper / Project / Code

We propose GraphAvatar, a compact method using Graph Neural Networks (GNN) to generate 3D Gaussians for head avatar animation, offering superior rendering performance and minimal storage requirements.

[NeurIPS Workshop 2024] Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case
Peng Chen*, Pi Bu*, Jun Song, Yuan Gao, Bo Zheng
Paper / Project / QbitAI (量子位)

We propose a novel framework named the VARP agent, which directly takes game screenshots as input and generates keyboard and mouse operations to play the ARPG.

[ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation
Peng Chen, Xiaobao Wei, Ming Lu, Hui Chen, Feng Tian
Paper / Project / Code

We propose DiffusionTalker, a diffusion-based method that utilizes a contrastive personalizer to generate personalized 3D facial animation, together with personalizer-guided distillation for acceleration and compression.

[IEEE VR 2024] Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters
Zechen Bai*, Peng Chen*, Xiaolan Peng, Lu Liu, Naiming Yao, Hui Chen, Feng Tian
Paper / Code

Given a target facial video as reference, you can bring your own character into our Unity3D-integrated solution, which automatically generates facial animation for the customized virtual character.

Internships
  • [05/2025 - 12/2025] Tencent, TEG, Hunyuan (Qingyun Project)
    Research intern working on GRPO-based VLMs for video understanding and GRPO-based video generation models.
  • [04/2024 - 05/2025] Alibaba, Taotian, Future Lab
    Research intern working on MLLMs, focusing on VLM-based VLA agents for games and GUIs, RL reasoning models, and unified language models.
  • [11/2023 - 04/2024] AMD, Xilinx AI
    Research intern working on diffusion-based AIGC, focusing on improving ControlNet and Stable Diffusion for image generation.
  • [07/2023 - 08/2023] Baidu, ACG
    Research intern working on LLM evaluation, focusing on automated evaluation of text-based question answering for the Wenxin large language model and its reward model.
News
  • [06/2023] Beijing Outstanding Graduation Design (Thesis), 2023.
  • [06/2023] Beijing Distinguished Graduate Award, 2023.

Last updated: Aug. 2025
Web page design credit to Jon Barron