I'm a fourth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Xiangyang Ji.
My research focuses on Reinforcement Learning and Large Language Models.
I work with the THU-IDM team, where we develop efficient algorithms for decision-making.
Prior to my doctoral studies, I received my B.E. degree from the Department of Automation at Tsinghua University.
[2022-09] Started my Ph.D. journey at Tsinghua University.
Research
(19 publications)
I'm interested in Reinforcement Learning and Large Language Models. My research focuses on efficient and intelligent decision-making with minimal environment interactions.
This study introduces Generalizable Predictive Prompt Selection (GPS), which performs Bayesian inference over prompt difficulty using a lightweight generative model trained on the shared optimization history.
This work introduces Model Predictive Prompt Selection, a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions.
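Neither implementation is reproduced here; as a loose illustration of the shared idea behind these two prompt-selection papers, a Beta-Bernoulli posterior over each prompt's success rate can be maintained from past rollouts and used to pick prompts near a target difficulty, without extra LLM calls. The class and acquisition rule below are my own illustrative assumptions, not the papers' interfaces.

```python
import numpy as np

class PromptSelector:
    """Illustrative Beta-Bernoulli posterior over per-prompt success rate.

    Prompts whose sampled success rate lies closest to a target difficulty
    (e.g. 0.5) are treated as the most informative to train on. This is a
    hedged sketch of Bayesian prompt-difficulty estimation, not the GPS/MPS
    implementation.
    """

    def __init__(self, num_prompts: int, target: float = 0.5):
        self.alpha = np.ones(num_prompts)   # pseudo-counts of successes
        self.beta = np.ones(num_prompts)    # pseudo-counts of failures
        self.target = target

    def update(self, prompt_id: int, successes: int, failures: int) -> None:
        """Fold the outcomes of new rollouts into the posterior."""
        self.alpha[prompt_id] += successes
        self.beta[prompt_id] += failures

    def select(self, batch_size: int) -> np.ndarray:
        """Pick prompts whose sampled success rate is nearest the target."""
        sampled = np.random.beta(self.alpha, self.beta)   # Thompson-style draw
        score = -np.abs(sampled - self.target)            # prefer medium difficulty
        return np.argsort(score)[-batch_size:]

# usage
selector = PromptSelector(num_prompts=1000)
selector.update(prompt_id=3, successes=2, failures=6)
batch = selector.select(batch_size=32)
```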
We introduce HINT, the first autoregressive diffusion framework for multi-human motion generation with hierarchical interaction modeling. HINT leverages a disentangled motion representation and a sliding-window strategy, achieving an FID of 3.100 on InterHuman and significantly improving over the previous state of the art.
We develop UDS (Utility-Diversity Sampling), a framework for efficient online batch selection in LLM supervised fine-tuning that leverages the nuclear norm of the logits matrix to capture both data utility and intra-sample diversity, eliminating the need for external resources.
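As a rough sketch of the scoring idea (function names and shapes below are assumptions, not the UDS code): the nuclear norm of each sample's logits matrix (tokens × vocabulary) can be computed from a single forward pass and used to rank candidates in the online pool.

```python
import torch

def nuclear_norm_scores(logits: torch.Tensor) -> torch.Tensor:
    """Score each sample by the nuclear norm of its logits matrix.

    logits: (batch, seq_len, vocab) tensor from a forward pass.
    Returns a (batch,) tensor; larger values are read here as jointly
    higher utility and intra-sample diversity. Illustrative only.
    """
    # ord='nuc' sums the singular values of each (seq_len, vocab) slice.
    return torch.linalg.matrix_norm(logits, ord="nuc")

def select_batch(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k highest-scoring samples from the candidate pool."""
    scores = nuclear_norm_scores(logits)
    return torch.topk(scores, k).indices

# usage: score a pool of 64 candidates (small vocab for illustration)
pool_logits = torch.randn(64, 128, 512)
kept = select_batch(pool_logits, k=16)
```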
We present Validation-Aligned Optimization (VAO), a principled data-sharing method that adaptively reweights cross-task data contributions based on validation performance feedback.
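A minimal sketch of validation-aligned reweighting, assuming a simple multiplicative-weights rule (the actual VAO update may differ): each auxiliary task's sampling weight is scaled by how much training on its data improved validation loss.

```python
import numpy as np

def update_task_weights(weights: np.ndarray,
                        val_loss_before: float,
                        val_loss_after: np.ndarray,
                        lr: float = 1.0) -> np.ndarray:
    """Reweight cross-task data by validation feedback (illustrative).

    val_loss_after[i] is the validation loss measured after a small update
    on task i's data; tasks that lower the loss receive larger weights.
    """
    improvement = val_loss_before - val_loss_after   # per-task gain
    weights = weights * np.exp(lr * improvement)     # multiplicative update
    return weights / weights.sum()                   # keep it a distribution

# usage with three auxiliary tasks
w = np.ones(3) / 3
w = update_task_weights(w, val_loss_before=1.20,
                        val_loss_after=np.array([1.15, 1.22, 1.18]))
```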
This paper analyzes challenges in object pose estimation within industrial environments, based on our winning solutions in the 2025 Perception Challenge for Bin-Picking. We highlight two unexpected observations: methods trained on object-specific datasets performed worse than those on unseen data, and evaluation results varied significantly depending on the chosen metrics.
This work addresses the over-conservatism of density and sample constraints in offline RL while avoiding the complex behavior modeling required by support constraints.
We propose an easy-to-implement method, referred to as Posterior and Diversity Synergized Task Sampling (PDTS), to enable fast and robust sequential decision-making.
We introduce Model Predictive Task Sampling (MPTS), a framework that bridges the task space and adaptation risk landscape, providing a theoretical foundation for robust active task sampling.
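The two items above share the idea of predicting a task's adaptation risk before paying for its rollouts. A hedged sketch, assuming a simple running-mean risk surrogate with an exploration bonus (the actual MPTS/PDTS models are richer than this):

```python
import numpy as np

class RiskPredictiveSampler:
    """Illustrative risk-predictive task sampler.

    Keeps a running estimate of each task's adaptation risk and samples
    tasks whose predicted risk is high (for robustness), with a bonus for
    rarely evaluated tasks. A sketch of the idea behind MPTS/PDTS, not
    their actual models.
    """

    def __init__(self, num_tasks: int, bonus: float = 1.0):
        self.mean = np.zeros(num_tasks)    # running mean of observed risk
        self.count = np.zeros(num_tasks)   # how often each task was evaluated
        self.bonus = bonus

    def observe(self, task_id: int, risk: float) -> None:
        """Update the running risk estimate with one adaptation outcome."""
        self.count[task_id] += 1
        self.mean[task_id] += (risk - self.mean[task_id]) / self.count[task_id]

    def sample(self, batch_size: int) -> np.ndarray:
        """Pick tasks with the highest predicted risk + exploration score."""
        exploration = self.bonus / np.sqrt(self.count + 1.0)
        return np.argsort(self.mean + exploration)[-batch_size:]

sampler = RiskPredictiveSampler(num_tasks=200)
sampler.observe(task_id=17, risk=0.8)
tasks = sampler.sample(batch_size=8)
```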
We explicitly model task distributions over task identifiers with generative models and propose robustifying fast adaptation through adversarial training.
To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation.
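One hedged reading of the two ingredients, in my own notation rather than the paper's exact objective: mild action generalization restricts policy improvement to a small neighborhood of dataset actions, and mild generalization propagation blends the resulting bootstrap target with an in-sample one so that generalization errors do not compound through the Bellman backup.

```latex
% Illustrative formulation, not the paper's exact objective.
% (i) Mild action generalization: improve the policy only within an
%     \epsilon-neighborhood of actions seen in the dataset D.
\max_{\pi}\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[ Q(s,\pi(s)) \right]
\quad \text{s.t.}\quad \lVert \pi(s) - a \rVert \le \epsilon .

% (ii) Mild generalization propagation: mix the generalized bootstrap
%      value with an in-sample value via \lambda \in [0,1].
y = r + \gamma \left[ \lambda\, Q_{\bar\theta}\!\left(s',\pi(s')\right)
      + (1-\lambda)\, \max_{a'\,:\,(s',a')\in\mathcal{D}} Q_{\bar\theta}(s',a') \right].
```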
This paper introduces a systematic approach, termed LEMAE, that channels informative, task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration.
We propose LLM-Empowered State Representation (LESR), a novel approach that utilizes an LLM to autonomously generate task-related state representation code, which enhances the continuity of network mappings and facilitates efficient training.
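A minimal sketch of such a pipeline, with `query_llm` as a hypothetical stand-in for the actual LLM client (none of the names below are from the LESR code): the LLM is asked for a small Python function mapping the raw state to extra features, and the returned code is executed and concatenated to the observation.

```python
import numpy as np

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns Python source."""
    # In practice this would call an actual LLM with a task description.
    return (
        "def state_features(state):\n"
        "    x, y, vx, vy = state\n"
        "    return [(x**2 + y**2) ** 0.5, vx * x + vy * y]\n"
    )

def build_state_encoder(task_description: str):
    """Ask the LLM for task-related feature code and compile it."""
    source = query_llm(f"Write state_features(state) for: {task_description}")
    namespace: dict = {}
    exec(source, namespace)          # trust boundary: sandbox in real use
    return namespace["state_features"]

encoder = build_state_encoder("2D navigation toward the origin")
raw_state = np.array([1.0, 2.0, 0.1, -0.3])
augmented = np.concatenate([raw_state, encoder(raw_state)])  # fed to the policy
```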
We propose Hokoff, a comprehensive set of pre-collected datasets covering both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research.
In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy to input entities, coupled with complementary attention for enhancement and replenishment.
We introduce HoK3v3, a 3v3 game environment based on Honor of Kings for multi-agent reinforcement learning research, posing a unique challenge for generalization in heterogeneous MARL with diverse heroes and lineups.