Yun Qu | 曲云
I'm a fourth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Xiangyang Ji.
My research focuses on Reinforcement Learning and Large Language Models.
I work with the THU-IDM team, where we develop efficient algorithms for decision-making.
Prior to my doctoral studies, I received my B.E. degree from the Department of Automation at Tsinghua University.
Email / Scholar / Twitter / GitHub
News
- [2025-09] One paper accepted to NeurIPS 2025!
- [2025-05] PDTS was accepted to ICML 2025!
- [2024-12] LaRe was accepted to AAAI 2025.
- [2024-09] Two papers accepted to NeurIPS 2024!
- [2022-09] Started my Ph.D. journey at Tsinghua University.
Research
I'm interested in Reinforcement Learning and Large Language Models. My research focuses on efficient and intelligent decision-making with minimal environment interactions. Representative works are highlighted.
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
Yun Qu,
Qi (Cheems) Wang,
Yixiu Mao,
Vincent Tao Hu,
Björn Ommer,
Xiangyang Ji
arXiv, 2025
paper
/
code
This work introduces Model Predictive Prompt Selection, a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions.
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu*,
Qi (Cheems) Wang*,
Yixiu Mao*,
Yiqin Lv,
Xiangyang Ji
ICML, 2025
paper
/
code
We propose an easy-to-implement method, Posterior and Diversity Synergized Task Sampling (PDTS), that enables fast and robust sequential decision-making.
Model Predictive Task Sampling for Efficient and Robust Adaptation
Qi (Cheems) Wang*,
Zehao Xiao*,
Yixiu Mao*,
Yun Qu*,
Jiayi Shen,
Yiqin Lv,
Xiangyang Ji
arXiv, 2025
paper
/
code
We introduce Model Predictive Task Sampling (MPTS), a framework that bridges the task space and adaptation risk landscape, providing a theoretical foundation for robust active task sampling.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu*,
Yuhang Jiang*,
Boyuan Wang,
Yixiu Mao,
Qi (Cheems) Wang*,
Chang Liu,
Xiangyang Ji
AAAI, 2025
paper
/
code
We introduce LaRe, a novel LLM-empowered symbolic decision-making framework that improves credit assignment in episodic reinforcement learning.
Doubly Mild Generalization for Offline Reinforcement Learning
Yixiu Mao,
Qi (Cheems) Wang,
Yun Qu,
Yuhang Jiang,
Xiangyang Ji
NeurIPS, 2024
paper
/
code
To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation.
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
Yixiu Mao,
Qi (Cheems) Wang,
Chen Chen,
Yun Qu,
Xiangyang Ji
NeurIPS, 2024
paper
/
code
We propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL.
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Yun Qu,
Boyuan Wang,
Yuhang Jiang,
Jianzhun Shao,
Yixiu Mao,
Qi (Cheems) Wang*,
Chang Liu,
Xiangyang Ji
arXiv, 2024
paper
This paper introduces LEMAE, a systematic approach that channels informative, task-relevant guidance from a knowledgeable Large Language Model (LLM) into efficient multi-agent exploration.
Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation
Qi (Cheems) Wang*,
Yiqin Lv*,
Yixiu Mao*,
Yun Qu,
Yi Xu,
Xiangyang Ji
KDD, 2025
project page
/
paper
/
code
We explicitly model generative task distributions over task identifiers and robustify fast adaptation through adversarial training.
LLM-Empowered State Representation for Reinforcement Learning
Boyuan Wang*,
Yun Qu*,
Yuhang Jiang,
Chang Liu,
Wenming Yang,
Xiangyang Ji
ICML, 2024
paper
/
code
We propose LLM-Empowered State Representation (LESR), a novel approach that uses an LLM to autonomously generate task-related state-representation code, enhancing the continuity of network mappings and facilitating efficient training.
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
Yun Qu*,
Boyuan Wang*,
Jianzhun Shao*,
Yuhang Jiang,
Chen Chen,
Zhenbin Ye,
Linc Liu,
Yang Feng,
Lin Lai,
Hongyang Qin,
Minwen Deng,
Juchao Zhuo,
Deheng Ye,
Qiang Fu,
Yang Guang,
Wei Yang,
Lanxiao Huang,
Xiangyang Ji
NeurIPS D&B Track, 2023
project page
/
paper
/
code
We propose Hokoff, a comprehensive set of pre-collected datasets covering both offline RL and offline MARL, accompanied by a robust framework to facilitate further research.
Counterfactual Conservative Q Learning for Offline Multi-Agent Reinforcement Learning
Jianzhun Shao*,
Yun Qu*,
Chen Chen,
Hongchang Zhang,
Xiangyang Ji
NeurIPS, 2023
paper
/
code
We propose a novel multi-agent offline RL algorithm, CounterFactual Conservative Q-Learning (CFCQL), to perform conservative value estimation.
Complementary Attention for Multi-Agent Reinforcement Learning
Jianzhun Shao,
Hongchang Zhang,
Yun Qu,
Chang Liu,
Shuncheng He,
Yuhang Jiang,
Xiangyang Ji
ICML, 2023
paper
/
code
In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy to input entities, accompanied by complementary attention for enhancement and replenishment.
Miscellaneous
Academic Service
- Reviewer for NeurIPS, ICML, ICLR, AAAI
Awards & Honors
- Outstanding Graduate Award, Beijing, 2022
- Outstanding Graduate Award, Tsinghua University, 2022
Collaborations
I'm fortunate to collaborate with researchers from THU-IDM and other institutions.