Publications

(2025). LIMI: Less is More for Agency. In arXiv.
(2025). DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery. In arXiv.
(2025). PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play? In COLM 2025.