Boxi Yu

Senior Research Fellow at Lero, Ireland

I am a Senior Research Fellow at Lero, the Science Foundation Ireland Research Centre for Software, directed by Prof. Lionel C. Briand. I obtained my Ph.D. from The Chinese University of Hong Kong, Shenzhen in 2025, supervised by Prof. Pinjia He.

My research focuses on Trustworthy AI, Code Agents, and Automated Testing. I proposed Retromorphic Testing, a technique for automatically constructing test oracles for modern software. My work has been published at top-tier venues including ICML, ICSE, ISSTA, ESEC/FSE, and ACL.

news

Apr 30, 2026 Our ICML and ICML Position papers were accepted: “SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark” and “How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs”.
May 20, 2025 Our paper “UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench” was accepted by ACL’2025.
May 20, 2024 Our extended abstract “DSPy Guardrails: Building Safe LLM Applications via Self-Refining Language Model Pipelines” was accepted by Compound AI Systems Workshop (June 13th, 2024 in San Francisco at Data + AI Summit).
Dec 15, 2023 Our paper “Testing Graph Database Systems via Equivalent Query Rewriting” was accepted by ICSE’2024.
Oct 11, 2023 We introduced “Retromorphic Testing,” a new, general methodology for the test oracle problem. It is a black-box technique that constructs a dual-program architecture to test the target software, inspired by the concept of inverse functions.
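
To illustrate the inverse-function idea behind the item above, here is a minimal sketch, not the implementation from the paper. It assumes the program under test (the "forward" program) can be paired with an auxiliary "backward" program that approximately inverts it, so that a round trip should return the original input. The names `retromorphic_failures` and `buggy_encode` are hypothetical and introduced only for this example; Base64 encoding and decoding stand in for the two programs.

```python
import base64
from typing import Callable, Iterable, List, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def retromorphic_failures(
    forward: Callable[[X], Y],
    backward: Callable[[Y], X],
    inputs: Iterable[X],
) -> List[X]:
    """Return the inputs whose round trip backward(forward(x)) differs from x.

    No expected output is needed for forward(x): the input itself serves as
    the oracle once the backward program maps the output back.
    """
    failures = []
    for x in inputs:
        if backward(forward(x)) != x:
            failures.append(x)
    return failures


def buggy_encode(data: bytes) -> bytes:
    # Hypothetical faulty program under test: it silently drops trailing
    # zero bytes before encoding.
    return base64.b64encode(data.rstrip(b"\x00"))


inputs = [b"", b"abc", b"abc\x00", b"\x00\x01"]

# A correct forward program passes every round-trip check.
ok = retromorphic_failures(base64.b64encode, base64.b64decode, inputs)

# The faulty forward program is exposed by the input with a trailing zero byte.
bad = retromorphic_failures(buggy_encode, base64.b64decode, inputs)
```

In this sketch the check is strict equality; for programs whose round trip is only approximately the identity, a looser comparison would be needed.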

Selected publications

  1. ICML
    SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark
    Boxi Yu and others
    ICML’26: International Conference on Machine Learning, 2026
  2. ACL
    UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
    Boxi Yu, Yuxuan Zhu, Pinjia He, and Daniel Kang
    2025
  3. arXiv
    Retromorphic Testing: A New Approach to the Test Oracle Problem
    Boxi Yu, Qiuyang Mang, Qingshuo Guo, and Pinjia He
    ArXiv, 2023
  4. ICML Position
    How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
    Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, and 5 more authors
    ICML Position, 2026
  5. ICSE
    Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection
    Boxi Yu, Jiayi Yao, Qiuai Fu, Zhiqing Zhong, Haotian Xie, Yaoliang Wu, Yuchi Ma, and Pinjia He
    ICSE’24: International Conference on Software Engineering, 2024
  6. CASW
    DSPy Guardrails: Building Safe LLM Applications via Self-Refining Language Model Pipelines
    Boxi Yu and Pinjia He
    Compound AI Systems Workshop, 2024
  7. ESEC/FSE
    Automated Testing and Improvement of Named Entity Recognition Systems
    Boxi Yu, Yiyan Hu, Qiuyang Mang, Wenhan Hu, and Pinjia He
    ESEC/FSE’23: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023
  8. ISSTA
    ROME: Testing Image Captioning Systems via Recursive Object Melting
    Boxi Yu, Zhiqing Zhong, Jiaqi Li, Yixing Yang, Shilin He, and Pinjia He
    In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023
  9. ISSTA
    Automated Testing of Image Captioning Systems
    Boxi Yu, Zhiqing Zhong, Xinran Qin, Jiayi Yao, Yuancheng Wang, and Pinjia He
    In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022