Haoyan Yang

Preprint, 2026

Capability Self-Assessment: Teaching LLMs to Know Their Limits ↗

Haoyan Yang*, R. Shirkavand*, Y. Jin, J. Zhou, S. Gao, H. Huang

Self-X

Preprint, 2026

Self-Improvement of Large Language Models: A Technical Overview and Future Outlook ↗

Haoyan Yang, M. Xerri, S. Park, H. Zhang, Y. Feng, S. A. Kogilathota, J. Zhou

Self-X

ICLR 2026 Workshop: AI with Recursive Self-Improvement

Dynamic Noise Preference Optimization: Self-Improvement of LLMs with Self-Synthetic Data ↗

Haoyan Yang, K. Le, T. Hua, S. Gao, B. Xu, Z. Tang, J. Xu, N. V. Chawla, H. Jin, V. Srinivasan

Self-X

NeurIPS 2025

Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector ↗

Haoyan Yang, R. Bao, C. Xiao, J. Ma, P. Bhatia, S. Gao, T. Kass-Hout

Trustworthy

ACMMM 2024 Workshop: Multimedia Computing for Health and Medicine (Oral)

BurExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports ↗

Y. Chen*, Haoyan Yang*, H. Pan, F. Siddiqui, A. Verdone, Q. Zhang, S. Chopra, C. Zhao, Y. Shen

Applied

Preprint, 2024

PFID: Privacy First Inference Delegation Framework for LLMs ↗

Haoyan Yang, Z. Li, Y. Zhang, J. Wang, N. Cheng, M. Li, J. Xiao

Trustworthy

Preprint, 2024

Exploring Performance Contrasts in TableQA: Step-by-Step Reasoning Boosts Bigger Models, Limits Smaller Ones ↗

Haoyan Yang, Y. Wang, K. Tong, H. Zhu, Y. Zhang

Applied

Preprint, 2024

Can We Trust LLMs? Mitigate Overconfidence Bias through Knowledge Transfer ↗

Haoyan Yang, Y. Wang, X. Xu, H. Zhang, Y. Bian

Trustworthy

EMNLP 2023

PRCA: Fitting Black-Box LLMs for Retrieval QA via a Pluggable Reward-Driven Contextual Adapter ↗

Haoyan Yang, Z. Li, Y. Zhang, J. Wang, N. Cheng, M. Li, J. Xiao

RAG
