Confidence in LLMs is often studied through uncertainty estimation and calibration. We survey a complementary perspective: confidence as a control signal that governs system behavior. We organize confidence utilization across the LLM lifecycle: (i) training (data selection, loss weighting, self-training, and preference optimization); (ii) inference (candidate selection, adaptive computation, and confidence-guided contrastive decoding); and (iii) deployment (cost-aware routing and cascading; RAG control, covering retrieval triggering, context filtering, and parametric-retrieval arbitration; and risk-aware abstention and monitoring). We unify these techniques into a framework that turns confidence into system decisions, with implications for efficiency and reliability.
ICLR
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Shahriar Noroozizadeh†, Xiaobin Shen†, Jeremy Weiss, and George H. Chen
Accepted at International Conference on Learning Representations, Apr 2026
Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from Causal Survival Forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragmented and inconsistent. We introduce SurvHTE-Bench, the first comprehensive benchmark for HTE estimation with censored outcomes. The benchmark spans (i) a modular suite of synthetic datasets with known ground truth, systematically varying causal assumptions and survival dynamics, (ii) semi-synthetic datasets that pair real-world covariates with simulated treatments and outcomes, and (iii) real-world datasets from a twin study (with known ground truth) and from an HIV clinical trial. Across synthetic, semi-synthetic, and real-world settings, we provide the first rigorous comparison of survival HTE methods under diverse conditions and realistic assumption violations. SurvHTE-Bench establishes a foundation for fair, reproducible, and extensible evaluation of causal survival methods.
2025
ML4H
Deep Kernel Aalen-Johansen Estimator: An Interpretable and Flexible Neural Net Framework for Competing Risks
Stepwise Fine and Gray: Subject-Specific Variable Selection Shows When Hemodynamic Data Improves Prognostication of Comatose Post-Cardiac Arrest Patients
Xiaobin Shen, Jonathan Elmer, and George H. Chen
Proceedings of the 10th Machine Learning for Healthcare Conference, Aug 2025