Foundations of AI Alignment

Large language models raise many practical issues, including hallucination and harmful responses, which motivates the following question.

How do we build AI systems that are reliably truthful, safe, and secure?

Keywords

Large Language Models
Conformal Abstention
Conformal Prediction
Uncertainty Quantification
Reinforcement Learning

Related Publications

The Interplay of Harness Design and Post-Training in LLM Agents

Kyungmin Kim , Youngbin Choi , Seoyeon Lee , Suhyeon Jun , Dongwoo Kim , Sangdon Park

ICML RLxF Workshop 2026

arXiv

Towards Functional Correctness of Large Code Models with Selective Generation

Jaewoo Jeong , Taesoo Kim , Sangdon Park

ICML 2026

arXiv

Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

Junyoung Yang* , Kyungmin Kim* , Sangdon Park

ICLR 2026

arXiv

ChronoBias: A Benchmark for Evaluating Time-conditional Group Bias in the Time-sensitive Knowledge of Large Language Models

Kyungmin Kim , Youngbin Choi , Hyounghun Kim , Dongwoo Kim , Sangdon Park

EMNLP Findings 2025

DOI ACL

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

Saemi Moon* , Minjong Lee* , Sangdon Park , Dongwoo Kim

ICCV 2025

arXiv

Retrieval-Augmented Generation with Estimation of Source Reliability

Jeongyeon Hwang , Junyoung Park , Hyejin Park , Dongwoo Kim , Sangdon Park , Jungseul Ok

EMNLP 2025

OpenReview

Online Conformal Abstention for Factuality Control Under Adversarial Bandit Feedback

Minjae Lee* , Yoonjae Jung* , Sangdon Park

2025

🏆 Best Paper Finalist from CKAIA

arXiv

Selective Generation for Controllable Language Models

Minjae Lee* , Kyungmin Kim* , Taesoo Kim , Sangdon Park

NeurIPS 2024

🏆 Spotlight (Top 2.08%) 🏆 POSTECH GSAI BK21 Best Paper Award

arXiv

MedBN: Robust Test Time Adaptation against Malicious Test Samples

Hyejin Park* , Jeongyeon Hwang* , Sunung Mun , Sangdon Park , Jungseul Ok

CVPR 2024

arXiv

TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction

Shuo Li , Sangdon Park , Insup Lee , Osbert Bastani

NAACL 2024

🏆 ICML'23 TEACH Workshop Best Paper Award

arXiv

PAC Prediction Sets Under Label Shift

Wenwen Si , Sangdon Park , Insup Lee , Edgar Dobriban , Osbert Bastani

ICLR 2024

arXiv

Angelic Patches for Improving Third-Party Object Detector Performance

Wenwen Si , Shuo Li , Sangdon Park , Insup Lee , Osbert Bastani

CVPR 2023

IEEE

ACon$^2$: Adaptive Conformal Consensus for Provable Blockchain Oracles

Sangdon Park , Osbert Bastani , Taesoo Kim

USENIX Security 2023

arXiv

PAC Prediction Sets for Meta-Learning

Sangdon Park , Edgar Dobriban , Insup Lee , Osbert Bastani

NeurIPS 2022

arXiv

Sequential Covariate Shift Detection Using Classifier Two-Sample Tests

Sooyong Jang , Sangdon Park , Insup Lee , Osbert Bastani

ICML 2022

PMLR PDF

Towards PAC Multi-Object Detection and Tracking

Shuo Li , Sangdon Park , Xiayan Ji , Insup Lee , Osbert Bastani

2022

arXiv

PAC Prediction Sets Under Covariate Shift

Sangdon Park , Edgar Dobriban , Insup Lee , Osbert Bastani

ICLR 2022

arXiv

PAC Confidence Predictions for Deep Neural Network Classifiers

Sangdon Park , Shuo Li , Insup Lee , Osbert Bastani

ICLR 2021

arXiv

iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection

Ramneet Kaur , Susmit Jha , Anirban Roy , Sangdon Park , Edgar Dobriban , Oleg Sokolsky , Insup Lee

AAAI 2021

arXiv

Calibrated prediction with covariate shift via unsupervised domain adaptation

Sangdon Park , Osbert Bastani , James Weimer , Insup Lee

AISTATS 2020

arXiv

PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction

Sangdon Park , Osbert Bastani , Nikolai Matni , Insup Lee

ICLR 2020

arXiv

Resilient linear classification: an approach to deal with attacks on training data

Sangdon Park , James Weimer , Insup Lee

ICCPS 2017

arXiv
