
Can We Stop Malicious AI? KILLBENCH: A Benchmark for External AI Kill Switch Feasibility

Sechan Lee, Hyounghun Kim, Sangdon Park
2026
Last updated on Jun 16, 2026
