AI Safety Compendium

Teaching AI to Handle Exceptions: Supervised Fine-tuning with Human-aligned Judgment

27 Apr 2026 · 1 min read

Matthew DosSantos DiSorbo, Harang Ju, Sinan Aral — 2025-03-XX — Harvard Business School, Johns Hopkins University, MIT Sloan School of Management

Source

Link: https://arxiv.org/html/2503.02976v2#S3
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- human-inductive-biases — White-box safety (i.e. Interpretability)

Related Pages

human-inductive-biases


