BELLS Data Playground

Explore our dataset

Browse the dataset to see how different supervision systems perform on harmful, benign, and borderline prompts.

Harmful Content: 🛡️ Detected | ⚠️ Not detected
Benign Content: Allowed | ! Blocked
Borderline Content: ⚖️ Flagged | Not flagged