Explore Emotionally Biased GPT-2 Responses
This simple web tool lets you experiment with mechanistic interpretability by biasing GPT-2 toward specific emotions or topics.
For each of ten features (e.g., joy, anger, politics), we used a labeled dataset to identify which neurons in the base model activate in response to