芝士 · Interpreting BERT
2023-02-22 11:05:41

Captum · Model Interpretability for PyTorch

Captum helps ML researchers more easily implement interpretability algorithms that can interact with PyTorch models. It also allows researchers to quickly benchmark their work against other existing algorithms available in the library.
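One of Captum's core algorithms is Integrated Gradients, which attributes a model's prediction to its input features by integrating gradients along a path from a baseline to the input. Below is a minimal NumPy sketch of that idea on a toy differentiable function (the function, its gradient, and the baseline here are illustrative, not part of Captum's API); in practice you would call `captum.attr.IntegratedGradients` on a real PyTorch model.

```python
import numpy as np

def f(x):
    # toy "model": f(x) = x0^2 + 3*x1 (stand-in for a network output)
    return x[0] ** 2 + 3 * x[1]

def grad_f(x):
    # analytic gradient of the toy model above
    return np.array([2 * x[0], 3.0])

def integrated_gradients(x, baseline, steps=100):
    # Midpoint Riemann-sum approximation of the path integral
    # IG_i(x) = (x_i - x'_i) * ∫ grad_i f(x' + a (x - x')) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([2.0, 1.0])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline)
print(attr, attr.sum(), f(x) - f(baseline))
```

The printed check illustrates the completeness property that makes Integrated Gradients attractive as a benchmark attribution method: the per-feature attributions sum exactly to the change in the model's output relative to the baseline.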

StyLEx: Explaining Styles with Lexicon-Based Human Perception

Even when humans and a model agree on a label (e.g., that a sentence's sentiment is positive), their explanations (i.e., which words support the positive judgment) often diverge. The authors try to make the model learn the human explanation.

Hayati et al. (2021) define human perception as the human ratings of how much each word in a sentence contributes to its style. StyLEx incorporates these per-word human perception scores into the model by training a BERT-based classifier that jointly predicts the style label for both the sentence and each word in it.
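The joint objective can be sketched as a weighted sum of a sentence-level classification loss and a word-level loss against the human perception scores. The sketch below uses binary cross-entropy and made-up numbers; the variable names, the loss weight `lam`, and the specific loss functions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bce(p, y):
    # binary cross-entropy for predicted probability p against label y
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical model outputs for a 4-word sentence:
word_probs = np.array([0.9, 0.2, 0.8, 0.1])    # predicted per-word style contribution
sent_prob = 0.85                               # predicted sentence-level style probability
human_scores = np.array([1.0, 0.0, 1.0, 0.0])  # human perception score for each word
sent_label = 1.0                               # gold sentence-level style label

lam = 0.5  # hypothetical weight balancing the two objectives
word_loss = bce(word_probs, human_scores).mean()
sent_loss = bce(sent_prob, sent_label)
joint_loss = sent_loss + lam * word_loss
```

Training on `joint_loss` pushes the model's word-level predictions toward the human perception scores while still optimizing sentence-level accuracy, which is how the classifier's explanations are aligned with human ones.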