The paper investigates target-based offensive language detection in Austrian German, a setting where model interpretability in toxicity detection benefits greatly from token-level annotations; however, such annotations are currently available only for English. The authors introduce AustroTox, a dataset for offensive language detection sourced from a news forum and notable for its incorporation of the Austrian German dialect. Beyond binary offensiveness classification, each comment is annotated with spans constituting vulgar language or representing targets of offensive statements. The authors evaluate fine-tuned language models as well as large language models in zero- and few-shot settings. The results indicate that while fine-tuned models excel at detecting linguistic peculiarities such as vulgar dialect, large language models perform better at detecting offensiveness in AustroTox. The dataset and code are publicly released.
This work was supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT20015].