New Publication: Offensive Language Detection in Austrian German Dialect

18.06.2024

Pia Pachinger and Anna Maria Planitzer, together with their co-authors from TU Wien and the University of Zurich (Janis Goldzycher, Wojciech Kusa, Allan Hanbury, and Julia Neidhardt), published their new paper “AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection” in Findings of the Association for Computational Linguistics: ACL 2024.

Full Article.

Dataset and Code.

The paper addresses target-based offensive language detection in Austrian German. Model interpretability in toxicity detection benefits greatly from token-level annotations, but such annotations are currently available only in English. The authors introduce a dataset for offensive language detection, sourced from a news forum and notable for its inclusion of the Austrian German dialect. In addition to a binary offensiveness label, each comment is annotated with spans that contain vulgar language or denote the targets of offensive statements. The authors evaluate fine-tuned language models as well as large language models in zero-shot and few-shot settings. The results indicate that while fine-tuned models excel at detecting linguistic peculiarities such as vulgar dialect, large language models perform better at detecting offensiveness in AustroTox. The authors have also published the dataset and code.

This work was supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT20015].