10th June 2024
An unsolved issue in the domain of Natural Language Processing (NLP) is the perpetuation of stereotypical biases inherent in the training data. This has led to increased attention in the research community, but the focus has predominantly been on English models, often neglecting models for other languages. This work aims to counter this trend by investigating bias in German word representations. The analysis covers static word embeddings, which assign each word a single fixed vector, and extends to contextualized embeddings, which take the surrounding words into account. The German datasets for this research are partly derived from a workshop held in Switzerland with experts from different fields, including human resources and machine learning. The workshop aimed to identify language-specific biases relevant to the labour market. Our analysis shows that both static and contextualized German embeddings exhibit significant biases along several dimensions.
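
The post does not describe the measurement procedure itself, but a WEAT-style association test is one common way to quantify bias in static embeddings. The sketch below is illustrative only: the toy vectors and the example German word lists in the comments are assumptions, not the study's actual data or method.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """Mean similarity of word vector w to attribute set A minus attribute set B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: how much more strongly target set X associates with
    attributes A (vs. B) than target set Y does, normalized by the pooled std."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    pooled = np.array(s_X + s_Y)
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(pooled, ddof=1)

# Toy 3-dimensional vectors standing in for German embeddings; a real analysis
# would load pretrained vectors (e.g. fastText) for words such as
# "Ingenieur" / "Ingenieurin" (targets) and "Karriere" / "Familie" (attributes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # e.g. male-associated occupation terms
Y = rng.normal(size=(4, 3))   # e.g. female-associated occupation terms
A = rng.normal(size=(4, 3))   # e.g. career-related attribute words
B = rng.normal(size=(4, 3))   # e.g. family-related attribute words

print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):.3f}")
```

With random toy vectors the effect size hovers around zero; with real embeddings, a value well above zero along a given dimension (e.g. gender and career) is what the kind of bias reported here would look like.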