Department of Psychology: Dr Tessa Charlesworth, "A word is characterized by the company it keeps": Using word embeddings to uncover social beliefs in child and adult language
January 13, 2020, 2:45 pm to 4:30 pm
Social beliefs are known to be reflected in the language of children and adults. But how can they be quantitatively studied to understand the relative strength of social beliefs across language from different sources (e.g., books vs. speech), different speakers (e.g., children vs. adults), and even different time periods (e.g., 1900 vs. present-day)?
Advances in machine learning, specifically word embeddings, represent the words of large text corpora as vectors, making it possible to quantify social beliefs in real-world natural language. In this project, we use word embeddings derived from 7 corpora (65+ million words) to provide the first comprehensive test of beliefs about gender in children’s and adults’ language, including speech, TV/movies, and books.
Gender beliefs associating male/female with well-studied concepts (e.g., work/home, science/arts) were consistently present across corpora. Moreover, gender beliefs associating male/female with 600+ traits and 300+ professions were pervasive: 71% of traits and 79% of professions showed medium-to-large associations with gender. Descriptive differences by language source, speaker age, and time period also emerged.
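To make the idea of a word's association with gender concrete, here is a minimal sketch of one common way to score such an association: compare the word's cosine similarity to a set of male words with its similarity to a set of female words, and standardize the difference. This is an illustration only and not necessarily the measure used in this project. The function names, the word lists, and the 3-dimensional toy vectors are all invented for the example; real embeddings would be trained on a corpus and have hundreds of dimensions.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_effect_size(target, group_a, group_b):
    """Standardized difference in similarity of `target` to two word groups.

    Positive values mean `target` sits closer to group_a, negative values
    closer to group_b. The difference in mean similarity is divided by the
    standard deviation of all similarities (hypothetical helper).
    """
    sims_a = [cosine(target, v) for v in group_a]
    sims_b = [cosine(target, v) for v in group_b]
    return (mean(sims_a) - mean(sims_b)) / stdev(sims_a + sims_b)


# Toy vectors, invented for illustration; not taken from any real corpus.
toy = {
    "he": [0.9, 0.1, 0.0],
    "man": [0.8, 0.2, 0.1],
    "she": [0.1, 0.9, 0.0],
    "woman": [0.2, 0.8, 0.1],
    "engineer": [0.7, 0.3, 0.2],
    "nurse": [0.2, 0.7, 0.3],
}

male = [toy["he"], toy["man"]]
female = [toy["she"], toy["woman"]]

d_engineer = association_effect_size(toy["engineer"], male, female)
d_nurse = association_effect_size(toy["nurse"], male, female)

print(f"engineer: {d_engineer:+.2f}")
print(f"nurse:    {d_nurse:+.2f}")
```

With these made-up vectors, the sign of each score shows which group a word leans toward. Repeating the calculation over many trait or profession words, and over embeddings trained on different corpora, is what would allow comparisons across sources, speakers, and time periods.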
Together, these results illustrate a novel methodological approach that can promote new theories of whether, when, and to what extent consequential social beliefs emerge in children’s and adults’ real-world linguistic environments.