LangSci Meets AI Series #3: Dr. Vered Shwartz on AI biases -- causes and solutions

Dr. Vered Shwartz
November 21, 2024
*This article belongs to the monthly LangSci Meets AI series, where our members discuss their research on AI's development in their respective fields. Together, these conversations trace the contours of a rapidly evolving landscape, one that is constantly rerouting our social fabric. In this third feature, Dr. Vered Shwartz discusses the causes of, and potential solutions to, biases in AI.*

1. Hello Dr. Shwartz! Could you please describe your research? 

My research is in natural language processing (NLP), a field whose fundamental goal is to develop software capable of seamlessly interacting with people in natural language. I’ve worked on a range of NLP problems, but a lot of my work has been on improving the ability of NLP models to understand language in a similar way to humans, including reading between the lines and reasoning about implications. I’ve always been interested in the human aspect of language technologies. Lately, my group has also been looking at tasks that require reasoning over both vision and language inputs.

2. What are the main causes of biases in LLMs? 

LLMs suffer from various types of societal biases. One cause of these biases is the source of the data. In the first step of LLM training, the model reads a large-scale corpus of web text, and the text on the web reflects an existing societal reality.

The data could be biased in multiple ways. The first is when it contains explicit hate speech or stereotypical descriptions of certain populations. For example, if users online post statements like “all Black people are…” or “all gay men are…”, an LLM trained on this data may learn to associate these population groups with the stereotypical property. Some of these associations seem harmless: for example, if most doctors are men and most nurses are women, this will be reflected in online text, and the LLM can implicitly learn to associate doctors with men and nurses with women. This problem was first identified in word embeddings, a previous generation of NLP models. While people may claim that the models are only reflecting reality, the problem is that when we use LLMs as the backbone of applications that make decisions about people’s lives (e.g., CV filtering), they can inadvertently perpetuate and amplify the stereotype that doctors should be men and, consequently, that women may be unfit to be doctors.
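As a rough illustration of this kind of embedding-level association (a minimal sketch, assuming the pretrained GloVe vectors available through gensim's downloader, not any model discussed in the interview), one can compare cosine similarities between occupation words and gendered words:

```python
# Minimal sketch: probe occupation-gender associations in word embeddings
# via cosine similarity (assumes gensim and its pretrained GloVe vectors).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads pretrained GloVe embeddings

for occupation in ["doctor", "nurse", "engineer", "teacher"]:
    sim_man = vectors.similarity(occupation, "man")
    sim_woman = vectors.similarity(occupation, "woman")
    # A positive gap means the occupation word sits closer to "man" than to "woman".
    gap = sim_man - sim_woman
    print(f"{occupation}: sim(man)={sim_man:.3f}, sim(woman)={sim_woman:.3f}, gap={gap:+.3f}")
```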

Another way in which the data could be biased is that different population groups are disproportionately represented in it. Something my group has been focusing on lately is cultural biases in LLMs. While many LLMs today support multiple languages, the vast majority of their training data is in English, because most of the web is still in English. And statistically, most of the English text online comes from users in the US. So the world knowledge that LLMs learn from reading the web is US-centric (or, more broadly, Western-centric). As LLMs are used to automate more and more processes in our everyday lives, this can result in discrimination against people from diverse cultures.

Finally, modern LLMs have a few additional components that can introduce bias. First, they are also trained with human supervision to generate texts that align with human preferences. If the human annotators do not represent diverse groups of the world’s population equally (and they don’t), you can expect biases there. Second, LLMs have “guardrails” that prevent them from generating offensive or harmful text, but these are based on the judgment of the (mostly US-based) software engineers who work for the companies developing LLMs.

3. What are existing efforts / potential ways to reduce such biases? 

The mitigation strategies depend on the type and source of the bias. For representational bias, one approach is to add more data pertaining to underrepresented groups. This is not a perfect solution, for two reasons. First, because AI models need a large amount of data to work well, the trend in recent years has been to train models (including LLMs) on as much data as possible. So, for example, you won’t give up a large portion of your English data to train an LLM with equal representation of English and, say, Korean. Second, for some languages there is simply not enough data online at all.
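A common compromise in multilingual pretraining is to re-weight rather than discard data, for example by exponentiating corpus proportions before sampling. The sketch below illustrates the idea; the corpus sizes and the alpha value are made-up assumptions, not figures from the interview:

```python
# Illustrative sketch: temperature-scaled sampling over languages, a common
# compromise in multilingual pretraining (token counts below are invented).
corpus_tokens = {"English": 1_000_000_000, "Korean": 20_000_000, "Swahili": 2_000_000}

def sampling_probs(sizes, alpha=0.3):
    # alpha=1.0 reproduces the raw proportions; smaller alpha upweights
    # low-resource languages without throwing away any high-resource data.
    scaled = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}

print("raw proportions:", sampling_probs(corpus_tokens, alpha=1.0))
print("alpha = 0.3    :", sampling_probs(corpus_tokens, alpha=0.3))
```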

With respect to learning stereotypes about certain groups, the LLM guardrails address this by predicting whether the user’s query may cause the LLM to generate harmful or offensive content and refusing to answer the query. It’s definitely an improvement over previous language models that generated offensive content, but it is also not a perfect solution, for various reasons. First, it’s possible to bypass the guardrails by rephrasing the requests (also called “jailbreaking”). Second, these guardrails are implemented in a superficial way, because the LLM developers have very little control over their models’ outputs or understanding of their inner workings. One of the worst examples I saw was ChatGPT refusing to answer the innocuous question “can I take my Muslim friend out for ramen?” because it triggered the offensive-language filter merely by mentioning Muslims. This ends up hurting the very population groups the developers were trying to protect. Finally, these guardrails reflect the values of the LLM developers, which may not represent the entire population of their users.
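The over-blocking failure mode is easiest to see with a deliberately simplistic toy filter. Real guardrails are learned classifiers rather than keyword lists, so the sketch below is only an illustration of why a superficial filter refuses harmless queries that merely mention a group:

```python
# Toy sketch of an overly superficial safety filter (real guardrails are learned
# classifiers, not keyword lists); it shows how merely mentioning a group can
# trigger a refusal of a perfectly innocuous request.
SENSITIVE_TERMS = {"muslim", "black", "gay"}  # illustrative only, not any product's list

def naive_guardrail(query: str) -> str:
    if any(term in query.lower() for term in SENSITIVE_TERMS):
        return "I can't help with that."        # false positive: blocks a harmless query
    return "(answer the query normally)"

print(naive_guardrail("Can I take my Muslim friend out for ramen?"))  # wrongly refused
print(naive_guardrail("What's a good ramen place downtown?"))         # answered
```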

4. Your research addresses the concept of "common sense" in AI. What does this entail, and why is it significant? (I read about ConceptNet here and found it fascinating. Would you be willing to discuss it further?)

Language is very efficient, so when you and I talk, I only say what I think you don’t already know. You fill in the gaps based on your commonsense knowledge and reasoning abilities - the set of things that most people know about everyday situations and events, such as “if I walk outside in the rain without an umbrella, I will get wet”. If we want machines to interact with people in the same seamless way, they need to have human-like commonsense reasoning capabilities. Imagine, for example, that sometime in the near future your personal assistant (e.g., Siri or Alexa) will be able not only to save items to your shopping list but also to physically go to the store and get them for you. Now imagine that you sent it to buy eggs, but it came back with broken eggs. It probably never occurred to you to teach it that eggs are fragile, that if you put something heavy on a fragile item it might break, and that you want your eggs whole. This would be so trivial to people!

The reason it’s challenging to achieve the same with machines is that you can’t possibly enumerate all the commonsense facts that a machine needs to know. There are knowledge bases such as ConceptNet that document a large number of commonsense facts, but they will never capture all of them. LLMs also have some commonsense knowledge that was either explicitly or implicitly mentioned online, but again, they don’t cover all of the knowledge, and their reasoning abilities are limited as well. So, despite progress in recent years, this is still an open research problem.
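For a concrete sense of what a commonsense knowledge base looks like, here is a small sketch of looking up facts about "egg" in ConceptNet, assuming its public HTTP API at api.conceptnet.io and its JSON "edges" response format:

```python
# Sketch: query ConceptNet's public API for commonsense facts about "egg"
# (assumes the JSON-LD response exposes an "edges" list with start/rel/end labels).
import requests

response = requests.get("http://api.conceptnet.io/c/en/egg", params={"limit": 10})
for edge in response.json().get("edges", []):
    start = edge["start"]["label"]
    relation = edge["rel"]["label"]
    end = edge["end"]["label"]
    # The hope is to find facts along the lines of "an egg is fragile",
    # though, as noted above, coverage is never complete.
    print(f"{start} --{relation}--> {end}")
```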

5. As researchers work to reduce biases in AI, the application of these technologies in various sectors appears to be outpacing these efforts. Are you concerned about potential misuse that may be difficult to reverse?

Yes, I’m concerned about that. I think LLMs are useful for a wide range of applications. But we still have very little understanding of how they work and very little control over their outputs. It’s especially bad that the best-performing LLMs are proprietary, so researchers outside the companies that developed them don’t even know exactly how they were trained and on what data. I think the technology is not yet mature enough to fully automate processes that impact people’s lives, especially in sensitive domains such as healthcare and law. But the genie is out of the bottle - tech companies want to monetize their models, and organizations across all industries are looking to cut costs by automating processes. As a researcher in academia, I feel that the main things we can do right now are to rigorously evaluate these models, point out any risks and limitations, and propose safer alternatives.



