Mitigating Bias
AI models trained on biased data will likely reproduce those biases. Bias can appear in prompt engineering in two ways:
- Biased prompts: If prompts are built on assumptions (e.g., assuming all software developers are men), the AI produces biased results
- Biased models: Even with neutral prompts, AI models may produce biased results due to training data bias
Insufficient training data can also create bias. Content about underrepresented groups tends to receive low confidence scores, so toxicity filters and ranking algorithms often deprioritize it, excluding those groups even further. This creates a self-reinforcing cycle:
- Groups are unevenly represented in the real world
- As a result, there is a lack of sufficient training data for some groups
- Models inherently favor the groups they have more data about
- Bias carries over from the data into the model
- The model is deployed into applications
- Applications reinforce the bias the model was trained on, feeding back into the uneven representation that started the cycle
Bias mitigation techniques
Three techniques help mitigate bias in foundation models (FMs):
- Update the prompt: Explicit guidance reduces inadvertent bias at scale
- Enhance the dataset: Provide different pronouns and add diverse examples
- Use training techniques: Apply fair loss functions, red teaming, RLHF, and more
Update the prompt
Provide explicit guidance to reduce bias at scale. Text-to-image models often reproduce skin color and gender stereotypes. For example, the prompt "An image of a florist" may produce an image that reflects gender and race stereotypes.
You can use a few methods to mitigate bias in a model's output:
TIED Framework
The Text-to-Image Disambiguation (TIED) framework avoids ambiguity in prompts by asking clarifying questions:
Initial prompt: "An image of a florist"
Model asks a clarifying question: "Is the florist female or male? Young or old? What skin color?"
Disambiguated prompt: "An image of a young female florist with dark skin color"
Disambiguated prompts are more likely to mitigate bias in model output by being explicit about characteristics rather than letting the model assume defaults.
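This flow can be scripted. The sketch below assumes a hypothetical invoke_llm helper that sends a prompt to whatever text model you use (it is not part of any AWS SDK), and the prompt wording is illustrative:

```python
def invoke_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your text model and return its reply."""
    raise NotImplementedError("Connect this to your model provider.")

def disambiguate(initial_prompt: str) -> str:
    """Ask one clarifying question, collect the answer, and rewrite the prompt explicitly."""
    question = invoke_llm(
        "Ask one clarifying question about any ambiguous person attributes "
        f"(gender, age, skin color) in this image prompt: {initial_prompt}"
    )
    answer = input(f"{question}\nYour answer: ")  # e.g., "young, female, dark skin color"
    return invoke_llm(
        f"Rewrite the image prompt '{initial_prompt}' so that it explicitly states: {answer}"
    )

# disambiguate("An image of a florist") might return
# "An image of a young female florist with dark skin color".
```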
Text-to-image Ambiguity Benchmark (TAB)
TAB provides a schema in the prompt to ask clarifying questions with various options:
| Sentence | Options | Questions to ask |
|---|---|---|
| An image of a florist | The florist is a female; the florist is a male; the florist has dark skin color; the florist has light skin color; the florist is young; the florist is old | Is the florist a female? Is the florist a male? Does the florist have dark skin color? Does the florist have light skin color? Is the florist young? Is the florist old? |
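One way to carry such a schema in code is as a plain Python structure that pairs each ambiguous attribute with its options and the clarifying question to ask. The field names below are illustrative, not part of a published TAB format:

```python
# Illustrative TAB-style schema for the prompt "An image of a florist".
tab_schema = {
    "sentence": "An image of a florist",
    "attributes": [
        {"options": ["the florist is a female", "the florist is a male"],
         "question": "Is the florist a female or a male?"},
        {"options": ["the florist has dark skin color", "the florist has light skin color"],
         "question": "Does the florist have dark or light skin color?"},
        {"options": ["the florist is young", "the florist is old"],
         "question": "Is the florist young or old?"},
    ],
}

def clarifying_questions(schema: dict) -> list[str]:
    """Collect the clarifying questions to include alongside the prompt."""
    return [attribute["question"] for attribute in schema["attributes"]]

print(clarifying_questions(tab_schema))
```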
Clarify using few-shot learning
You can also have the model generate clarifying questions itself by using few-shot learning. Give the model a context along with example clarifying questions (a sketch of the prompt follows the example):
Question: Is the cat in the basket?
Context: The girl observes the boy standing next to the fireplace.
Question: Is the girl standing next to the fireplace?
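A sketch of how that few-shot prompt could be assembled in Python. The example pair and the new context are illustrative; in practice you would include several pairs:

```python
# Example (context, clarifying question) pairs shown to the model.
EXAMPLES = [
    ("The girl observes the boy standing next to the fireplace.",
     "Is the girl standing next to the fireplace?"),
]

def build_clarifying_prompt(new_context: str) -> str:
    """Assemble a few-shot prompt that asks the model for the next clarifying question."""
    shots = "\n\n".join(f"Context: {context}\nQuestion: {question}"
                        for context, question in EXAMPLES)
    return (
        "Write a clarifying question for the final context.\n\n"
        f"{shots}\n\nContext: {new_context}\nQuestion:"
    )

print(build_clarifying_prompt("A florist hands a bouquet to a customer."))
```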
Enhance the dataset
Mitigate bias by enhancing training datasets with different pronouns and diverse examples.
For models trained on text, use counterfactual data augmentation, which expands the training set with modified copies of existing examples (a sketch follows the table):
| Before | After |
|---|---|
| After a close reading, Dr. John Stiles was convinced. He diagnosed the disease quickly. | After a close reading, Dr. Akua Mansa was convinced. She diagnosed the disease quickly. |
| CEO and founder Richard Roe closed his last funding round with a goal of tripling the business. | CEO and founder Sofía Martínez closed her last funding round with a goal of tripling the business. |
| Nurse Mary Major cleaned up the patient's living quarters, then she took out the dirty dishes. | Nurse Mateo Jackson cleaned up the patient's living quarters, then he took out the dirty dishes. |
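A minimal sketch of this kind of augmentation, assuming a hand-written pronoun swap table. Real pipelines also swap names and titles and handle grammar more carefully; note that "her" is ambiguous (his/him), so this sketch always maps it to "his":

```python
import re

# Minimal pronoun swap table for building counterfactual copies of training sentences.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "himself": "herself", "herself": "himself"}

def counterfactual(text: str) -> str:
    """Return a copy of the text with gendered pronouns swapped."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SWAPS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

original = "After a close reading, the doctor was convinced. He diagnosed the disease quickly."
training_examples = [original, counterfactual(original)]  # keep both versions in the dataset
```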
For models trained on images, counterfactual data augmentation involves three steps (sketched after this list):
- Detect: Use image classification to detect people, objects, and backgrounds; compute summary statistics to detect imbalances
- Segment: Use segmentation to generate pixel maps of objects to replace
- Augment: Use image-to-image techniques to update images and equalize distributions
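A high-level sketch of those three steps. The detect_people, segment_person, and inpaint_person functions are hypothetical stand-ins for whatever classification, segmentation, and image-to-image models you actually use; they are not real library calls:

```python
from collections import Counter

# Hypothetical model wrappers -- replace with your own models.
def detect_people(image):
    """Return detected people, each as a dict with demographic attribute labels."""
    raise NotImplementedError

def segment_person(image, person):
    """Return a pixel mask covering the detected person."""
    raise NotImplementedError

def inpaint_person(image, mask, attribute):
    """Regenerate the masked region so the person shows the given attribute."""
    raise NotImplementedError

def rebalance_images(images):
    """Counterfactual augmentation for an image dataset: detect, segment, augment."""
    # 1. Detect: find people and compute summary statistics over their attributes.
    detections = [detect_people(image) for image in images]
    counts = Counter(attribute
                     for people in detections
                     for person in people
                     for attribute in person["attributes"])
    underrepresented = [a for a, n in counts.items() if n < max(counts.values()) / 2]

    augmented = []
    for image, people in zip(images, detections):
        for person in people:
            # 2. Segment: generate a pixel map of the person to replace.
            mask = segment_person(image, person)
            # 3. Augment: use image-to-image generation to equalize the distribution.
            for attribute in underrepresented:
                augmented.append(inpaint_person(image, mask, attribute))
    return images + augmented
```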
Use training techniques
Two techniques at the training level help mitigate bias:
Equalized odds to measure fairness
Equalized odds is a fairness criterion that equalizes the errors a model makes when predicting categorical outcomes for different groups.
- Model error rates: False Negative Rate (FNR) and False Positive Rate (FPR)
- Goal: match the True Positive Rate (TPR) and FPR across groups; because TPR = 1 − FNR, this is the same as matching both error rates (see the sketch below)
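A small numpy sketch that measures how far a classifier is from equalized odds by comparing TPR and FPR between two groups. The variable names and example arrays are illustrative:

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True positive rate (1 - FNR) and false positive rate for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    fpr = np.mean(y_pred[y_true == 0] == 1)
    return tpr, fpr

def equalized_odds_gap(y_true, y_pred, group):
    """Largest TPR or FPR difference between two groups; 0 means equalized odds holds."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = [tpr_fpr(y_true[group == g], y_pred[group == g]) for g in np.unique(group)]
    (tpr_a, fpr_a), (tpr_b, fpr_b) = rates
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]   # e.g., a protected attribute with two values
print(equalized_odds_gap(y_true, y_pred, group))
```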
Using fairness criteria as model objectives
You can optimize model training for performance as the sole objective, or optimize a combined objective that also includes the following (see the sketch after this list):
- Fairness
- Energy efficiency
- Inference time
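A conceptual sketch of a combined objective, where a weighted fairness penalty is added to the usual task loss. The penalty here is a simple gap in mean predicted scores between two groups, and the weight, penalty choice, and example data are illustrative; energy efficiency or inference time could be added as further weighted terms in the same way:

```python
import numpy as np

def task_loss(y_true, p_pred):
    """Standard performance objective: binary cross-entropy."""
    p = np.clip(p_pred, 1e-7, 1 - 1e-7)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def fairness_penalty(y_true, p_pred, group):
    """Gap in mean predicted score for positive examples between two groups
    (a soft proxy for the TPR gap used by equalized odds)."""
    positives = y_true == 1
    return abs(p_pred[positives & (group == 0)].mean()
               - p_pred[positives & (group == 1)].mean())

def combined_loss(y_true, p_pred, group, fairness_weight=1.0):
    """Performance objective plus a weighted fairness term."""
    return task_loss(y_true, p_pred) + fairness_weight * fairness_penalty(y_true, p_pred, group)

y_true = np.array([1, 0, 1, 1, 0, 1])
p_pred = np.array([0.9, 0.2, 0.4, 0.8, 0.3, 0.6])   # model scores
group  = np.array([0, 0, 0, 1, 1, 1])               # protected attribute
print(combined_loss(y_true, p_pred, group, fairness_weight=0.5))
```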