In a recent company blog post, OpenAI explained that when training AIs it hones "the training process to reduce harmful outputs and improve usefulness." Still, it notes that internal research has "shown that language models can still sometimes absorb and repeat social biases from training data, such as gender or racial stereotypes." To probe this, the company wanted to explore how ChatGPT responded to a user based on "subtle cues about a user's identity -- like their name." It matters, OpenAI said, because people use chatbots like ChatGPT "in a variety of ways, from helping them draft a resume to asking for entertainment tips." Though other AI "fairness testing" has been carried out, it often relies on different more esoteric scenarios studied for bias, such as "screening resumes or credit scoring." OpenAI is basically saying it's aware there are subtle variations in ChatGPT, and it wanted to shine a light on them.
The task makes sense: after all, we live in an era when AI use is rising fast, but so is awareness of the risks and pitfalls of using it. As long ago as 2018 it was known that AI systems could show "unconscious" biases. In particular, AIs like ChatGPT may mirror subtle human biases based on assumptions of race or gender that manifest themselves when you hear someone's name. "Names often carry cultural, gender, and racial associations, making them a relevant factor for investigating bias," OpenAI noted, highlighting that "users frequently share their names with ChatGPT for tasks like drafting emails." Because ChatGPT can now "remember information like names across conversations," this sort of information, and possible accompanying bias, is persistent.
Luckily its experiments showed that when the same AI prompts came from users with very different user names, OpenAI's ChatGPT chatbot showed "no difference in overall response quality for users whose names connote different genders, races or ethnicities." And where "names occasionally do spark differences in how ChatGPT answers the same prompt," the experiments showed less than 1 percent reflected a "harmful stereotype." It didn't elaborate on what a "harmful" output could be, but it may be as innocuous as the AI shaping a fairytale around a hero or a victim persona when asked to write one...or could be much more serious.