Academics are at odds over a research paper suggesting ChatGPT presents a “significant and sizeable” political bias leaning toward the left side of the political spectrum.
As Cointelegraph recently reported, researchers from the United Kingdom and Brazil published a study in the journal Public Choice on Aug. 17 asserting that large language models (LLMs) like ChatGPT output text containing errors and biases that could mislead readers and amplify the political biases presented by traditional media.
In earlier correspondence with Cointelegraph, co-author Victor Rangel unpacked the paper's aim of measuring ChatGPT's political bias. The researchers' methodology involved asking ChatGPT to impersonate someone from a given side of the political spectrum and comparing those answers with its default mode.
Rangel said that several robustness tests were carried out to address potential confounding factors and alternative explanations, with the research concluding:
“We find robust evidence that ChatGPT presents a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK.”
It is worth noting that the authors stressed that the paper does not serve as a “final word on ChatGPT political bias,” given the challenges and complexities involved in measuring and interpreting bias in LLMs.
Rangel said some critics contend that their method may not capture the nuances of political ideology, that the method’s questions may be biased or leading, or that results may be influenced by the randomness of ChatGPT’s output.
He added that while LLMs hold the potential for “enhancing human communication,” they pose “significant risks and challenges” for society.
The paper has seemingly fulfilled its promise of stimulating research and discussion on the topic, with academics already contesting various aspects of its methodology and findings.
Among the vocal critics who took to social media to weigh in on the findings was Princeton computer science professor Arvind Narayanan, who published an in-depth Medium post unpacking a scientific critique of the report, its methodology and findings.
A new paper claims that ChatGPT expresses liberal opinions, agreeing with Democrats the vast majority of the time. When @sayashk and I saw this, we knew we had to dig in. The paper's methods are bad. The real answer is complicated. Here's what we found. https://t.co/xvZ0EwmO8o— Arvind Narayanan (@random_walker) August 18, 2023
Narayanan and other scientists argued there were a number of perceived issues with the experiment, the first being that the researchers did not actually use ChatGPT itself to conduct it:
“They didn’t test ChatGPT! They tested text-davinci-003, an older model that’s not used in ChatGPT, whether with the GPT-3.5 or the GPT-4 setting.”
Narayanan also suggested that the experiment did not measure bias but asked ChatGPT to roleplay as a member of a political party. As such, the artificial intelligence (AI) chatbot would exhibit political slants to the left or right when prompted to roleplay as members from either side of the spectrum.
The chatbot was also constrained to answering multiple-choice questions only, which may have limited its ability or influenced the perceived bias.
ok so I've read the "GPT has a liberal bias" paper now https://t.co/fwwEaZ757E as well as the supplementary material https://t.co/F5g3kfFQFU and as I expected I have a lot of problems with it methodologically. I tried to reproduce some of it and found some interesting issues— Colin Fraser | @colin-fraser.net on bsky (@colin_fraser) August 18, 2023
Colin Fraser, a data scientist at Meta (according to his Medium page), also offered a review of the paper on X (formerly Twitter), highlighting that the order in which the researchers presented the multiple-choice questions, with and without roleplay, had a significant influence on the outputs the AI generated:
“This is saying that by changing the prompt order from Dem first to Rep first, you increase the overall agreement rate for the Dem persona over all questions from 30% to 64%, and decrease from 70% to 22% for rep.”
As Rangel had previously noted, there is a large amount of interest in the nature of LLMs and the outputs they produce, but questions still linger over how the tools work, what biases they have, and how they can potentially affect users’ opinions and behaviors.
Cointelegraph reached out to Narayanan for further insights into his critique and the ongoing debate around bias in large language models but has not received a response.