I was surprised this morning when I came across this tweet from David Rozado about ChatGPT’s latest responses to a political compass test.

If you aren’t familiar with the political compass test, ChatGPT does a good job explaining it:

The Political Compass is a political test that classifies an individual's political beliefs on two dimensions: economic left-right and libertarian-authoritarian. It is made up of questions about various political issues, and participants indicate their level of agreement or disagreement with each statement. Based on their responses, individuals are plotted on a grid to show where they fall in relation to other political beliefs.

[Author’s note: I use code formatting to denote text generated by AI — any time you see words in a monospace font like this, it’s from an AI model.]

There are a few different political compass quizzes (kind of like how there are different methods for personality tests or horoscopes or even IQ tests) — so David applied four different tests to compare results. I can’t stop thinking about how long this must have taken him.

“The results were consistent”

The first time he ran the experiment, Rozado observed “consistent results” across all four tests. ChatGPT was clearly left-leaning.

The results of the four different tests Rozado conducted

One can imagine many reasons why this might be the case. According to Rozado, the most likely explanation is that ChatGPT was trained on a set of information which had political bias.

It has been well-documented before that the majority of professionals working in those institutions are politically left-leaning (see here, here, here, here, here, here, here and here). It is conceivable that the political leanings of such professionals influences the textual content generated by those institutions. Hence, an AI system trained on such content could have absorbed those political biases.

Another possibility is that the model was trained with the assistance of humans who reviewed the model’s responses and inserted their own personal bias into the model. We all know that humans are biased.

David’s first article from early December comprehensively documents his methods for finding these results in a million different screenshots.

If you’re interested in specifics, I highly suggest reading his original article, which contains a full appendix screenshots of the questions he asked ChatGPT.

“Exquisitely neutral”

The second time around he observed notable changes in the responses from ChatGPT — where it previously leaned towards the political left, now the model appears to position itself more moderately.

It seems apparent that this new positioning was added in the December 15 update, a significant improvement to ChatGPT which also introduced conversation history.

It’s great to see product updates like conversation history, but I’m particularly surprised to see these updates also extend to tuning the actual responses of the model in edge-cases like this.

Rozado ran the experiment again between December 21-23 and noticed different results. The difference was most apparent in the Political Spectrum Test.

David’s comparison of results across the different experiments

David makes the below caveats to note that the other three tests are not as comprehensive:

World's Smallest Political Quiz - The test is extremely small (just 10 questions) and seems to focus on measuring libertarianism. A lot of aspects of political orientation are simply not measured by this instrument.

Political Compass Test - The biggest limitation of this test is that it does not allow for a neutral answer. Hence, forcing to take sides. But a lot of people might genuinely feel on the fence in a lot of issues and the test misses that. Also, several of the questions focus on very outlandish/extreme views that fall very far away from mainstream views.

Political Typology Quiz - Another short test (just 16 questions). Answers feel contrived, often only two options are given, hence enforcing reductionism upon the taker. A lot of aspects of political orientation are simply not measured by this instrument.

The Political Spectrum Quiz provides more granularity in potential responses, according to David. And he was clearly impressed by the results the second time around.

The updated ChatGPT answers to the Political Spectrum Quiz were exquisitely neutral, consistently striving to provide a steel man argument for both sides of an issue. This is in stark contrast with a mere two weeks ago.”

The results from Rozado’s second experiment

I looked through the second batch of screenshots to get a feel for the results before and after. The most glaring difference I noticed is that ChatGPT has become far more descriptive in its answers. It essentially writes a grade-school paper explaining its “thoughts” for each question. I’m assuming David interpreted these results and converted them to the proper “strongly agree” to “strongly disagree” spectrum since the model technically didn’t answer his question in the way he was requesting, kind of like if a student wrote an essay to answer a multiple-choice question.

Twitter responses to David’s tweet thread are what you would expect: people critiquing his methodology, finding conflicting information, and debating the political merits of that conflicting information.