Debunking Dunning-Kruger

How Statistical Artifacts Created a Psychological Phenomenon

Aug 01, 2023

In 1999, two Cornell University researchers published a study that would become a sensation in pop psychology. The Dunning-Kruger effect posits that the least competent individuals overestimate their competence in areas where their skills are deficient. Graphs depicting people's soaring confidence at near-zero competence went viral. But does the data actually support this captivating idea that "the dumbest people think they're the smartest"? I argue it does not. The Dunning-Kruger effect is less a genuine psychological phenomenon than a statistical artifact, a consequence of flawed methodology in the original study. Furthermore, it has been distorted in public interpretation. In this essay, I will demonstrate through simulation and analysis that the effect may not be real at all. Rather, it stems from the researchers' statistical approach, which inherently generates correlation by plotting the difference between two variables against one of those same variables. When one examines the study closely, the data does not convincingly establish the effect it claims.

As the Dunning-Kruger effect has grown in popularity, its original meaning has been distorted. It's commonly referred to as proof that "the least competent individuals believe they are the most intelligent." The allure of a saying like "Scientists claim that stupid people are too stupid to realize how stupid they are" is difficult to resist, so this oversimplified interpretation has resonated with the public, particularly among those keen on exposing perceived ignorance in others.

Even pop psychology outlets have perpetuated this misunderstanding. For instance, Psychology Today published the graph shown below illustrating the Dunning-Kruger effect, depicting an individual's highest confidence level at near-zero competence, followed by a plunge and gradual rise with increasing expertise.

How the Dunning-Kruger Effect is referred to in popular culture.

This, however, is nonsense.

To see why, let's look at the original study. The study ran four different experiments, judging people's ability and self-rating in different domains: one on humor, two on logical reasoning, and one on grammar. So, for starters, it's not about "dumb" people; it's about everyone. Whether you're proficient or inexperienced in a given area, this applies to you. I may be a dismal harpist, but that doesn't mean I necessarily believe it.

Now let's consider the notion that the least competent individuals believe they are the most competent. Here's Figure 1 from the original study.

The actual figure, careful observers will note, directly contradicts this claim. People's Perceived Ability is LOWEST for people in the bottom quartile and HIGHEST for those in the top quartile. In fact, the authors never made that assertion. Here's what they actually said:

People who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it.

The oversimplified version of the Dunning-Kruger effect can be immediately discarded after a brief glance at the actual study.

Even with the popular myth set aside, some may point to the large gap between perceived and actual ability at lower levels of competence as evidence for the effect. The original authors believed this gap demonstrated that poorer performers lack awareness of their deficiencies. However, I contend that the gap itself does not constitute compelling proof. A closer look at the methodology, following critiques raised by later researchers, reveals fundamental flaws in the statistical approach.

Consider the metrics used in the study. On the x-axis of Figure 1, they've divided participants into quartiles based on performance. That is, the lowest 25% are in the left-most bin on the axis, the next 25% in the next bin, and so on. The y-axis represents the average percentile score for each group.

The points on the Actual Test Score line are the average percentile for each bin. The bottom quartile contains people from the 0th to the 25th percentile, so its average is the 12.5th percentile. The next point is at 37.5, halfway between 25 and 50, and so on. In other words, the Actual Test Score line shows the same quantity on both axes. They are plotting X vs X, which is why the points form a perfect line.

Meanwhile, participants were asked to rate their own abilities to determine their Perceived Ability. The average value for people in the bottom quartile looks to be about 58. For the second quartile, it's around 60. For the top, around 72.

But the authors' main claim that incompetent people don't have the ability to see their incompetence doesn't come from either line. It comes from looking at the gap between them. It's the difference between Perceived Ability and Actual Test Score that is the source of the claim.

Yet, if one is attempting to discern a correlation, measuring the gap between Perceived Ability and Actual Test Score is flawed. Since we're considering the difference between the two (Y-X), and the x-axis represents the Actual Test Score, we inevitably plot X versus Y-X, creating an inherent correlation due to the presence of 'X' on both axes.
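The size of this built-in correlation can be worked out directly. As a sketch, assume (this is my simplifying assumption, not something the paper states) that X and Y are uncorrelated and share the same variance σ². Then:

```latex
\begin{aligned}
\operatorname{Cov}(X,\, Y - X) &= \operatorname{Cov}(X, Y) - \operatorname{Var}(X) = -\sigma^2, \\
\operatorname{Var}(Y - X) &= \operatorname{Var}(Y) + \operatorname{Var}(X) - 2\operatorname{Cov}(X, Y) = 2\sigma^2, \\
\operatorname{Corr}(X,\, Y - X) &= \frac{-\sigma^2}{\sigma \cdot \sqrt{2}\,\sigma} = -\frac{1}{\sqrt{2}} \approx -0.71.
\end{aligned}
```

So even when X tells you nothing about Y, plotting Y-X against X yields a strong negative correlation under these assumptions.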

Just to recap, my claim is that the Dunning-Kruger effect isn't a real psychological effect; it's an artifact of the way the authors analyzed their data. One way to test that claim is to try to reproduce the effect using data that definitely doesn't contain it. I assert that I can recreate the same pattern from just two ingredients: no correlation between self-estimated ability and actual ability (i.e., no Dunning-Kruger effect), and the general tendency of people to overestimate their own abilities.

For this simulation, let's assume Actual Test Score and Perceived Ability are perfectly random and independent of each other, i.e., a person's Actual Test Score bears no relation to their Perceived Ability. Both these variables will be simulated using uniform distributions. Let's plot that.

Now, let's look at the difference between the two.

The people with low actual ability usually have a positive difference value and people with high ability have a negative difference, but neither group is any better than the other at assessing their abilities. Remember, the data are randomly generated with no correlation.
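Here is a minimal sketch of this first round in Python, using only the standard library. The sample size, the seed, and the names `actual`, `perceived`, and `gap` are my own choices for illustration; the only things taken from the text are the two independent uniform variables and their difference.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000              # illustrative sample size

actual = [rng.uniform(0, 100) for _ in range(n)]     # Actual Test Score
perceived = [rng.uniform(0, 100) for _ in range(n)]  # Perceived Ability, independent
gap = [p - a for a, p in zip(actual, perceived)]     # Y - X

r_raw = pearson(actual, perceived)  # close to zero by construction
r_gap = pearson(actual, gap)        # strongly negative despite independence

# Share of low scorers (actual below 25) whose self-rating exceeds their score.
low = [g for a, g in zip(actual, gap) if a < 25]
share_low_positive = sum(g > 0 for g in low) / len(low)

print(f"corr(actual, perceived) = {r_raw:+.3f}")
print(f"corr(actual, gap)       = {r_gap:+.3f}")
print(f"low scorers with positive gap: {share_low_positive:.0%}")
```

The first correlation should come out near zero and the second strongly negative, even though nobody in this data set is any better or worse at judging themselves than anyone else.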

The built-in correlation is easy to see here, but this isn't how the original authors presented their data. Let's take these results and group them into quartiles. Then we'll put the percentile on both axes and plot both lines, just like they did in the original paper.
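The quartile step can be sketched as follows. This is an illustration under my own assumptions (4,000 simulated participants, a fixed seed, and percentile defined as the midpoint rank scaled to 0-100); the paper does not spell out its binning code.

```python
import random

rng = random.Random(1)
n = 4_000  # divisible by 4 so the quartiles are equal-sized

actual = [rng.uniform(0, 100) for _ in range(n)]
perceived = [rng.uniform(0, 100) for _ in range(n)]  # independent of actual

# Rank participants by actual score and convert each rank to a percentile.
order = sorted(range(n), key=lambda i: actual[i])
percentile = [0.0] * n
for rank, i in enumerate(order):
    percentile[i] = 100 * (rank + 0.5) / n

# Average both quantities within each quartile of actual performance.
size = n // 4
actual_line, perceived_line = [], []
for q in range(4):
    members = order[q * size:(q + 1) * size]
    actual_line.append(sum(percentile[i] for i in members) / size)
    perceived_line.append(sum(perceived[i] for i in members) / size)

print("Actual percentile by quartile:", [round(v, 1) for v in actual_line])
print("Perceived ability by quartile:", [round(v, 1) for v in perceived_line])
```

The actual line lands on 12.5, 37.5, 62.5, and 87.5 no matter what the data are, because it is just the percentile averaged within its own bin. The perceived line hovers around 50 in every quartile.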

That's starting to look quite a lot like the original figure. Now, we'll add one more thing. Let's assume that most people overestimate their abilities (you might have heard that 80-90% of drivers rate themselves as above average or that over 90% of professors rate themselves as above average, so this doesn't seem like a stretch).
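One way to add overestimation to the simulation is sketched below. The specific choice, drawing self-ratings uniformly from 30 to 100 instead of 0 to 100, is purely my assumption for illustration; any adjustment that pushes the average self-rating well above 50 would serve the same purpose.

```python
import random

rng = random.Random(2)
n = 4_000

actual = [rng.uniform(0, 100) for _ in range(n)]
# Assumption: overestimation is modeled by drawing self-ratings from 30-100,
# still completely independent of the actual score.
perceived = [rng.uniform(30, 100) for _ in range(n)]

order = sorted(range(n), key=lambda i: actual[i])
size = n // 4
gaps = []
for q in range(4):
    members = order[q * size:(q + 1) * size]
    mean_actual = sum(actual[i] for i in members) / size
    mean_perceived = sum(perceived[i] for i in members) / size
    gaps.append(mean_perceived - mean_actual)
    print(f"Quartile {q + 1}: actual {mean_actual:5.1f}, "
          f"perceived {mean_perceived:5.1f}, gap {gaps[-1]:+6.1f}")
```

The gap is large and positive in the bottom quartile and shrinks steadily toward the top, purely because a flat line is being compared with a rising one.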

But there’s a difference between this line and the one in Figure 1 of the Dunning-Kruger paper. This is (approximately) flat and the Dunning-Kruger line has a positive slope. Is this evidence of the Dunning-Kruger effect?

No! This is just a positive correlation between people’s real abilities and their perceived abilities. That is, people who are good at something know that they are good at it. Let’s generate another round of data to show this.

Round 2: Weak Positive Correlation

We'll generate another set of data, except this time, the perceived abilities will be 20% based on their actual abilities, and 80% based on a random number between 0 and 100. We’ll run through the same graphs:
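A sketch of this second round, again using only the standard library. The 20/80 mix comes from the description above; the overestimation step (a flat 15-point shift capped at 100), the seed, and the sample size are my own assumptions for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(3)
n = 20_000

actual = [rng.uniform(0, 100) for _ in range(n)]
# 20% actual ability plus 80% a random number between 0 and 100.
base = [0.2 * a + 0.8 * rng.uniform(0, 100) for a in actual]
r = pearson(actual, base)

# Assumption: everyone inflates their self-rating by 15 points, capped at 100.
perceived = [min(100.0, b + 15) for b in base]

order = sorted(range(n), key=lambda i: actual[i])
size = n // 4
perceived_line, gaps = [], []
for q in range(4):
    members = order[q * size:(q + 1) * size]
    mean_actual = sum(actual[i] for i in members) / size
    mean_perceived = sum(perceived[i] for i in members) / size
    perceived_line.append(mean_perceived)
    gaps.append(mean_perceived - mean_actual)

print(f"corr(actual, perceived before shift) = {r:.3f}")
print("Perceived ability by quartile:", [round(v, 1) for v in perceived_line])
print("Gap by quartile:", [round(g, 1) for g in gaps])
```

With these settings the perceived line rises gently across the quartiles while the gap still narrows from bottom to top, even though every simulated person estimates their ability by exactly the same rule.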

Here you can see the slight correlation between actual ability and perceived ability.

Now let’s look at the “Dunning-Kruger graph” where I’ve added that people overestimate their abilities.

Here you can see it side-by-side with the original Dunning-Kruger graph.

They look strikingly similar. The paper posited that this effect occurred because people who are unskilled don’t have the competence to realize it, but we just reproduced the same graph using only the assumptions that people’s perceived abilities can be modeled as a combination of their real ability and a random value, and that people, in general, overestimate their abilities. People with less ability are no worse at estimating it.

Round 3: If Dunning-Kruger Effect Were Real, Greater Variance for Lower Performers

We could also ask, “What would it look like if the Dunning-Kruger effect were real?” Unfortunately, the authors are not quantitative about what the effect should be. In their summary of this section, they say:

In short, Study 1 revealed two effects of interest. First, although perceptions of ability were modestly correlated with actual ability, people tended to overestimate their ability relative to their peers. Second, and most important, those who performed particularly poorly relative to their peers were utterly unaware of this fact.

If I'm trying to make this quantitative, I would interpret "those who performed poorly were unaware of this fact" as saying there is greater variance in the perceived ability of lower performers. That is, people who perform poorly aren't good at judging their performance, so their perceived ability is more of a wild guess.

Running the same graphs, that might look like this:

Note that many perceived ability scores get clustered at 0 because they can’t guess below that.
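One hypothetical way to encode "greater variance for lower performers" is sketched here. The noise model (a normal error centered on the true score, with a standard deviation that falls linearly from 45 to 5 as ability rises) is my own assumption; the paper gives no such formula. Ratings are clipped to the 0-100 range, which produces the clustering at 0.

```python
import random
import statistics

rng = random.Random(4)
n = 4_000


def clip(value):
    """Keep a self-rating inside the 0-100 scale."""
    return max(0.0, min(100.0, value))


actual = [rng.uniform(0, 100) for _ in range(n)]
# Assumption: self-ratings are centered on the true score, but the noise
# standard deviation shrinks from 45 (actual = 0) to 5 (actual = 100).
perceived = [clip(rng.gauss(a, 45 - 0.4 * a)) for a in actual]

order = sorted(range(n), key=lambda i: actual[i])
size = n // 4
error_spread, perceived_line = [], []
for q in range(4):
    members = order[q * size:(q + 1) * size]
    errors = [perceived[i] - actual[i] for i in members]
    error_spread.append(statistics.pstdev(errors))
    perceived_line.append(sum(perceived[i] for i in members) / size)

at_floor = sum(p == 0.0 for p in perceived)  # ratings clipped to exactly 0

print("Spread of rating error by quartile:", [round(s, 1) for s in error_spread])
print("Perceived ability by quartile:", [round(v, 1) for v in perceived_line])
print("Ratings stuck at 0:", at_floor)
```

In this model the low scorers really are worse at judging themselves, which shows up as a much wider spread of rating errors in the bottom quartile than in the top one.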

Here’s what that graph would look like:

Note that it doesn’t look anything like the one they showed in the figure from the original study.

Round 4: If Dunning-Kruger Effect Were Real, Greater Variance and Overestimation Bias for Lower Performers

Maybe this isn’t what they meant though. It’s not just an increase in variance, but also that people with lower abilities are more inaccurate AND have a significant bias to overestimate their abilities. This seems to comport more with the popular culture version of the Dunning-Kruger effect. Let’s see what that would look like:
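A sketch of one way to model this fourth scenario. Both ingredients are my own assumptions for illustration: an upward bias that is 50 points at zero ability and fades linearly to nothing at the top, and the same shrinking noise used for the variance-only case.

```python
import random

rng = random.Random(5)
n = 4_000


def clip(value):
    """Keep a self-rating inside the 0-100 scale."""
    return max(0.0, min(100.0, value))


actual = [rng.uniform(0, 100) for _ in range(n)]
# Assumptions: the upward bias is 0.5 * (100 - actual), and the noise
# standard deviation is 45 - 0.4 * actual, so low scorers are both
# noisier and more inflated in their self-ratings.
perceived = [
    clip(rng.gauss(a + 0.5 * (100 - a), 45 - 0.4 * a))
    for a in actual
]

order = sorted(range(n), key=lambda i: actual[i])
size = n // 4
gaps = []
for q in range(4):
    members = order[q * size:(q + 1) * size]
    mean_actual = sum(actual[i] for i in members) / size
    mean_perceived = sum(perceived[i] for i in members) / size
    gaps.append(mean_perceived - mean_actual)
    print(f"Quartile {q + 1}: actual {mean_actual:5.1f}, "
          f"perceived {mean_perceived:5.1f}, gap {gaps[-1]:+6.1f}")
```

Changing the bias and noise parameters changes the shape of the perceived line considerably, so this is one possible reading of the claim rather than a definitive model of it.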

Conclusion

You might be saying, "But it's been replicated so many times!" I would respond, "Of course it has!" That's because it's an artifact of the statistical analysis. Stick any random data in here and you'll get this effect. It says nothing about psychology. If you ever find some time when it WASN'T replicated, let me know. Then you might have some effect going on that wasn't researchers confusing themselves with statistics.

While this simulation doesn't prove that the Dunning-Kruger effect doesn't exist, it does show that the original paper's analysis fails to demonstrate it. Strictly speaking, that analysis can't support a conclusion in either direction. My own view is that the Dunning-Kruger effect is probably just a statistical artifact and not a real psychological finding. The popular culture version, “the dumbest people think they’re the smartest”, is even more divorced from reality. Unfortunately, it’s so tempting for those keen on exposing perceived ignorance in others that I doubt we’ll ever hear the last of the Dunning-Kruger effect.

P.S.

It's pretty common for me to hear some "scientific" fact, find the original research, and be left wanting. Sometimes it's due to sample size, often it's due to poor statistics, and most of the time the study simply doesn't support the grander statement about the world that the authors want it to make.

I went into this ~~thinking~~ hoping the experiment would be robust, but it was anything but. Doing this with humor, which is among the most subjective things, is bizarre. They had "humor experts" rate the quality of jokes. One expert's ratings were negatively correlated with those of the other experts, so that person was excluded. They took the remaining experts and had them rank jokes. This was, according to the experts, the funniest joke:

If a kid asks where rain comes from, I think a cute thing to tell him is 'God is crying.' And if he asks why God is crying, another cute thing to tell him is 'probably because of something you did.'

The Grey Matter
