How to Distort Data

Far too many infographics are more about graphics than info. Visually interesting data is often prioritized over visually accurate data. As an extension to my earlier thoughts about the growth of infographics, I wanted to offer a little more scrutiny on how data points are being shown.

Let's take journalism as an analogy. Journalists hold themselves and their peers to a renowned standard of integrity and ethics. With this in mind, imagine you're reading a recap of a soccer match. In it, a player is quoted talking about the goalkeeper, saying...

"I thought he did a tremendous job for us out there. We were down 3-0 early and he shook it off. He was perfect for us the rest of the match and we were able to battle back for a huge win."

But later you heard an audio recording of his actual words, which were...

"I thought he did a great job for us out there. We were down 2-0 early and he shook it off. He was solid for us the rest of the match and we were able to battle back for a big win."

Though it sounds more dramatic, that first version is just wrong. He didn't say "tremendous." He said "great," which isn't as strong. And "perfect" is clearly an adjective of higher degree than "solid." The basic message is the same in both versions, but one is true and one isn't. This just isn't done in journalism. But data visualizers, unintentionally or otherwise, do this all the time. Here's how.

Nonlinear data scaling

One of the most common blunders is distorting a data point's relative magnitude by using a visual device that does not scale the way the designer thinks it does. Take a look at the example below.

All this needs to do is illustrate the percentages in true fashion. But it doesn't do that. This graphic makes it appear as though calls to "receive product or service" are seven times as frequent as "file complaint" calls (its slice is about seven times larger). But that's not the case at all. According to the data, the "receive product or service" calls are only about three times as frequent.

Not that bad, you say? Then have a look at this sham.

Wow, having any kind of accent at all seems to be the kiss of death when it comes to credibility. This old trick resets the baseline value as something other than zero. Here it's 6.3. Combined with the nonlinear scaling from above, this shenanigan really takes things out of whack.
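To see how much a shifted baseline can exaggerate a difference, here is a minimal Python sketch. The two scores are hypothetical values picked for illustration, not numbers read from the chart above; only the 6.3 baseline comes from the text. The function name `apparent_ratio` is likewise just an illustrative choice.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be greater than the baseline")
    return (a - baseline) / (b - baseline)


# Hypothetical scores (assumption): they are NOT taken from the chart discussed above.
high, low = 7.0, 6.5

true_ratio = apparent_ratio(high, low)        # axis starts at zero: roughly 1.08
drawn_ratio = apparent_ratio(high, low, 6.3)  # axis starts at 6.3: roughly 3.5

print(f"true ratio:  {true_ratio:.2f}")
print(f"drawn ratio: {drawn_ratio:.2f}")
```

With these made-up numbers, a gap of under 8 percent is drawn as if one bar were three and a half times the other.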

Using proportional devices and showing true distance from zero shows us a much more accurate visual interpretation of reality.

Now it seems that callers who check their order status or file a complaint aren't so insignificant anymore. And those accents? Not quite the detriment they once seemed.

The distortion factor is my calculation based on Edward Tufte's "lie factor" as he describes in his book, The Visual Display of Quantitative Information, 2nd edition. A factor of 3.0 means the visual representation of the data appears three times larger (or smaller) than the real data itself.
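One plausible way to compute such a factor is sketched below in Python. It is an assumption about the arithmetic, not necessarily the exact formula used for the figures here: it divides the ratio the graphic appears to show by the ratio in the data, and flips the result when it falls below 1 so that "larger" and "smaller" distortions read on the same scale. Tufte's original lie factor is defined a little differently, in terms of the relative change (effect size) shown in the graphic versus in the data.

```python
def distortion_factor(visual_ratio: float, data_ratio: float) -> float:
    """How many times larger (or smaller) the visual comparison is than the real one.

    Hypothetical helper: a simplified reading of the "distortion factor" idea,
    not Tufte's exact lie-factor formula.
    """
    if visual_ratio <= 0 or data_ratio <= 0:
        raise ValueError("ratios must be positive")
    factor = visual_ratio / data_ratio
    # Fold under-representation onto the same scale as over-representation.
    return factor if factor >= 1 else 1 / factor


# Numbers from the pie chart described earlier: the slice looks about seven
# times larger, while the data says about three times.
print(round(distortion_factor(7, 3), 2))
```

A result of 1.0 would mean the picture and the data agree.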

Even Jess3 is not immune to data distortion. Here's their display of the ILO convention average score for policy and practice (by continent). It serves as an example of what happens when data is pinned to a circle's diameter instead of its area.

This graphic is comparing law to reality. The inner circle is the policy score, the outer circle is the practice score. The greater the size disparity between them, the less the law reflects reality. Looking at Africa gives us a quick accuracy gauge. If the inner policy circle represents one unit, how many of those do you think it takes to fill up the practice circle? I'll save you the arithmetic: it's about 14. Way more than 3.7, the ratio in the data. The way it's presented here makes the policy and practice look impossibly distant.
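The arithmetic behind that "about 14" is just the square law for circles: area grows with the square of the diameter. A short Python sketch, assuming the diameters were drawn in proportion to the scores and that 3.7 is the ratio between the two Africa scores (the function name is illustrative):

```python
def area_ratio_from_diameter_ratio(diameter_ratio: float) -> float:
    """Area of a circle scales with the square of its diameter."""
    return diameter_ratio ** 2


data_ratio = 3.7  # practice score relative to policy score, per the text above
visual_ratio = area_ratio_from_diameter_ratio(data_ratio)

print(f"data says {data_ratio}x, the circles show about {visual_ratio:.1f}x")
```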

The Americas don't fare all that much better (2.3 and 5.8). Here's how the real data compares with the distortion in both bar and circular graph form.

The large (5.8) circle consumes about 6,300 pixels per data unit. Applying that same scale to the smaller (2.3) circle results in a much different, and truer, result.
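Scaling by area instead of diameter takes only a square root. The sketch below, in Python, uses the roughly 6,300 pixels per data unit quoted above as the area scale. The helper name `radius_for_value` is hypothetical, and the resulting radii are illustrative rather than measured from the original graphic.

```python
import math

PIXELS_PER_UNIT = 6300  # approximate area scale quoted in the text


def radius_for_value(value: float, pixels_per_unit: float = PIXELS_PER_UNIT) -> float:
    """Radius (in pixels) of a circle whose AREA is proportional to `value`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    area = value * pixels_per_unit
    return math.sqrt(area / math.pi)


large = radius_for_value(5.8)
small = radius_for_value(2.3)

print(f"radius for 5.8: {large:.1f} px")
print(f"radius for 2.3: {small:.1f} px")
# The areas, not the radii, now stand in the same ratio as the data.
print(f"area ratio: {(large / small) ** 2:.2f}")
```

Drawn this way, the smaller circle's radius is well over half the larger one's, even though its value is less than half.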

Reflection not invention

Data should be reflected, not invented. And by reflected, I don't mean from a fun house mirror. In this last example, the size of each pie chart correctly reflects the value it represents, with no distortion.

The circles grow in area at approximately 3,500 pixels per $1 billion. The image still scores high on visual appeal. But it stays virtuous. I can clearly see that, compared to other forms of energy, the federal government is heavily subsidizing fossil fuels to keep them cheap for us to consume.

If you're creating infographics, show the info as it is. Be pedantic. Care about accuracy. If something has a value of 5.1, show it as 5.1. Labeling it is an ancillary detail compared to the power of the data point's visual manifestation. We process shapes before text. The data points should never be thought of as approximations — 43.7 doesn't mean "a lot bigger than" 4.1. It's 10.66 times the size of 4.1. Nothing more, nothing less.

Certainly not all areas of visual communication need to practice such a purist approach. But when a designer is adding a presentation layer to a set of data, are they not obligated to show the data with precision? Should the standard really be any less than that of journalism?