Data dressing gowns and statistical slippers
Line and bar charts are a data visualiser’s favourite dressing gown and slippers: a nice go-to. Everyone understands them. If done well, they can tell a story almost instantly, with no ambiguity. But have you ever felt a little guilty about using them when they aren’t quite right for what you want to show? Did you use line and bar charts anyway, just because they’re so ubiquitous, comfortable and familiar? I’ll confess: I have and I did. Yet, after seeing in some FiveThirtyEight articles a chart which circumvents these twinges of guilt, I’m championing a new chart. Here’s an alternative option for you to consider when you find yourself reaching for the well-worn dressing gown of a line graph, or the warm, fuzzy slippers of a bar chart.
A matter of time
Take this line graph of share price.
It shows a company’s share price over a period of time and, since share prices are tracked at such frequent intervals, any point you look at on the line represents an accurate value of the share price at that time. The line gives a clear sense of progression over time, so the trend is understood very quickly. It’s a perfect example of a line graph doing its job well.
But, let’s consider this chart:
It shows the average runs per game for each baseball season. How does the choice of chart feel here? There’s a clear sense of time and the eye quickly follows the line to pick up key trends. However, there’s something that makes me feel uneasy. The first thing I ask myself when considering line graphs is, “If I try to read the value represented by the line in between two points, does it mean anything?” In this example, you might end up looking at the line connecting the points which represent the 2022 season and the 2023 season, and try to interpolate a value for the 2022.5 season. This, of course, doesn’t exist and doesn’t make any sense. A related concern is that the average runs per game applies to the whole season it’s calculated for, so it feels wrong to condense that data to a single point. Of course, if these concerns really bothered us enough, we could remedy them easily by removing our dressing gown and donning our slippers.
Showing the data as a bar chart clearly represents each season as a distinct entity and removes the urge to interpolate, but it loses a crucial benefit of the line graph: its sense of time. Because bars are visually separate, the data points don’t feel as connected, and, therefore, it’s not as intuitive to follow the trend established by bar length. With a line graph we can follow the line with ease, but we don’t get the same experience with bars.
Enter the step chart.
Although step charts are not very common, you may have seen them used to show interest rates over time. Interest rates change at distinct points in time, so the jumps in value show these changes accurately, and a continuous time axis allows the point of each change to be shown exactly. (Technically, this example is just a line graph with lots of uniformity.)
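To make the "line graph with lots of uniformity" idea concrete, here is a minimal sketch in plain Python of how a list of rate changes becomes the points of a step line. The function name `hold_until_next_change` and the dates and rates are made up for illustration; they are not real interest rate data.

```python
from datetime import date


def hold_until_next_change(changes, end):
    """Turn (date, new_rate) change events into the vertices of a step line.

    Each rate is held flat until the next change, then the line jumps
    vertically. `end` is where the final flat segment stops.
    """
    changes = sorted(changes)
    points = []
    for i, (when, rate) in enumerate(changes):
        until = changes[i + 1][0] if i + 1 < len(changes) else end
        points.append((when, rate))   # start of the flat segment
        points.append((until, rate))  # end of the flat segment
    return points


# Invented rate changes, purely for illustration.
changes = [
    (date(2023, 1, 1), 3.5),
    (date(2023, 3, 15), 4.0),
    (date(2023, 8, 1), 4.5),
]
vertices = hold_until_next_change(changes, end=date(2023, 12, 31))
for when, rate in vertices:
    print(when.isoformat(), rate)
```

Two consecutive points that share a date form the vertical jump, so any ordinary line plot drawn through `vertices` gives a step chart. If you use matplotlib, `Axes.step` with `where="post"` should produce the same shape from the raw change events.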
We can draw on this example, taking the idea of point-in-time jumps, and apply it to our baseball data.
Showing the data in this way not only removes our concerns about reducing a season to a single point and about the misleading slopes between dots, but also maintains the sense of time. In this example it makes perfect sense to chart the data in this way. The average runs per game for a season applies to the whole season, so it’s logical to show the same value at every point at which that season is represented. Essentially, we have preserved the tops of the bars from our bar chart and joined them with vertical lines, but visually I feel this is better since our eyes follow the line and pick up trends more easily.
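As a rough sketch of "the tops of the bars joined with vertical lines", the snippet below builds the step line from per-season values in plain Python. The runs-per-game figures are hypothetical, the helper names `stairs_vertices` and `value_at` are my own, and I am assuming each season occupies the interval from its year up to the next year on the axis.

```python
def stairs_vertices(edges, values):
    """Trace the tops of adjacent bars as one continuous line.

    `edges` has one more entry than `values`: value i spans
    edges[i] to edges[i + 1].
    """
    if len(edges) != len(values) + 1:
        raise ValueError("need exactly one more edge than values")
    points = []
    for left, right, value in zip(edges, edges[1:], values):
        points.append((left, value))   # left end of this season's bar top
        points.append((right, value))  # right end of this season's bar top
    return points


def value_at(x, edges, values):
    """Read the chart at position x: the value of the season containing x."""
    for left, right, value in zip(edges, edges[1:], values):
        if left <= x < right:
            return value
    raise ValueError("x is outside the charted range")


# Hypothetical runs-per-game figures, not real data.
seasons = [2021, 2022, 2023]
runs_per_game = [4.5, 4.3, 4.6]
edges = seasons + [seasons[-1] + 1]  # assumption: a season spans [year, year + 1)

line = stairs_vertices(edges, runs_per_game)
print(line)
print(value_at(2022.5, edges, runs_per_game))  # 4.3, the 2022 season's value
```

Reading the chart at 2022.5 now simply returns the 2022 season's value rather than an interpolated figure that belongs to no season. If you use matplotlib (3.4 or later), `Axes.stairs(values, edges)` draws this shape directly from the same two lists.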
What’s more, step charts can also be our answer to our other dirty data secrets.
You’re barred
In my day job of reporting on employee experience data, I use ordinal variables a lot, most commonly some measure of seniority. This may be a job level or a pay grade, but typically there are about 5–10 values, sometimes represented by letters. In this example there are 6 values from A (the most junior employees) to F (the most senior employees), and we’re plotting the proportion of positive responses to a question for men and women.
Here, we’ve used bars, because the job level variable is ordinal, and, if we look at the data, we can see two key trends:
- For both men and women, positivity is stronger at more senior grades. That is, positivity for men increases with seniority, and positivity for women increases with seniority.
- There are larger differences between the bars at senior levels, indicating that the gender gap is larger at more senior grades.
However, these two trends can be seen more easily in a line chart.
Visually, I think this line chart is easier to interpret. There are two increasing trend lines which diverge, so the two key messages are instantly apparent. The increasing trend lines tell us that positivity is stronger at more senior grades, and the fact that the lines diverge tells us the gender gap is larger at more senior grades. In the bar chart example, these take longer to see: you need to consider each series of bars individually to see the increasing trend, and it takes more visual effort to compare the gaps between each pair of bars and spot the widening gender gap. But, like the baseball example, there’s a nagging feeling that we shouldn’t use a line graph for this. There’s no such thing as the job level A.5, or B and three-quarters, that the slopes allude to, so the chart feels imprecise. Is the gain in interpretability worth brushing this concern under the carpet?
Again, a step chart might be the answer.
As in the previous example, this is the best of both worlds: the eye can quickly pick up the trend of increasing, diverging lines, and we’re not introducing any erroneous hinterlands between the ordinal values.
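For an ordinal axis, one way to sketch this is to give every job level its own band of equal width, centred on the level's position, and hold each series flat across the band. The percentages below are invented for illustration, and `ordinal_steps` is a hypothetical helper, not something from a plotting library.

```python
def ordinal_steps(levels, series):
    """Give every ordinal level a unit-wide band centred on its index.

    Returns the tick positions for the level labels and, for each series,
    the vertices of its step line.
    """
    ticks = list(range(len(levels)))
    lines = {}
    for name, values in series.items():
        if len(values) != len(levels):
            raise ValueError(f"{name} needs one value per level")
        points = []
        for i, value in enumerate(values):
            points.append((i - 0.5, value))  # left edge of the level's band
            points.append((i + 0.5, value))  # right edge of the level's band
        lines[name] = points
    return ticks, lines


# Invented percentages of positive responses, purely for illustration.
levels = ["A", "B", "C", "D", "E", "F"]
series = {
    "Men": [60, 64, 69, 75, 82, 90],
    "Women": [59, 62, 65, 69, 73, 78],
}

ticks, lines = ordinal_steps(levels, series)
gaps = [m - w for m, w in zip(series["Men"], series["Women"])]
print(ticks)          # positions at which to label levels A to F
print(lines["Men"])   # vertices of the step line for one series
print(gaps)           # gap at each level, widening with seniority in this made-up data
```

Because each band stops exactly where the next begins, nothing is drawn "between" two levels other than the vertical jump, so there is no slope for a reader to mistake for a level A.5.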
Drawing a line under it
While not commonly seen, step charts are a great way of drawing attention to certain elements of your data without taking some of the liberties we might otherwise have taken. Also, in reports full of line graphs and bar charts, they look a bit different and so can bring some visual interest. I’m not saying they should be used everywhere as a datavis panacea. For me, the central question should always be, “What visual will best convey what the data are saying?” and, as data visualisers, we should be seeking the answer to that question with every chart we create. Step charts can be one of the potential answers to this question and should be in our chart arsenal when making a choice. So, give them a thought: maybe they can be an exciting silk kimono the next time you reach for your dressing gown and slippers.