tl;dr: Connected scatterplots are sometimes used to show how two variables are related over time. In this article, I argue that alternatives like stacked line charts and indexed line charts are virtually always better choices since they can communicate the same insights as connected scatterplots but are much easier to read and less prone to misinterpretation.
The first connected scatterplot that I remember seeing was the one below, which was published in The New York Times in 2012:
At first, I assumed that this chart was a standard line chart, but then I noticed that the line looped back on itself at one point, which is impossible in a standard line chart. It took me a few moments to realize that it wasn’t a standard line chart; it was actually a scatterplot, with each point representing a point in time (in this case, a year). Each point was connected to the next point in time with a line segment, resulting in what I later discovered was called a “connected scatterplot.”
I was impressed by this clever and unusual way of showing the data, and it was satisfying when I figured out how to interpret it. Like solving a puzzle. Once I did figure it out, it drew me in because it required quite a bit of mental effort to actually read it.
In fact, it was kind of exhausting. I found that I couldn’t take in the whole chart at once, as I could with a standard line chart or regular scatterplot. It was hard to figure out what the shape of the line “meant.” In order to figure it out, I had to interpret each part of the line on its own. For example, even though I noticed the “loop” in the chart right away, I didn’t know what it meant right away, and I had to consciously reason it out, one line segment (i.e., one year) at a time:
“From 1977 to 1978, fatalities increased moderately, and miles driven increased slightly.”
“Then, fatalities increased slightly, and miles driven decreased slightly.”
“Then, fatalities decreased slightly, and miles driven decreased slightly.”
“Then, fatalities decreased moderately, and miles driven stayed almost the same.”
I felt kind of like a six-year-old, sounding out words one letter at a time rather than reading whole words and phrases at once. Because there were many line segments in this chart, it took me quite a while to “sound them all out,” and it was hard to keep all those mini-observations in mind and synthesize them into a larger mental picture of what the data in this chart meant.
To be honest, this chart made me feel kind of dumb, especially since lots of other people seemed to love it. Just as a child gets faster at reading text with practice, though, I figured that I’d get better at reading connected scatterplots with practice.
That was 11 years ago.
I’ve seen dozens of connected scatterplots since that time, and I still don’t find them much easier to read now than when I first encountered them. I understand how to read them; they just still require a lot more time and mental effort than other chart types, and I’m less confident in my interpretation of the underlying data. Invariably, I end up wishing that I could just see the data as either two line charts stacked on top of one another (so that time lines up in both)…
…or as an indexed line chart:
After making the indexed line chart above, I realized just how huge the increase in miles driven has been since 1950, which wasn’t as obvious to me in the connected scatterplot. What was also clearer to me in these charts was that, despite the huge increase in miles driven per capita, the auto fatality rate per capita generally declined. This led me to wonder what the “auto fatalities per mile driven” would be, so I calculated it:
Holy crap! From 1950 to 2011, fatalities per mile driven decreased by a factor of seven! That’s a stunning reduction and seems like a huge story within this data, but it wasn’t as obvious to me in the connected scatterplot; it was more obvious when I saw the data as stacked and indexed line charts.
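The calculation behind that number is just a ratio: dividing fatalities per capita by miles driven per capita cancels out the population term and leaves fatalities per mile driven. Here is a minimal Python sketch of it. The function names are mine, and the numbers are made-up placeholders, not the actual data behind the charts in this article:

```python
def per_mile_rate(fatalities_per_capita, miles_per_capita):
    """Divide two per-capita series element-wise; the population term cancels."""
    return [f / m for f, m in zip(fatalities_per_capita, miles_per_capita)]


def fold_decrease(series):
    """Return how many times smaller the last value is than the first."""
    return series[0] / series[-1]


# Placeholder values for illustration only, NOT the real data.
# One entry for the first year and one for the last year.
fatalities = [20.0, 10.0]   # hypothetical fatalities per 100,000 people
miles = [3000.0, 9000.0]    # hypothetical miles driven per person

rates = per_mile_rate(fatalities, miles)
print(fold_decrease(rates))  # roughly 6 with these placeholder numbers
```

With these placeholders, fatalities per capita only halve, but the per-mile rate falls about sixfold because miles driven tripled over the same span. That is the kind of derived measure that is easy to miss when the two variables are only shown against each other.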
Maybe it’s just me who finds connected scatterplots to be hard to read and not very informative, but it’s not 🙂. I know a number of other very experienced data viz folks who feel the same way. I also, however, know a number of very experienced data viz folks who find connected scatterplots to be very effective, and this has led to some fascinating and lively debates on #dataviz Twitter (now rebranded as “X”). I wanted to better understand the arguments in favor of using connected scatterplots, so I collected the arguments that I’ve heard (or, at least, my understanding of those arguments) in the list below, along with some responses:
Arguments in favor of using connected scatterplots
The big one seems to be that…
“Connected scatterplots show patterns or insights that aren’t clear in simpler chart types.”
For a number of years, I assumed that that must be true. Why else would chart creators use connected scatterplots? At some point, though, I started looking for examples of specific patterns, relationships, or any kinds of insights at all that were clear in connected scatterplots but that weren’t clear in stacked or indexed line charts. No matter how hard I looked, though, I couldn’t find a single example of an insight or pattern that was clear in a connected scatterplot but that wasn’t also clear in a stacked or indexed line chart of the same data.
For example, all the insights in the annotations in the New York Times connected scatterplot above are very clear in an indexed line chart. To my eye, they’re actually quite a bit clearer:
I’ve asked connected scatterplot fans for examples of insights that were clear in connected scatterplots but not as clear in stacked or indexed line charts—even made-up, hand-crafted, cherry-picked examples. People have sent me a number of connected scatterplots that they claimed were more informative than other chart types, but those examples generally weren’t accompanied by stacked or indexed line charts showing the same data. I’m not sure how you can claim that one chart type is more informative than another without actually comparing it to that other chart type.
For a few of the connected scatterplots that people sent me, I decided to track down the data and create stacked or indexed line chart versions myself to see if the insights that they pointed out in the connected scatterplot were as clear in those versions:
Every time I did this, insights were always clearer in the stacked or indexed line chart versions (to my eye, anyway), including the specific insight(s) that the person who sent me the connected scatterplot claimed were clearer in the connected scatterplot.
“Connected scatterplots work well as long as they include annotations to tell the user what the chart means.”
I agree that including annotations to help readers understand a potentially unfamiliar chart type is a good practice. With connected scatterplots, though, annotations are almost essential even if the audience already knows how to read a connected scatterplot.
Annotations should just provide additional clarifications or key takeaways; they shouldn’t be necessary to explain the basic meaning of a chart to people who already know how to read that chart type. If even experienced chart readers require that level of handholding to understand the basic meaning of a chart, that’s not a good sign.
When the same data is shown as stacked or indexed line charts, the need for annotations goes way down since the reader will almost certainly figure out the basic meaning of the chart on their own, without needing to be explicitly told.
“Connected scatterplots get easier to read with practice.”
Well, maybe they get easier to read, but I don’t think they ever get easy to read. Even connected scatterplot fans readily admit that they require more cognitive effort than other chart types, even for experienced chart readers.
I think connected scatterplots overload our brains in ways that other chart types don’t and in ways that can’t be overcome with practice (more on that in a moment), which is why even experienced chart readers still find them considerably slower and more effortful to read than other chart types.
“Connected scatterplots work fine as long as one of the variables generally increases throughout the time span shown.”
In the New York Times connected scatterplot above, the “Miles Driven per Capita” variable generally increases throughout the time span shown, so the line generally moves from left to right, only “going backward” or doubling back on itself occasionally. If both variables in the chart had experienced multiple increases and decreases during the time span shown, however, the connected scatterplot would have looked something like this:
Fans of connected scatterplots readily admit that connected scatterplots don’t work well if both variables are volatile (experience multiple increases and decreases over time). I have two concerns with this admission, though:
- Most time series data does contain multiple increases and decreases, which means that connected scatterplots look like spaghetti in most situations.
- A more serious concern is that I suspect connected scatterplots are easier to interpret when one variable generally increases over time because the chart then resembles a standard line chart, so readers’ brains unconsciously interpret the connected scatterplot as a standard line chart, even if they consciously know that it’s not. The problem, of course, is that a connected scatterplot isn’t a standard line chart, and interpreting it as one virtually guarantees that the data in the chart will be misinterpreted (I’ll provide some examples of that in a bit).
“Connected scatterplots are more unusual-looking, so they’re more likely to get readers’ attention than standard line charts.”
That’s undoubtedly true, but the question is, at what cost does that attention come?
Participants in my training workshops tend to have higher-than-average levels of data literacy and yet, when I ask them if they know how to read a regular scatterplot, typically only about a quarter of them raise their hands. What does that suggest about the fraction of readers with average data literacy levels who will be able to figure out how to read a connected scatterplot on their own?
My guess is that only a small fraction of New York Times readers understood the miles/fatalities connected scatterplot, and an even smaller fraction interpreted it correctly. It might have gotten their attention, but I suspect that most readers relied almost entirely on the annotations to understand what the chart actually meant, or they misinterpreted it as a standard line chart.
If the cost of getting readers’ attention is that they don’t understand what they’re looking at, have to work unnecessarily hard to figure it out, or misinterpret what they’re looking at once you have their attention, that doesn’t seem like a great strategy to me.
“Maybe connected scatterplots require more cognitive effort to read, but that serves to draw the reader in.”
IMHO, trying to make a chart more engaging by making it unnecessarily hard to read is like trying to make an article more engaging by writing it in Pig Latin. I’m not saying that it never works, but I’d think long and hard about forcing readers to work harder than they need to in order to get insights from a chart.
I think there are more reliable ways to make charts more engaging, such as visually highlighting the most important element(s) in a chart, adding comparison/reference values, using red to indicate “bad” values, including key takeaways in chart titles or annotations, and others that I cover in my Practical Charts course.
“We shouldn’t avoid using chart types just because they’re unusual or unfamiliar.”
I completely agree, and I teach unusual, potentially unfamiliar chart types like step charts and cycle plots in my workshops because there are certain types of insights and certain types of data that can’t be shown using simpler, more familiar chart types.
My concern with connected scatterplots is that the insights that they show virtually always can be shown using simpler, more familiar chart types. Why use an unusual or unfamiliar chart type when a simple, familiar one can be used to say the same thing?
“Every chart type has situations in which it’s the most effective choice.”
I also heard this argument after my assassination attempt on box plots was published but, frankly, I don’t get it. Chart types are just human inventions, and just because something has been invented doesn’t mean that there must automatically be situations in which it’s the best solution (more on that here). The graveyard of human inventions that were the most effective solution in exactly zero situations is well-populated…
Now that I’ve addressed common arguments for using connected scatterplots, how about some…
Arguments against using connected scatterplots
Most of the concerns that I have with connected scatterplots arise from two underlying concerns, the first one being that…
Connected scatterplots overload our brains.
Standard line charts are easy to read because time always goes in the same direction (usually, left to right) and always advances at the same speed (e.g., one month always equals one centimeter on the horizontal axis). The only things that change are the vertical positions of the points on the line; everything else remains constant. This is easy for us to mentally keep track of, which is why we can usually interpret a standard line chart all at once, instantly noticing spikes, dips, cycles, overall trends, etc., without having to interpret the chart one line segment at a time.
As a reader interprets the line in a connected scatterplot, however, the vertical positions of the points are always changing (like in a standard line chart), but the horizontal positions are also always changing, which makes the line more “cognitively cumbersome” to interpret. On top of that, time is also constantly changing direction in the chart; sometimes time goes down, sometimes to the upper left, sometimes to the right, etc. On top of the top of that, time is constantly changing speed in the chart; sometimes it moves quickly (long line segments) and sometimes slowly (short line segments).
If you find it easy to keep track of these four constantly changing properties for each line segment in a connected scatterplot, you’re smarter than I am. I suspect that, for most readers, that’s too many things to keep track of simultaneously in working memory, which would explain why even experienced chart readers can’t interpret connected scatterplots as easily as other chart types. While readers might improve a little with practice, humans can’t expand their working memories with practice (well, not much, anyway), which means that, no matter how much you practice reading connected scatterplots, they’ll always be cognitively cumbersome.
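To make the “direction” and “speed” of time a bit more concrete, here is a small Python sketch that computes both for every segment of a connected scatterplot. It assumes the points are already in plot units (i.e., after axis scaling), and both the function name and the sample points are hypothetical, made up purely for illustration:

```python
import math


def describe_segments(points):
    """Return (length, angle) for each segment between consecutive (x, y) points.

    Length stands in for the "speed" of time; angle for its "direction".
    Angles are in degrees: 0 = right, 90 = up, 180 = left, -90 = down.
    """
    segments = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)               # distance covered in one time step
        angle = math.degrees(math.atan2(dy, dx))  # heading of that time step
        segments.append((length, angle))
    return segments


# Hypothetical yearly points in plot units (made up for illustration).
points = [(0, 0), (3, 4), (3, 5), (1, 5), (1, 2)]

for length, angle in describe_segments(points):
    print(f"length {length:.1f}, direction {angle:.0f} degrees")
```

In a standard line chart, the horizontal step per time period is constant and positive, so the direction always stays between straight down and straight up, and only the vertical change varies. In the sketch above, both the length and the direction change from one year to the next, and the reader has to track all of that for every segment.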
Connected scatterplots don’t make “visual sense.”
In chart types like standard line charts and bar charts, longer shapes represent larger quantities. For example, in a standard line chart, a long line segment represents a large increase or decrease for a given time period. That makes “visual sense” to our brains and is one reason why we find those chart types easy to read.
In a connected scatterplot, however, a long line segment sometimes represents a large quantity (i.e., a big increase or decrease), but sometimes it doesn’t. For example, if a line segment is close to vertical or close to horizontal, it means that one of the variables experienced a large change, but the other variable only experienced a small change, or didn’t change at all. That doesn’t make “visual sense” to our brains: Small quantities should be represented by short shapes, not long ones, and this forces the chart reader to constantly override their visual intuitions when reading a connected scatterplot, increasing the time and mental effort required to read it.
I think there are other ways that connected scatterplots violate our visual intuitions, but let’s leave it at that for now.
These last two concerns with connected scatterplots are the root causes of most of the other concerns that I have with them, such as the fact that…
Connected scatterplots are prone to misrepresenting data.
Note that these examples aren’t cherry-picked. Most connected scatterplots that I see in the wild pose a high risk of misleading audiences; the examples above are just crafted to make that potential for deception more obvious. The underlying problem is that, as I mentioned earlier, readers have trouble mentally keeping track of the constantly changing direction and speed of time in a connected scatterplot, which makes most of them easy to misinterpret.
Connected scatterplots often hide patterns and insights that are very obvious in stacked or indexed line charts.
While I have yet to find a pattern or insight that was clearer in a connected scatterplot than in a stacked or indexed line chart, it’s very easy to find the opposite, that is, patterns and insights that are very obvious in stacked or indexed line charts, but that are difficult or impossible to notice in connected scatterplots, for example:
Again, these examples aren’t cherry-picked; many patterns that are very obvious in stacked or indexed line charts are easy to miss in connected scatterplots.
If connected scatterplots have all these downsides, why do some people use them?
Well, I don’t use connected scatterplots, so I can only speculate about why others might.
I suspect that, in some situations, chart creators use connected scatterplots simply because they didn’t try visualizing the data as a stacked or indexed line chart, so they didn’t realize how much clearer their insights would be in one of those chart types.
I also suspect that some chart creators use connected scatterplots because they allow chart creators to be a bit more creative. Creativity is very important when making charts and it should be encouraged, but creativity that gets in the reader’s way is a dicey proposition. “More creative” doesn’t always mean “more effective.”
In my more cynical moments, I wonder if, in some cases, a connected scatterplot was used to “show off.” After all, any monkey with Excel can create a standard line chart, but only those with a certain level of dataviz sophistication would think of showing data as a connected scatterplot. If that’s the reason why a chart creator chooses to use a connected scatterplot instead of a simpler chart type, they’re putting their own interests above those of the audience, which I don’t recommend. Basically, connected scatterplots are a “clever” chart type. Not very effective, IMHO, but clever. Like a Segway.
A note about other types of connected scatterplots
Now’s probably a good time to mention that the concerns that I’m raising only relate to what I call “covariation” connected scatterplots, that is, connected scatterplots in which the points are points in time, and that are intended to show how two variables relate to one another over time. There are at least two other types of connected scatterplots, however, and I think that those work just fine.
The first type are connected scatterplots that show mathematical functions, and in which the points aren’t points in time or the line doesn’t have discrete “points” per se:
These “non-time” connected scatterplots don’t present the same cognitive challenges as “points in time” connected scatterplots because only the shape of the line matters in these charts. The line doesn’t have a “speed” or “direction” to mentally keep track of, so these charts don’t overload readers’ brains like covariation connected scatterplots do.
The second type of connected scatterplots that I think work fine are what I call “item-comparison scatterplots with trails,” for example:
If you’re not sure what the difference is between a covariation scatterplot and an item-comparison scatterplot, read this. Item-comparison scatterplots are considerably easier to read than covariation scatterplots for reasons that I don’t have the word count to get into in this article, but you can probably see for yourself that the item-comparison scatterplot above is easier to read than the covariation ones that we saw earlier.
So, what does this all mean when it comes to creating charts?
If you’re thinking of using a connected scatterplot to show how two variables relate to each other over time, I suggest trying to visualize the data as stacked and indexed line charts, and then asking yourself if the insights that you need to communicate are as clear (or clearer) in one of those other chart types.
If you want to make a stacked or indexed line chart more engaging, try using visual highlighting to draw attention to important chart elements, stating key takeaways in callouts, adding comparison values, etc.
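If you want to try that comparison quickly, the rough Python sketch below is one way to do it. It isn’t how the charts in this article were made; it’s just an illustration that assumes matplotlib is available for the plotting part. The function names and the data are hypothetical. Indexing simply rescales each series so that its first value equals 100, which puts both variables on a common scale:

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first value equals `base` (100 by convention)."""
    first = values[0]
    return [v / first * base for v in values]


def plot_stacked_and_indexed(years, series_a, series_b, labels=("A", "B")):
    """Draw two stacked line charts (shared time axis) plus one indexed line chart."""
    # Imported here so index_to_base stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, (ax_a, ax_b, ax_idx) = plt.subplots(3, 1, sharex=True, figsize=(6, 8))
    ax_a.plot(years, series_a)
    ax_a.set_title(labels[0])
    ax_b.plot(years, series_b)
    ax_b.set_title(labels[1])
    # Both series on one chart, each expressed relative to its first value.
    ax_idx.plot(years, index_to_base(series_a), label=labels[0])
    ax_idx.plot(years, index_to_base(series_b), label=labels[1])
    ax_idx.set_title(f"Indexed ({years[0]} = 100)")
    ax_idx.legend()
    fig.tight_layout()
    return fig


# Made-up values for illustration only.
years = [2000, 2001, 2002, 2003]
miles = [50.0, 55.0, 60.0, 75.0]
deaths = [8.0, 8.0, 6.0, 4.0]

print(index_to_base(miles))
print(index_to_base(deaths))
```

Calling `plot_stacked_and_indexed(years, miles, deaths, labels=("Miles", "Deaths"))` would then give you the two stacked charts and the indexed chart in one figure, which you can put next to your connected scatterplot and compare.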
Still think connected scatterplots are da bomb?
- Please also post a stacked and/or indexed line chart of the same data. If you post a connected scatterplot with no stacked or indexed line chart version for comparison, it’s hard to talk about whether a connected scatterplot would be more or less effective in that scenario.
- Please point out the specific insight(s) that you think are clearer in the connected scatterplot. If you just say that a connected scatterplot is “more informative” than other chart types without specifying exactly which insights are clearer in it, it will be hard for me to respond meaningfully.
As an independent educator and author, Nick Desbarats has taught data visualization and information dashboard design to thousands of professionals in over a dozen countries at organizations such as NASA, Bloomberg, The Central Bank of Tanzania, Visa, The United Nations, Yale University, Shopify, and the IRS, among many others. Nick is the first and only educator to be authorized by Stephen Few to teach his foundational data visualization and dashboard design courses, which he taught from 2014 until launching his own courses in 2019. His books, Practical Charts and Practical Dashboards, will be published in 2023 and 2024.
Information on Nick’s upcoming training workshops and books can be found at https://www.practicalreporting.com/