
Step 9 in the Data Exploration Journey: Chart Choices

This article is part 10 in a series on data exploration. I began this series while serving as the Director of Education for the Data Visualization Society in 2022, because so many people were asking to hear more about data exploration and the process of learning data vis. A list of previous entries can be found at the end of the article. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year project that produced a 30-page 2023 “Career Portraits” publication (DVS member login required). This series gives an inside view of the project, illustrates my process for approaching a big project, and demonstrates that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

The last article found us transitioning out of the discovery diamond for the Career Portraits report and into the slow, upward climb of the Build process. In Step 8, we were reworking our data and thinking through the layout and constraints for our final deliverable. As part of that, we needed to think through the data visualization more carefully to decide on a final form for each chart that we included in the report.

Diamond-shaped flowchart illustrating project stages from "Expand/Ideate" to "Deliver/Deploy," highlighting points of maximum risk of overwhelm and exhaustion.

For this article, I’m going to focus on the refinement choices that we made for the visualizations as we moved from early exploration into the build phase for the Career Portraits project. Both the purpose and the audience for our charts shifted during that transition, and it was important to think carefully about what we needed the charts to do.

In the Expand phase, charts support data exploration by surfacing patterns and insights to validate and pursue. The primary audience is usually the data visualizer themselves or an audience that is close enough to the project to understand the limitations and nuances of the rough draft stage. As you move into Build, your audience often broadens and your purpose starts to shift from exploration toward communication. This often requires a change of form, and usually involves a lot more annotation and clarification of the chart insights. A Build audience is often not as close to the data, and may not even be all that familiar with data vis, so you’re also looking to refine the visualization to work for them. 

Considerations for chart design in Expand vs Build:

Expand:

  • Support the data visualizer in thinking through the problem and understanding the dataset
  • Surface patterns and identify interesting insights for further analysis
  • Compare multiple narratives and viewpoints, often in the same chart
  • Work out the main variables of interest and experiment with visual form to see what works (often as quickly as possible)

Build:

  • Optimize your visualizations so that they communicate your findings to a broader audience
  • Highlight the important points to improve legibility and reduce noise
  • Make relevant comparisons so that the viewer will understand your conclusions
  • Focus in on the specific points or comparisons that you want to make

Things to consider when transitioning from Expand into Build:

  • Remove unnecessary data points. Your audience is further from the dataset than you are, and they’re unlikely to understand the nuances of your dataset. Where you see useful context, your audience will usually just see noise. Unless the full dataset is the point of your chart, it’s generally best to remove it. 
  • Use more common charts. Sometimes you really do need a fancy visualization to explain a dataset, but most people struggle to read even basic charts. Get too adventurous, and there is a risk that people won’t even understand the point of your chart after all of your hard work. A complicated visualization increases your “wow” factor, but it may also reduce your audience size and your impact. Make your editorial choices accordingly.
  • Add guideposts to help your viewer understand. Good use of color, line weight, and other visual variables adds hierarchy and context to your chart. Supplementary annotations, captions and text explanations call out important points, clarify your purpose, and allow your viewer to confirm that they understood your point. At this stage, it’s hard to overstate the clarity and emphasis that you can achieve with sophisticated visual design. Take the time to do it well.  
  • Clear visual hierarchy is critical. You can fit many layers of data into the same chart if you establish a clear visual hierarchy. Ideally, this should follow the importance and relevance of information types as someone reads your chart.
  • Remove alternate interpretations or ambiguous visuals. A good exploratory visualization often allows you to make multiple comparisons at once. It’s intended for someone who knows how to move between different kinds of comparison, and who can block out the noise to focus on a specific task at hand. This is helpful when you are trying to work out what’s in the dataset, but it’s less helpful when you are trying to make a point. Review your visualization for alternate interpretations, distractions and visual noise, and revise it to clarify and sharpen your point.

Our chart choices reflected a combination of editorial, aesthetic, and practical considerations. We needed charts that supported the comparisons we wanted to make, were engaging for a group interested in visualization, and were simple enough to reduce the time-consuming manual work required to create charts for publication. We also wanted to support specific questions that our audience might have. Fortunately, this project allowed us the opportunity to experiment with several different forms for the data during the early exploration, so we had a good sense of our options going into Build.

We ended up splitting the report into two sections: the first looked at comparisons across careers, for those trying to decide what kind of vis they wanted to do. The second focused on comparing different variables within a career, for those who cared more about career advancement or understanding themselves in relation to their chosen field. In the first section, we kept the complexity of comparative vis across career types, and supplemented it with text labels and other visual details for readability. In the career section, we chose simpler visualizations (mostly bar charts) that broke out individual variables within that career for more focused exploration. Most of the charts discussed below are from the first section, and include comparisons between careers and against the general population.

In my earliest data explorations, I almost always just use default charts in whatever tool I’m using (Excel, in this case). A grouped bar chart helped me compare the number of people in each career area experiencing different frustrations in the State of the Industry Survey data.

Bar chart titled "Top Frustrations" showing the count of frustrations faced by Analysts, Designers, Developers, and Engineers in various categories like accessing data, data volume, and lack of time.
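
The report charts themselves were built in Excel, but for readers who work in code, a minimal matplotlib sketch of this kind of default grouped bar chart might look like the following. The career names match the survey groups, but the categories and counts here are made-up placeholder values, not the survey data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder counts, NOT the actual survey values
frustrations = ["Accessing data", "Data volume", "Lack of time", "Low data literacy"]
careers = {
    "Analyst":   [120, 95, 140, 80],
    "Designer":  [60, 40, 90, 70],
    "Developer": [75, 55, 85, 50],
    "Engineer":  [30, 25, 35, 20],
}

x = np.arange(len(frustrations))          # one group per frustration
width = 0.8 / len(careers)                # split each group among the careers

fig, ax = plt.subplots(figsize=(8, 4))
for i, (career, counts) in enumerate(careers.items()):
    ax.bar(x + i * width, counts, width, label=career)

ax.set_xticks(x + width * (len(careers) - 1) / 2)
ax.set_xticklabels(frustrations, rotation=20, ha="right")
ax.set_ylabel("Count of respondents")
ax.set_title("Top Frustrations")
ax.legend()
plt.tight_layout()
plt.show()
```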

This was useful in terms of understanding raw counts, and the data itself calls out the difference in sample size within the survey data: there are many fewer engineers than analysts! Within that, I can see the peak values for each group, and I can compare values across groups pretty easily. When I tried to identify more interesting patterns, though, I found that I was losing the sense of each career as a whole in this chart. 

A different grouped bar chart helped me to see the careers as individual entities, but the lookup task to read the legend makes it really hard to get much insight about the specific frustrations or to make comparisons between career groups.

Bar chart titled "Top Frustrations" with separate bars for each role (Analyst, Designer, Developer, Engineer) showing counts for categories like accessing data, data volume, lack of collaboration, and technical limitations of tools.

A radar plot overlaid the dataset values on the same axes, and made it easier to compare values for my four series along each one.

Radar chart titled "Top Frustrations" comparing the counts of frustrations faced by different roles (Analyst, Designer, Developer, Engineer) in categories such as accessing data, low data literacy, and lack of design expertise.

The default chart settings were pretty terrible for readability; it was hard to follow the axis lines, some of the data was occluded by other series, and in general there wasn’t much hierarchy within the chart. A few simple changes improved legibility significantly.

Radar chart titled "Common Frustrations" highlighting the percentage of respondents from different roles (Analyst, Designer, Developer, Engineer) experiencing various frustrations like accessing data, information overload, and lack of mentorship.

Reducing the opacity of the area fill prevented occlusion. A solid, heavier outline kept each series distinct and made the chart colors easier to read. Adding axis lines clarified where people should look to make their value comparisons, and also directed the eye out of the center of the chart toward the axis labels for easier lookup. Text hierarchy and annotations made the content of the chart clearer. I also changed the radial axis metric from raw counts to percentages for better comparison between groups. This choice erased the difference in sample size from the chart, but it allowed for better comparison of reported experience across the different groups.
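
Our cleanup happened in the charting tool rather than in code, but a rough matplotlib sketch of the same ideas (low-opacity fills, solid outlines, visible axis lines, and percentages instead of counts) might look something like this. The categories, counts, and sample sizes are placeholders for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder values, NOT the published survey numbers
categories = ["Accessing data", "Lack of time", "Low data literacy",
              "Information overload", "Lack of mentorship"]
counts = {"Analyst": [120, 140, 80, 60, 40],
          "Designer": [60, 90, 70, 50, 55]}
group_n = {"Analyst": 400, "Designer": 250}            # assumed sample sizes per career

angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False)
angles = np.concatenate([angles, angles[:1]])          # close the loop

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for career, values in counts.items():
    pct = np.array(values) / group_n[career] * 100     # percentages, not raw counts
    pct = np.concatenate([pct, pct[:1]])
    ax.plot(angles, pct, linewidth=2, label=f"{career} (n={group_n[career]})")
    ax.fill(angles, pct, alpha=0.15)                    # low-opacity fill avoids occlusion

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.grid(True, linewidth=0.5)                            # visible axis lines guide value lookups
ax.set_title("Common Frustrations (% of respondents per career)")
ax.legend(loc="upper right", bbox_to_anchor=(1.35, 1.1))
plt.show()
```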

This may be the first time that I’ve ever chosen to use a radar plot for a published graphic. I don’t usually find them to be particularly readable, especially for categorical comparisons. To me, the data areas tend to read as connected shapes rather than data points, but in this case that connection was precisely what I was struggling to see in the grouped bar chart.  Unfortunately, once my brain summarizes the data series into a shape, that shape becomes the strongest identifier in the chart, and I start comparing the details of one shape vs another. It’s very hard to counteract that tendency, even when I know that the shape I see is nothing more than an artifact caused by the order of my axes. 

A quick sorting experiment shows how strong those effects are. In the first two charts, the axis order is relatively arbitrary. In the third, I sorted based on value for the Analyst group. This creates a clear pattern for the Analysts, but it also means that everyone else is implicitly compared against that standard.

Three radar charts titled "Top Frustrations" for different roles (Analyst, Designer, Developer, Engineer), each illustrating the count of respondents facing various frustrations such as accessing data, lack of design expertise, and lack of time.
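
The sorting experiment itself is just a reordering of the categories before plotting. A small sketch, reusing the hypothetical categories and counts from the radar example above:

```python
# Sort the axis categories by the Analyst values (descending) before plotting.
# Reuses the hypothetical `categories` and `counts` from the sketch above.
order = sorted(range(len(categories)),
               key=lambda i: counts["Analyst"][i], reverse=True)

sorted_categories = [categories[i] for i in order]
sorted_counts = {career: [values[i] for i in order]
                 for career, values in counts.items()}
# Plotting sorted_counts on the radar axes makes the Analyst series monotonic,
# so every other career is implicitly read against the Analyst ordering.
```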

I find these strong axis-ordering effects quite distracting, and I often feel that they obscure real signals in the data. In general, I tend not to use radar charts for this kind of data for that reason. The grouped bar chart is more flexible and gives a less biased view in most situations, but that advantage becomes a weakness when there are so many comparison points in the chart. As it becomes more and more difficult to integrate across the bar groups, a radar plot can remove the cognitive burden of grouping the series, as long as you can ignore the ordering artifacts.

With the visual cleanup and proper context, this chart worked well to support early discussions with the community about their frustrations working in this field. For those conversations, we wanted people to react to and think about alternate interpretations for the data, so the more ambiguous visualization worked well. In a guided discussion, we could emphasize the weaknesses and support clear interpretation, and the idiosyncrasies of the radar chart made an interesting discussion point. For a report meant to be read independently, I felt that these ambiguities and artifacts reduced the value of the chart. 

The heatmap is another popular alternative. Instead of a spatial axis, this chart uses color value (lightness) to compare the different counts, simplifying down to a two-dimensional display of career vs frustration and using color to encode the metric values.

This works well for identifying outliers: it’s pretty effortless to identify the darkest and the lightest square. It is pretty terrible for comparing relative values, especially with so many colors in the chart. I could have binned this down to high, medium, and low colors to emphasize patterns, but that would suppress a lot of the variation that we were looking to expose. We did use the heatmap in our initial report as a way of identifying extremes within the survey population. In that case, the lack of resolution in the data points was fitting because we had not yet determined the statistical relevance of our data and could not provide appropriate context for evaluating small differences. For the final report, we wanted to capture more of the richness in the dataset when comparing frustrations between groups.

We did keep a heatmap for the Barriers to Entry data in the final report, because that was a situation where large variations and differences in pattern between career area were more important than the smaller details. We included an Overall column in the graphic to allow comparison of each career area against the field as a whole, and we supplemented the chart with value annotations so that people could read rather than guess at the chart values. Adding in the n values as part of the series definition also helped to clarify the context in situations where a small sample size might be skewing results.

Heatmap displaying challenges faced by different roles (Analysts, Designers, Developers, Engineers) including time/balance, support, skills/training, and finding a job/pay.
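
Our heatmap came out of off-the-shelf software, but a minimal matplotlib sketch of the same structure (an Overall column, value annotations in each cell, and n values in the column labels) could look like this. All of the percentages and sample sizes below are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder percentages, NOT the published survey values
barriers = ["Time / balance", "Support", "Skills / training", "Finding a job / pay"]
columns = ["Overall (n=1250)", "Analyst (n=400)", "Designer (n=250)",
           "Developer (n=350)", "Engineer (n=120)"]
values = np.array([
    [35, 32, 38, 30, 41],
    [22, 25, 18, 24, 20],
    [28, 22, 35, 27, 26],
    [30, 29, 33, 28, 36],
])

fig, ax = plt.subplots(figsize=(8, 4))
im = ax.imshow(values, cmap="Blues")

ax.set_xticks(range(len(columns)))
ax.set_xticklabels(columns, rotation=25, ha="right")
ax.set_yticks(range(len(barriers)))
ax.set_yticklabels(barriers)

# Value annotations so readers can read, rather than guess at, the cell values
for r in range(values.shape[0]):
    for c in range(values.shape[1]):
        ax.text(c, r, f"{values[r, c]}%", ha="center", va="center", fontsize=8)

ax.set_title("Barriers to Entry (% of respondents)")
fig.colorbar(im, ax=ax, shrink=0.8)
plt.tight_layout()
plt.show()
```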

For the frustrations analysis, we essentially “unrolled” the radar plot back onto Cartesian axes rather than using a radial plot. The radar area became a line, which keeps the continuity that facilitates comparison between values but creates less confusion with an identity encoding (“the spiky shape” vs “the highest point”).

Line chart comparing frustrations and issues facing data visualization among different roles (Analyst, Designer, Developer, Engineer) for categories like lack of time, accessing data, and low data visualization literacy.

We introduced the overall population distribution as a secondary layer in this chart also, and used that value as a sorting index for the frustration categories. This way, there was a clear rule for sorting the categories that provided information about the dataset, but it didn’t require one career to become the default comparison for all of the others.

We encoded this context layer as a more subtle background bar chart, and superimposed the career series lines on top of it. Their bold weight and more interesting color keep the focus on the career areas and push the contextual information to the back. The reference values for the overall population are still readily available if someone wants to make that comparison, but they don’t interfere with reading the primary data. The different visual forms (bar vs line) helped to separate the different levels of aggregation (population vs individual career), and the bars avoided clutter by not adding another line to an already-crowded chart. Again, the legend reports counts for each series so that the reader can identify population size differences that are suppressed when we define the value axis as a percentage.
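
For the report we created the bar and line layers separately and overlaid them by hand, but in a plotting library the same layering can be drawn directly. Here is a rough matplotlib sketch with placeholder values, sorting the categories by the overall population value and placing muted background bars under bolder career lines.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder data, NOT the published survey values
frustrations = ["Lack of time", "Accessing data", "Low data vis literacy",
                "Data volume", "Lack of mentorship"]
overall_pct = np.array([42, 35, 30, 24, 18])           # overall population, context layer
careers = {"Analyst (n=400)":  [45, 40, 25, 30, 15],
           "Designer (n=250)": [38, 28, 36, 18, 22]}

# Sort categories by the overall value so no single career sets the ordering
order = np.argsort(overall_pct)[::-1]
frustrations = [frustrations[i] for i in order]
overall_pct = overall_pct[order]

x = np.arange(len(frustrations))
fig, ax = plt.subplots(figsize=(8, 4))

# Context layer: muted background bars for the overall population
ax.bar(x, overall_pct, color="lightgray", label="Overall population")

# Primary layer: bold career lines superimposed on top
for career, pct in careers.items():
    ax.plot(x, np.array(pct)[order], marker="o", linewidth=2.5, label=career)

ax.set_xticks(x)
ax.set_xticklabels(frustrations, rotation=20, ha="right")
ax.set_ylabel("% of respondents")
ax.set_title("Common Frustrations by Career")
ax.legend()
plt.tight_layout()
plt.show()
```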

For other comparisons, we mixed the 2D grid of the heatmap with a line chart. Here, we wanted to emphasize how different careers used specific techniques. We chose to use size instead of color for our metric encoding, partly because it emphasized the sort order and relative differences within each column. These differences will always be difficult to read in an area encoding, so we supplied data values and let the visual form act as a reference to reinforce the organization of the chart.

Line chart comparing the preferred types of data visualizations among different roles (Analysts, Designers, Developers, Engineers) such as bar charts, line charts, and scatterplots.

The resulting “bump” charts have a column for each career, with chart types sorted by their relative percentage for that career. It’s easy to see that bar charts are the top chart for every career except Designers and Engineers, who tend to use line charts more instead. The number annotations show that the gap is indistinguishable for the Designer group (real numerically, but within rounding error for the annotations and definitely within the margin of error for the analysis), and more pronounced for the Engineers.
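
Our bump charts were assembled manually, but the underlying logic is simple: rank the chart types within each career, place a size-encoded dot at each rank, annotate the values, and connect the same chart type across columns. A rough matplotlib sketch with invented usage percentages:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder usage percentages, NOT the published survey values
charts = ["Bar chart", "Line chart", "Scatterplot", "Infographic"]
usage = {"Analyst":  [80, 70, 55, 20],
         "Designer": [60, 61, 40, 50],
         "Engineer": [65, 72, 60, 10]}
careers = list(usage.keys())

# Rank each chart type within each career column (0 = most used)
ranks = {}
for career in careers:
    order = np.argsort(usage[career])[::-1]
    for rank, chart_idx in enumerate(order):
        ranks[(career, charts[chart_idx])] = rank

fig, ax = plt.subplots(figsize=(7, 4))
xs = np.arange(len(careers))
for chart_idx, chart in enumerate(charts):
    ys = [ranks[(career, chart)] for career in careers]
    sizes = [usage[career][chart_idx] * 6 for career in careers]   # size encodes the metric
    ax.plot(xs, ys, color="lightgray", zorder=1)                   # connect the same technique
    ax.scatter(xs, ys, s=sizes, zorder=2)
    ax.annotate(chart, (xs[0], ys[0]), textcoords="offset points",
                xytext=(-12, 0), ha="right", fontsize=8)
    for x, y, career in zip(xs, ys, careers):
        ax.annotate(f"{usage[career][chart_idx]}%", (x, y),
                    textcoords="offset points", xytext=(12, 0), fontsize=8)

ax.set_xticks(xs)
ax.set_xticklabels(careers)
ax.set_yticks(range(len(charts)))
ax.set_yticklabels([f"Rank {r + 1}" for r in range(len(charts))])
ax.set_xlim(-0.9, len(careers) - 0.6)
ax.invert_yaxis()                                       # rank 1 at the top
ax.set_title("Chart Types Used, Ranked Within Each Career")
plt.tight_layout()
plt.show()
```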

If the exact size of the usage was more important than the ranking (or to de-emphasize differences smaller than our margin of error), we could have collapsed multiple bubbles into one for values that were so close, so that they looked more similar in the chart. That would have been a bit more accurate but a lot more work. In the end, we decided that this version would be ok since the ranking values were clear. 

There is an axis-ordering artifact in this chart as well: if you follow a particular line in the chart, you can see that infographics are pretty far down the list for most career areas, but they jump up to third place for Designers and down to 12th for Engineers. The fact that Designers are next to Developers makes the jump less drastic than it would be if they were right next to the Engineers. Here, I felt that this artifact was less distracting than in the radar plots.

The bump charts are probably the least familiar visualization of the set, but we thought that they did a good job of highlighting where specific techniques, methods, or audiences were different for one career vs the others. I would have loved to play with these in an interactive context, because I think there are lots of things you could do to improve readability and reduce the flaws of the chart if the user could temporarily select a method (or set of methods) of interest. This visualization could also easily accept a highlight series, if there were one particular series or method that we wanted to spotlight. Our report was intended for static or print distribution, so we ended up keeping the style flat to allow the viewer to make their own comparisons based on their interests.

For the individual career reports, we used simpler visualizations that illustrated differences within a single variable for that specific career. These visualizations were much simpler to make and to read, and supported a more focused, narrated experience for the individual careers.

Collection of bar charts showing salary distribution, size of organization, and sector of respondents (Engineers), alongside a bar chart of commonly used tools such as Python, React, and Tableau.

We also needed to be thoughtful about balancing off-the-shelf and custom graphics to keep our deliverable scope reasonable and our project on time. The report ended up using a mix of simple and custom charts, based on where we saw added value for the narrative in going a little bit outside of the box. The heatmaps and radar plots were fast and fairly simple to make with off-the-shelf software. They only needed minor visual cleanup to improve readability and style. The superimposed bar and line charts were created separately and then manually overlaid; those took an extra step in the processing, but it was quick to do. Everything about the bump chart was manual, and those were some of the most time-consuming charts that we put into the report.

In terms of chart selection, I probably wouldn’t push the edges this far for a general audience. Even within a specialist group, there was some confusion about what the bump charts meant and how to read them. In our case, we wanted to use a variety of charts to emphasize specific points within the dataset, and we felt that we could afford to challenge this audience a bit. We also felt that some chart variety and novelty was important to keep things more dynamic, so that the report would be fun to read for a group who already knows a lot about vis. 

All of these choices were part of a dynamic decision-making process throughout the Build phase, informed by both the chart purpose, audience, and task, and by the practical considerations of what was possible and easy to build with the technologies we had. We considered many alternate forms and more advanced comparisons that didn’t make the cut, and we identified many charts that would make interesting standalone projects (perhaps in a more interactive medium) for another day. Hopefully, the end product provided a little interest with sufficient clarity for the group we were intending to serve.

Previous articles in this series:
Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate
Step 6 in the Data Exploration Journey: Cut to Realistic Scope
Step 7 in the Data Exploration Journey: Spin Off Projects
Step 8 in the Data Exploration Journey: Build

Related links:
Early Sketches for Career Portraits in Data Visualization, by Jenn Schilling
DVS Careers in Data Visualization, YouTube Playlist for interview series by Amanda Makulec and Elijah Meeks
Career Portraits project (DVS Member space login required)

Erica Gunn is a data visualization designer at one of the largest clinical trial data companies in the world. She creates information ecosystems that help clients to understand their data better and to access it in more intuitive and useful ways. She received her MFA in information design from Northeastern University in 2017. In a previous life, Erica was a research scientist and college chemistry professor. You can connect with her on Twitter @EricaGunn.