Showing Human Stories Behind Data Points

Part 1: The power of visual aggregation

One of the many advantages of data visualisation is that it can help us understand big numbers. A simple bar chart can show many thousands of people. A line chart can track a whole country’s population over time. However, these displays that aggregate a large number of people or things in single visual elements can make it harder to emotionally connect with the people or objects being counted. Some designers instead choose to “disaggregate” data—a design style that focuses on individual points rather than grouping points together—to create a more emotional connection with the entities being counted. “An Incalculable Loss” by The New York Times shows individual silhouettes, letting the reader scroll down the page to give a sense of what the first 100,000 people to die from COVID-19 would look like arranged in this way.

A screenshot of a small section of this very tall graphic, showing about 600 tiny silhouettes. Most of them are grey, but about 10 are black, and have short texts next to them, such as “Proud single mother of three, Louvenia Henderson, 44, Tonawanda, N.Y.”. It also has a counter of deaths at the bottom right, which reads “March 27. Deaths: 1,758” — “An Incalculable Loss” by The New York Times.

The 2014 commemorative artwork “Blood Swept Lands and Seas of Red” likewise used 888,246 ceramic poppies to represent the British fatalities during the First World War, spilling out into the moat of the Tower of London.

These pieces of course took considerable time and effort to display a single data point, but they help the viewer to understand the magnitude of a very large number in a way that is sometimes challenging with aggregated visual elements.

Part 2: A challenge at hand

Several months ago, I, Will Stahl-Timmins, a data graphics designer, was working on data visuals to help explain a harrowing dataset uncovered in a joint investigation by The BMJ (British Medical Journal) and The Guardian. The investigation had uncovered over 35,000 recorded cases of rape, sexual assault, harassment, stalking, and abusive remarks between 2017 and 2022 in health facilities run by the UK’s NHS (National Health Service). I was testing a couple of different ways of visualising the data using aggregated displays.

I first tried a matrix structure with area-based circles to represent the data (rather than bars, to enable both horizontal and vertical comparisons), but it could take the viewer a little while to work out what the 12 different circles represent:

A chart which shows a grid of 12 circles, sized to represent different quantities. The actual numbers are also shown next to the circles. 6 show sexual violence, 6 show sexual misconduct. These are subdivided by perpetrators (visitor, patient, staff) and victims (patient or staff). The biggest circles are patient perpetrators and staff victims (9143 for sexual misconduct, 4407 for sexual violence). — An early draft of the sexual abuse graphic, using a bubble chart.

Then I experimented with a Sankey diagram, which seemed more useful, as it enables groupings—such as how many patients were perpetrators, or how many staff were victims. However, this format does make it more challenging to compare some other values that are arranged vertically from each other, such as the patient perpetrators for sexual violence versus sexual misconduct.

The same data as the previous image, this time presented with two Sankey diagrams. The top diagram is titled sexual violence, and the bottom sexual misconduct. The total numbers in each group are presented this time. Perpetrators are shown to the left of both charts, and the biggest group is patients (5724 for sexual violence and 12518 for sexual misconduct). The victims are shown to the right, and the biggest group are staff (4627 for sexual violence and 9729 for sexual misconduct). — Another early draft of the sexual abuse graphic – this time using an alluvial/Sankey diagram.

While both of these methods might be considered valid ways of presenting the data and comparing the numbers in the dataset, I thought that they weren’t really capturing the scale of the data, or the enormous impact of the events depicted on so many lives.

It was around this time that I, and a few hundred others, attended the Data Visualization Society’s Outlier conference in Porto (and online). It was there that Nadieh Bremer explained her views on visual aggregation of data in her keynote talk.

Here are Nadieh’s key points on this topic…

Part 3: Visual aggregations and visual diversity

When I, Nadieh Bremer, hear or read about the “average” or “the average person” I always feel that I’m only getting a narrow sense of the full story. I want to know how the full distribution looks. And so, when I’m visualizing data myself, I like to give the readers that extra context. If possible, I always prefer to visualize the data in its lowest level of detail and provide any aggregations visually.

Aggregating the data visually to show more context

Take the following example about satellites. (As an astronomer, I of course enjoy a good outer space example whenever I can get my hands on one!) A few years ago, Scientific American asked me to create a visual about the active satellites for an article about possible “space wars” for their November 2020 edition. I had to focus on which countries own the satellites and where the satellites are located by orbital region.

A treemap-Marimekko-like visual with rectangles could’ve worked. But I received the data per satellite. And with the nearly 3,000 active satellites in space at the time, I had enough room across the allotted two-page spread to turn each into a circle. However, they are all still clearly grouped, visually aggregated, to show who owns each group of satellites and where above Earth they can be located.

The full two-page spread visualization for the "Space Wars" article that appeared in Scientific American revealing all the active satellites and several of their main properties, such as ownership, size, age, and more. — “Space Wars” by Nadieh Bremer – Legends and text by Jen Christiansen

Showing the satellites themselves as individual circles let me specifically call out a few famous ones, like the Hubble Space Telescope, and those mentioned in the “Space Wars” article, such as the two Russian Cosmos satellites. This treatment also allowed me to “mark” each satellite with more metadata, using the circle size for weight, opacity for age, and color, icons and other indicators to show various other satellite characteristics. This made it possible for a reader to mentally compare specific satellites to all the others, adding that extra layer of context.

When the data is about humans and the experiences that we have, it gets even more vital to try and give each “data point” its own voice, its own mark, to humanize it to the reader.

Showing the data at a detailed level also creates visual diversity to intrigue the eyes, to let them have something to wander over and explore, to find possible side stories even.

A close-up of the printed “Space Wars” visual, revealing a subtle gradient within each circle.

The satellites in the “Space Wars” visual had a lot of metadata to create enough diversity. However, sometimes the data just doesn’t have enough variety to create a good level of individuality, of diversity, and you’re at risk of merely creating a “blob of sameness,” where you’re displaying each datapoint separately but everything is the same “grey circle” (although sometimes this can be quite intentional). In those cases, I like to add some randomness to the visual aesthetic. For data visualizations I try to make it subtle, so that people generally won’t assume that it means something (it can be almost invisible unless you are truly looking for it, but the effect definitely has a visual impact). For example, you might notice in the “Space Wars” image above that all of the circles have a subtle (radial) gradient that is rotated differently.
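To make the rotated-gradient idea concrete, here is a minimal sketch in Python that writes an SVG of otherwise identical grey circles, each with its radial gradient turned by a random angle. It is not the code behind “Space Wars”; the function name `circle_svg`, the colours, the sizes and the grid of ten circles per row are all assumptions chosen for illustration.

```python
import random


def circle_svg(n_circles: int, seed: int = 1) -> str:
    """Return an SVG string of grey circles with randomly rotated gradients.

    Every circle has the same size and colours (hypothetical values); only
    the rotation of its radial gradient differs.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    defs, shapes = [], []
    for i in range(n_circles):
        angle = rng.uniform(0, 360)  # one random rotation per circle
        # An off-centre focal point (fx, fy) makes the rotation visible.
        defs.append(
            f'<radialGradient id="g{i}" fx="0.35" fy="0.35" '
            f'gradientTransform="rotate({angle:.1f} 0.5 0.5)">'
            '<stop offset="0" stop-color="#9a9a9a"/>'
            '<stop offset="1" stop-color="#808080"/>'
            "</radialGradient>"
        )
        cx = 15 + (i % 10) * 30   # ten circles per row
        cy = 15 + (i // 10) * 30
        shapes.append(f'<circle cx="{cx}" cy="{cy}" r="12" fill="url(#g{i})"/>')
    rows = (n_circles + 9) // 10
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="300" height="{rows * 30}">'
        f'<defs>{"".join(defs)}</defs>{"".join(shapes)}</svg>'
    )
```

Saving the returned string to a `.svg` file and opening it in a browser shows how small the effect is: the two greys are close enough that the rotation reads as texture rather than as an encoded variable.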

A visual about the division of harmful pesticides versus non-harmful pesticides being sold for five of the major crops being sold worldwide - created for Unearthed. — Nadieh Bremer’s graphics for a report on pesticides. The grey voronoi segments are subtly different shades of grey, to provide visual interest.

Creating visual diversity through randomness

I was asked to create a data art collection for the Giga project of UNICEF. Their goal is to connect all of the schools in the world to the internet. When I got into the picture, Giga had data showing the internet-connection status for roughly 300,000 schools (mainly in developing countries).

However, besides knowing the country and if the school had internet or not, there weren’t truly any other relevant variables for me to work with. To create 1,000 unique and interesting artworks, though, I needed more variety than the dataset could provide, and since this was a data art collection, I fully embraced “randomness” to create the visual diversity.

I came up with a concept of using the schools to create tiny kingdoms. A little decorated square representing each school. These squares would stack to form cities. And together, they would look like kingdoms. However, there is a divide, a digital divide, where the schools already connected to the internet form a bustling kingdom at the top, with vibrant colors and intricate decorations. The schools not connected instead form a hidden upside-down city, using more muted colors and only simple decorations.

I used randomness to divide all 300,000 schools across the 1,000 artworks, letting the number of schools per artwork range between 100 (which looked more like villages) and 450 (which were more like cities). I used randomness again to place each school, each square, on an invisible background grid (Tetris-style, in a way).
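One way such a split could be done is sketched below in Python: draw a random size for every artwork within the allowed range, then nudge random artworks up or down by one until the sizes add up to the total. This is an illustrative assumption, not the method used for the collection, and `random_group_sizes` is a hypothetical name.

```python
import random


def random_group_sizes(total, groups, lo, hi, seed=0):
    """Split `total` items into `groups` random sizes, each within [lo, hi]."""
    if not groups * lo <= total <= groups * hi:
        raise ValueError("total cannot be reached with these bounds")
    rng = random.Random(seed)
    # First pass: an independent random size for every group.
    sizes = [rng.randint(lo, hi) for _ in range(groups)]
    # Second pass: correct the sum one item at a time, at random positions,
    # never pushing a group outside its bounds.
    diff = total - sum(sizes)
    step = 1 if diff > 0 else -1
    while diff != 0:
        i = rng.randrange(groups)
        if lo <= sizes[i] + step <= hi:
            sizes[i] += step
            diff -= step
    return sizes
```

Calling `random_group_sizes(300_000, 1_000, 100, 450)` returns 1,000 sizes between 100 and 450 that sum to 300,000.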

Another big part where randomness comes into play is the symbol inside each school. Each of the 51 unique symbols has different levels of complexity, such as either one, two, or three concentric circles. What symbol a school gets is mainly determined randomly. But I do use the internet speed of a school to decide what level of symbol complexity is drawn. The higher the speed, the more complex the symbol. The 10% of schools with the highest speeds are even shown as flowers or rainbows.
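The mix of data-driven and random choices described here can be sketched as follows. The top 10% rule and the flower and rainbow symbols come from the description above; splitting the remaining schools into three equal bands, the number of variants per level, and the name `assign_symbols` are assumptions made only for this example.

```python
import random

VARIANTS_PER_LEVEL = 5  # hypothetical number of symbol designs per level


def assign_symbols(speeds, seed=0):
    """Give each school a complexity level from its speed and a random symbol."""
    rng = random.Random(seed)
    n = len(speeds)
    order = sorted(range(n), key=lambda i: speeds[i])  # slowest first
    top_cut = n - max(1, n // 10)  # ranks at or above this are the top 10%
    result = [None] * n
    for rank, i in enumerate(order):
        if rank >= top_cut:
            # Fastest schools: special symbols, picked at random.
            result[i] = {"level": "special",
                         "symbol": rng.choice(["flower", "rainbow"])}
        else:
            # Data decides the level (1 to 3); randomness picks the design.
            level = 1 + (rank * 3) // top_cut
            result[i] = {"level": level,
                         "symbol": rng.randrange(VARIANTS_PER_LEVEL)}
    return result
```

The speed only controls how complex the symbol is; which symbol of that complexity appears is left to chance, so two schools with identical data can still look different.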

Two examples of the Patchwork Kingdom collection, showing a small "village" on the left and large "metropolis" on the right. — “Patchwork Kingdoms” by Nadieh Bremer

And it doesn’t stop there: I also used randomness to determine which of the 24 different color palettes to apply, what possible easter eggs to hide and to create a very subtle background pattern of contours, linking all 1,000 kingdoms together in a giant map of 40 by 25 pieces.

I tried to use the schools data in as many ways as I could think of to determine visual aspects, but by using all this randomness, I could take this dataset with fairly little information about the schools and turn it into 1,000 unique data artworks.

This was a more extreme example of using randomness together with data, but applying some subtle randomness in even the most seemingly straightforward data visualization can make your visual more intriguing to look at, more beautiful, and in some cases, more human.

Here’s how Will was able to apply some of these methods to the health facilities harassment dataset…

Part 4: Inspiration strikes!

After seeing Nadieh explain these ideas at the conference, my mind turned immediately to the graphic I was working on. I’d realised two things: 1) I needed to devote more attention to the victims in the graphic – as they were the most important people in the story; and 2) I was going to have to try and represent them. All 35,000 of them. Within the next hour, this very rough sketch had emerged on my iPad:

A hand sketch showing three half silhouettes at the top representing perpetrators, labelled patients, staff, and visitors. They seem to cast shadows over little dots below representing victims. These are grouped as patients, staff, and visitors. Some of the dots representing victims are black, and some are red. — A redesign sketch produced at the Outlier conference by Will Stahl-Timmins.

I was already trying to work out how to give the points some kind of individual properties – colour for sexual violence / misconduct perhaps? And random variation for size? And how to calculate the right angles for the shadows in D3.js? There would be trigonometry afoot when I got home… The conference was pretty busy but I did get a brief chance to show the sketch to Nadieh. She was full of encouragement, and thought that this new approach humanised the dataset. And she liked the victims’ shadows!
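For a sense of the trigonometry involved, here is a minimal sketch in Python of how a shadow’s direction and length could be worked out. It assumes a single point light above a flat ground plane viewed from overhead, which is a simplification and not what was eventually built; `shadow_tip` and `shadow_angle_deg` are hypothetical names.

```python
import math


def shadow_tip(base, height, light):
    """Ground position where the shadow of an upright object ends.

    base   -- (x, y) of the object on the ground
    height -- height of the object
    light  -- (x, y, z) of a point light, which must be above the object
    """
    bx, by = base
    lx, ly, lz = light
    if lz <= height:
        raise ValueError("light must be higher than the object")
    # Similar triangles: shadow length relates to the distance from the
    # light as the object's height relates to the light's extra height.
    k = height / (lz - height)
    return (bx + (bx - lx) * k, by + (by - ly) * k)


def shadow_angle_deg(base, light):
    """Direction of the shadow on the ground, pointing away from the light."""
    return math.degrees(math.atan2(base[1] - light[1], base[0] - light[0]))
```

With the light at `(0, 0, 2)` and an object of height 1 standing at `(10, 0)`, the shadow runs from the base to `(20.0, 0.0)`. Objects further from the light get longer shadows, and every shadow fans out from the point directly below the light.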

When I got back to my desk I ignored my stacked up emails and made a first very quick version of the new graphic, with simple blocks to represent the people:

An infographic, titled “sexual offences [sic] in the NHS”. Like the sketch before, it shows three perpetrator types as silhouettes which cast shadows over red blocks representing cohorts of victims. These red blocks are made to look a little 3D, and have a few faint circles and shoulders suggesting people. Text has been added to introduce the graphic and show the exact numbers. — A draft graphic, made by Will Stahl-Timmins using Adobe Illustrator.

The “blocks” of people were easy to make but didn’t have the kind of individuality that was needed. The shadows, when applied to the blocks, didn’t have the same effect. I was just about to open up D3 and start calculating vanishing points and angles of shadows, but then decided to send a quick sketch to a colleague, Ben McNeilage. Ben works a lot on The BMJ’s “Best Practice” point of care tools, making 3-D graphics. We wondered if it might actually be easier to model the people in 3-D, and have the engine draw shadows, rather than try to create the kind of “pseudo 3-D” that would have been generated in D3. I sent over the following sketch and asked if it might be possible to create a LOT of cylinders, with the kind of individuality in shape and size that Nadieh had mentioned in her talk:

A simple sketch with 3D columns, 3 red and 4 black, grouped loosely together with some in front of others. — A sketch of cylinders to represent people, by Will Stahl-Timmins using Procreate software on iPad Pro.

A couple of hours later, Ben sent back this:

A 3D render of about 300 cylinders, some red and some black, arranged in a tall vertical column. — Sample 3-D cylinders, made by Ben McNeilage using Blender 3D and ChatGPT.
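The article does not say how these cylinders were generated, so the following is only a sketch of how per-cylinder variation could be produced in plain Python. The size ranges, the grid spacing and the name `cylinder_specs` are assumptions. Inside Blender, each specification could be handed to a call such as `bpy.ops.mesh.primitive_cylinder_add` (with its `radius`, `depth` and `location` arguments); that step is left out so the sketch runs anywhere.

```python
import math
import random


def cylinder_specs(n_red, n_black, seed=0):
    """Return one dict per cylinder with a slightly randomised shape and position."""
    rng = random.Random(seed)
    colours = ["red"] * n_red + ["black"] * n_black
    rng.shuffle(colours)  # mix the two colours within the group
    cols = max(1, math.ceil(math.sqrt(len(colours))))  # roughly square block
    spacing = 1.2  # wide enough that jittered cylinders cannot overlap
    specs = []
    for i, colour in enumerate(colours):
        col, row = i % cols, i // cols
        specs.append({
            "colour": colour,
            "radius": rng.uniform(0.35, 0.45),   # hypothetical size range
            "height": rng.uniform(1.5, 2.0),     # hypothetical size range
            "x": col * spacing + rng.uniform(-0.1, 0.1),  # small position jitter
            "y": row * spacing + rng.uniform(-0.1, 0.1),
        })
    return specs
```

Keeping the ranges narrow matters: the variation should suggest different individuals without implying that height or width encodes anything in the data.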

Already this seemed like an improvement: the columns, despite being simple shapes, have a kind of personality to them because of their shape and size. I sent over the number of columns of each colour that were needed in the six perpetrator/victim categories. We realised that it was going to be tricky to use a 1:1 scale with 9,143 people in the biggest group, and over 35,000 people overall. I sent over the numbers at 1:1, 1:10, and 1:100 scales for Ben to experiment with. The next day, he sent back a design which was incorporated to form this first draft:

The draft infographic, this time with the 3D rendered cylinders added. A 1:10 scale is used, so there are about 2.1 thousand cylinders in the largest group, to represent 20.9 thousand people. — A draft graphic using the first real data 3-D model of the cylinders.
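The arithmetic behind the three scales is simple, and a short Python sketch makes the trade-off easy to see. The two counts below are the patient-perpetrator, staff-victim figures from the bubble chart described earlier; the function name `scaled_counts` and the rule that a non-empty group always keeps at least one mark are assumptions.

```python
def scaled_counts(counts, people_per_mark):
    """Number of marks to draw per group when one mark stands for several people."""
    return {
        group: max(1, round(n / people_per_mark)) if n else 0
        for group, n in counts.items()
    }


# Patient perpetrators, staff victims (figures from the bubble chart draft).
patient_to_staff = {"sexual misconduct": 9143, "sexual violence": 4407}

for scale in (1, 10, 100):
    print(f"1:{scale}", scaled_counts(patient_to_staff, scale))
```

At 1:10 this group needs 914 and 441 cylinders; at 1:100 it drops to 91 and 44, which is easier to render but loses much of the sense of scale.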

It seemed that the 1:10 scale gave the best “individuality” while preserving a sense of the large scale of the reported abuse. We mocked up a version with a 1:1 scale, but the columns seemed to lose too much visual fidelity, becoming “noise” rather than “individuals”:

The draft infographic, with a rather rough mock-up of what the graphic would have looked like with the full 20.9 thousand cylinders in the biggest group. They are so small it’s hard to see what they are, they look more like black and red static than actual objects. — A quick mock-up to see what the graphic would look like with 1:1 scale.

We could, at this point, have decided to present this piece as a long scrolling piece at 1:1 scale, like The New York Times’ “An Incalculable Loss,” but we were working with several constraints in mind. Firstly, the aim of this graphic was to attract attention to the investigation, and graphics in this kind of static format can easily be shared and viewed on social media. Secondly, we were also designing for a printed version of the journal, so we often like to keep graphics to a size that fits on one or two pages. This way they can appear both on a page and online. Thirdly, there were time constraints, with only a few days to go until the deadline.

In the end we decided to use the 1:10 scale and the single-page static format. Ben tweaked the angle of the camera and the light source so that the shadows in the 3-D element matched more closely the shadows from the silhouettes in the design (the vanishing point for those was far off the top of the canvas!). After a few rounds of editing and proofreading, the following version appeared in the journal:

The final graphic, now titled “Sexual safety incidents in the NHS”. It uses the 1:10 scale for cylinders, with about 2.1 thousand in the biggest group. — The final graphic, as published in *The BMJ*.

Part 5: When to disaggregate

We know that the “average person” doesn’t exist in reality, so instead of averaging your data beforehand and visualizing the result as “one bar” (or other type of visual element) it can be a really effective technique to present every person (or other entity) in the dataset visually, and aggregate them through their visual properties like positioning them in groups, or making them different colours or sizes. This helps to remind our audience of the people or other entities behind a big number, in a way that isn’t so easy with aggregated numbers like averages or even large blocks that represent many people together. When we are using this kind of technique, we can try and make our disaggregated people or other entities more individual, by introducing subtle visual variation through randomness, as Nadieh has explained.
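As a minimal sketch of aggregating through position rather than through arithmetic, the Python function below gives every record its own mark and places the marks of each group in a separate block. The name `unit_layout`, the block width and the gap between blocks are assumptions for illustration.

```python
from collections import defaultdict


def unit_layout(records, group_key, per_row=10, gap=2):
    """Place one mark per record, with each group in its own block of rows."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    marks = []
    x_offset = 0
    for name in sorted(groups):
        for i, rec in enumerate(groups[name]):
            marks.append({
                "group": name,
                "x": x_offset + i % per_row,  # column within the block
                "y": i // per_row,            # row within the block
                "record": rec,                # the individual stays attached
            })
        x_offset += per_row + gap  # start the next group's block further right
    return marks
```

Nothing is summed or averaged: the size of each block shows the group total, while every mark still carries its own record and can be styled, labelled or varied individually.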

However, Will’s experience with creating the sexual safety graphic, and the judgements made, won’t necessarily apply directly to a different dataset on a different topic. In this case, we chose to work with a 1:10 scale to show objects (representing people) that the viewer can relate to, while maintaining a sense of scale. Plenty of design decisions are project specific, and can lead to any number of paths to do justice to the humanization of the visual. For instance, if the numbers had been 10x bigger, should we have used a 1:100 scale? And would that water down the impact we are trying to achieve? There are no hard rules, no correct answers. Constraints on time, budget, publication formats, and many other things will also influence the best way forward. A healthy dose of personal judgment is always needed to find the right balance. So let’s keep experimenting!

Will Stahl-Timmins

Will Stahl-Timmins is Data Graphics Designer at The BMJ (British Medical Journal) and a freelance designer, based in Oxfordshire, UK. His background is in graphic design. He holds a PhD in the use of information graphics in health technology assessment from Exeter Medical School. Outside work, he spends a lot of time playing with his daughter (currently age 3), cooking, gardening, and playing board games with anyone who he can get to the table.

Nadieh Bremer

Nadieh Bremer is a data visualization artist who once graduated as an Astronomer and started working as a data scientist before finding her true passion in the visualization of data. As 2017’s “Best Individual” in the Information is Beautiful Awards and co-writer of “Data Sketches,” she focuses on uniquely crafted visuals for each specific dataset, often using large and complex datasets while employing vibrant color palettes. She’s made visualizations and art for companies such as Google News Lab, Sony Music, UNICEF, the New York Times, and UNESCO.