Part 1: The power of visual aggregation
One of the many advantages of data visualisation is that it can help us understand big numbers. A simple bar chart can show many thousands of people. A line chart can track a whole country’s population over time. However, these displays that aggregate a large number of people or things in single visual elements can make it harder to emotionally connect with the people or objects being counted. Some designers instead choose to “disaggregate” data—a design style that focuses on individual points rather than grouping points together—to create a more emotional connection with the entities being counted. “An Incalculable Loss” by The New York Times shows individual silhouettes, letting the reader scroll down the page to give a sense of what the first 100,000 people to die from COVID-19 would look like arranged in this way.
The 2014 commemorative artwork “Blood Swept Lands and Seas of Red” likewise used 888,246 ceramic poppies to represent the British fatalities during the First World War, spilling out into the moat of the Tower of London.
These pieces, of course, took considerable time and effort to display what is ultimately a single number, but they help the viewer to understand the magnitude of a very large number in a way that is sometimes challenging with aggregated visual elements.
Part 2: A challenge at hand
Several months ago, I, Will Stahl-Timmins, a data graphics designer, was working on data visuals to help explain a harrowing dataset uncovered in a joint investigation by The BMJ (British Medical Journal) and The Guardian. The investigation had uncovered over 35,000 recorded cases of rape, sexual assault, harassment, stalking, and abusive remarks between 2017 and 2022 in health facilities run by the UK’s NHS (National Health Service). I was testing a couple of different ways of visualising the data using aggregated displays.
I first tried a matrix structure with area-based circles to represent the data (rather than bars, to enable both horizontal and vertical comparisons), but it could take the viewer a little while to work out what the 14 different circles represent:
Then I experimented with a Sankey diagram, which seemed more useful, as it enables groupings, such as how many patients were perpetrators, or how many staff were victims. However, this format makes it harder to compare values that are vertically offset from each other, such as the patient perpetrators for sexual violence versus sexual misconduct.
While both of these methods might be considered valid ways of presenting the data and comparing the numbers in the dataset, I thought that they weren’t really capturing the scale of the data, or the enormous impact of the events depicted on so many lives.
It was around this time that I, and a few hundred others, attended the Data Visualization Society’s Outlier conference in Porto (and online). It was there that Nadieh Bremer explained her views on visual aggregation of data in her keynote talk.
Here are Nadieh’s key points on this topic…
Part 3: Visual aggregations and visual diversity
When I, Nadieh Bremer, hear or read about the “average” or “the average person” I always feel that I’m only getting a narrow sense of the full story. I want to know how the full distribution looks. And so, when I’m visualizing data myself, I like to give the readers that extra context. If possible, I always prefer to visualize the data at its most granular level of detail and provide any aggregations visually.
Aggregating the data visually to show more context
Take the following example about satellites. (As an astronomer, I of course enjoy a good outer space example whenever I can get my hands on one!) A few years ago, Scientific American asked me to create a visual about the active satellites for an article about possible “space wars” for their November 2020 edition. I had to focus on which countries own the satellites and where the satellites are located by orbital region.
A treemap-Marimekko-like visual with rectangles could’ve worked. But I received the data per satellite. And with the nearly 3,000 active satellites in space at the time, I had enough room across the allotted two-page spread to turn each into a circle. However, they are all still clearly grouped, visually aggregated, to show who owns each group of satellites and where above Earth they can be located.
Showing the satellites themselves as individual circles let me specifically call out a few famous ones, like the Hubble Space Telescope, and those that feature in the story, such as the two Russian Cosmos satellites mentioned in the “Space Wars” article. This treatment also allowed me to “mark” each satellite with more metadata, such as using the circle size for weight, opacity for age, and color, icons and other indicators to show various other satellite characteristics. This made it possible for a reader to mentally compare specific satellites to all the others, adding that extra layer of context.
When the data is about humans and the experiences that we have, it gets even more vital to try and give each “data point” its own voice, its own mark, to humanize it to the reader.
Showing the data in a detailed level also creates visual diversity to intrigue the eyes, to let them have something to wander over and explore, to find possible side stories even.
The satellites in the “Space Wars” visual had a lot of metadata to create enough diversity. However, sometimes the data just doesn’t have enough variety to create a good level of individuality, of diversity, and you’re at risk of merely creating a “blob of sameness,” where you’re displaying each datapoint separately, but everything is the same “grey circle” (although sometimes this can be quite intentional). In those cases, I like to add some randomness to the visual aesthetic. For data visualizations I try to keep it subtle, so that people generally won’t assume that it means something (it can be almost invisible unless you are truly looking for it, but the effect definitely has a visual impact). For example, you might notice in the “Space Wars” image above that all of the circles have a subtle (radial) gradient that is rotated differently.
Creating visual diversity through randomness
I was asked to create a data art collection for the Giga project of UNICEF. Their goal is to connect all of the schools in the world to the internet. When I got into the picture, Giga had data showing the internet-connection status for roughly 300,000 schools (mainly in developing countries).
However, besides knowing the country and if the school had internet or not, there weren’t truly any other relevant variables for me to work with. To create 1,000 unique and interesting artworks, though, I needed more variety than the dataset could provide, and since this was a data art collection, I fully embraced “randomness” to create the visual diversity.
I came up with a concept of using the schools to create tiny kingdoms. A little decorated square representing each school. These squares would stack to form cities. And together, they would look like kingdoms. However, there is a divide, a digital divide, where the schools already connected to the internet form a bustling kingdom at the top, with vibrant colors and intricate decorations. The schools not connected instead form a hidden upside-down city, using more muted colors and only simple decorations.
I used randomness to divide all 300,000 schools across the 1,000 artworks, letting the number of schools per artwork range between 100 (which looked more like villages) and 450 (which were more like cities). I used randomness again to place each school, each square, on an invisible background grid (Tetris-style, in a way).
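One way to picture that first step is as a bounded random split. The sketch below is only an illustration in Python, not the code used for the Giga collection: the function name `split_into_groups`, the `step` parameter, and the way the sizes are grown are all assumptions. It starts every artwork at the minimum size and then hands out the remaining schools in small random chunks until all of them are placed, never letting an artwork exceed the maximum.

```python
import random


def split_into_groups(total, n_groups, smallest, largest, step=25, seed=0):
    """Randomly split `total` items into `n_groups` sizes within [smallest, largest].

    Hypothetical helper: `step` caps how many items are added per draw, which
    is an arbitrary choice that keeps the sizes from clumping at the extremes.
    """
    if not smallest * n_groups <= total <= largest * n_groups:
        raise ValueError("total cannot be split within the given bounds")

    rng = random.Random(seed)            # seeded so the split is repeatable
    sizes = [smallest] * n_groups        # every group starts at the minimum
    remaining = total - smallest * n_groups
    open_groups = list(range(n_groups))  # groups that still have room

    while remaining > 0:
        pos = rng.randrange(len(open_groups))
        group = open_groups[pos]
        room = largest - sizes[group]
        extra = rng.randint(1, min(room, remaining, step))
        sizes[group] += extra
        remaining -= extra
        if sizes[group] == largest:
            # This group is full: remove it from the candidates in O(1).
            open_groups[pos] = open_groups[-1]
            open_groups.pop()

    return sizes


# Figures from the text: 300,000 schools, 1,000 artworks, 100 to 450 each.
sizes = split_into_groups(300_000, 1_000, smallest=100, largest=450)
print(min(sizes), max(sizes), sum(sizes))
```

Changing `seed` gives a different but equally valid set of villages and cities, which is the point: the randomness supplies the variety, while the bounds keep every artwork readable.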
Another big part where randomness comes into play is the symbol inside each school. Each of the 51 unique symbols has different levels of complexity, such as either one, two, or three concentric circles. What symbol a school gets is mainly determined randomly. But I do use the internet speed of a school to decide what level of symbol complexity is drawn. The higher the speed, the more complex the symbol. The 10% of schools with the highest speeds are even shown as flowers or rainbows.
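The mix of random and data-driven choices described here could be sketched roughly as follows. This is a hedged Python illustration, not the actual implementation: the text only says that faster schools get more complex symbols and that the fastest 10% become flowers or rainbows, so the three equal-sized complexity bands, the rank-based cut-offs, and the names `assign_symbols` and `N_SYMBOLS` are assumptions.

```python
import bisect
import random

N_SYMBOLS = 51  # number of unique symbols mentioned in the text


def assign_symbols(speeds, seed=0):
    """Return a (symbol, level) pair for each school speed.

    Assumed rules: the fastest 10% get a special symbol with no level;
    everyone else gets a random symbol, with a complexity level of 1 to 3
    decided by where the school's speed ranks among all schools.
    """
    rng = random.Random(seed)
    ranked = sorted(speeds)
    n = len(ranked)
    top_cut = ranked[int(0.9 * n)]  # speed that opens the fastest 10%

    result = []
    for speed in speeds:
        if speed >= top_cut:
            result.append((rng.choice(["flower", "rainbow"]), None))
            continue
        rank = bisect.bisect_left(ranked, speed)  # schools slower than this one
        level = 1 + (rank * 10) // (3 * n)        # three bands below the top 10%
        symbol = f"symbol-{rng.randrange(N_SYMBOLS)}"
        result.append((symbol, level))
    return result


# Made-up speeds purely for demonstration.
demo_speeds = list(range(1, 101))
assigned = assign_symbols(demo_speeds)
print(assigned[0], assigned[45], assigned[99])
```

The symbol itself carries no meaning, so it can be left to chance, while the one variable that does mean something (speed) controls a property the eye can rank: complexity.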
And it doesn’t stop there: I also used randomness to determine which of the 24 different color palettes to apply, what possible easter eggs to hide and to create a very subtle background pattern of contours, linking all 1,000 kingdoms together in a giant map of 40 by 25 pieces.
I tried to use the schools data in as many ways as I could think of to determine visual aspects, but by using all this randomness, I could take this dataset with fairly little information about the schools and turn it into 1,000 unique data artworks.
This was a more extreme example of using randomness together with data, but applying some subtle randomness to even the most seemingly straightforward data visualization can make your visual more intriguing to look at, more beautiful, and in some cases, more human.
Here’s how Will was able to apply some of these methods to the health facilities harassment dataset…
Part 4: Inspiration strikes!
After seeing Nadieh explain these ideas at the conference, my mind turned immediately to the graphic I was working on. I’d realised two things: 1) I needed to devote more attention to the victims in the graphic – as they were the most important people in the story 2) I was going to have to try and represent them. All 35,000 of them. Within the next hour, this very rough sketch had emerged on my iPad:
I was already trying to work out how to give the points some kind of individual properties – colour for sexual violence / misconduct perhaps? And random variation for size? And how to calculate the right angles for the shadows in D3.js? There would be trigonometry afoot when I got home… The conference was pretty busy but I did get a brief chance to show the sketch to Nadieh. She was full of encouragement, and thought that this new approach humanised the dataset. And she liked the victims’ shadows!
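For readers curious about what that trigonometry might involve, here is a minimal sketch in Python (the arithmetic carries over directly to D3.js). It assumes one common construction, which the text does not confirm was the plan: every shadow lies on the line running from a shared vanishing point through the base of its figure, pointing away from that point. The function name `shadow_tip` and the coordinates are hypothetical, and the y axis points down, as it does in SVG.

```python
import math


def shadow_tip(base, vanishing_point, length):
    """Return (angle in degrees, tip position) for one figure's shadow.

    The shadow starts at `base` and extends `length` units directly away
    from `vanishing_point`. Coordinates are (x, y) with y increasing downward.
    """
    bx, by = base
    vx, vy = vanishing_point
    # Direction from the vanishing point to the figure's base.
    angle = math.atan2(by - vy, bx - vx)
    tip = (bx + length * math.cos(angle), by + length * math.sin(angle))
    return math.degrees(angle), tip


# A vanishing point far above the canvas, and two figures on the same row.
vanishing_point = (0, -1000)
angle_centre, tip_centre = shadow_tip((0, 0), vanishing_point, length=40)
angle_right, tip_right = shadow_tip((1000, 0), vanishing_point, length=40)
print(angle_centre, angle_right)
```

With the vanishing point directly above it, the first figure's shadow falls straight down the canvas, while the figure further to the right gets a shadow that leans outward. The further away the vanishing point, the more nearly parallel the shadows become.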
When I got back to my desk I ignored my stacked up emails and made a first very quick version of the new graphic, with simple blocks to represent the people:
The “blocks” of people were easy to make but didn’t have the kind of individuality that was needed. The shadows, when applied to the blocks, didn’t have the same effect. I was just about to open up D3 and start calculating vanishing points and angles of shadows, but then decided to send a quick sketch to a colleague, Ben McNeilage. Ben works a lot on The BMJ’s “Best Practice” point of care tools, making 3-D graphics. We wondered if it might actually be easier to model the people in 3-D, and have the engine draw shadows, rather than try to create the kind of “pseudo 3-D” that would have been generated in D3. I sent over the following sketch and asked if it might be possible to create a LOT of cylinders, with the kind of individuality in shape and size that Nadieh had mentioned in her talk:
A couple of hours later, Ben sent back this:
Already this seemed like an improvement: the columns, despite being simple shapes, have a kind of personality to them because of their shape and size. I sent over the number of columns of each colour that were needed in the six perpetrator/victim categories. We realised that it was going to be tricky to use a 1:1 scale with 9,143 people in the biggest group, and over 35,000 people overall. I sent over the numbers at 1:1, 1:10, and 1:100 scales for Ben to experiment with. The next day, he sent back a design which was incorporated to form this first draft:
It seemed that the 1:10 scale gave the best “individuality” while preserving a sense of the large scale of the reported abuse. We mocked up a version with a 1:1 scale, but the columns seemed to lose too much visual fidelity, becoming “noise” rather than “individuals”:
We could, at this point, have decided to present this piece as a long scrolling piece at 1:1 scale, like The New York Times’ “An Incalculable Loss,” but we were working with several constraints in mind. Firstly, the aim of this graphic was to attract attention to the investigation, and graphics in this kind of static format can easily be shared and viewed on social media. Secondly, we were also designing for a printed version of the journal, so we often like to keep graphics to a size that fits on one or two pages. This way they can appear both on a page and online. Thirdly, there were time constraints, with only a few days to go until the deadline.
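To make the three candidate scales concrete, the short Python sketch below converts a count of people into a number of columns. It uses only the one group size given above (9,143 people in the biggest group) and assumes plain rounding to the nearest whole column, which is a guess, since the rounding rule used for the published graphic is not stated. The function name `columns_at_scale` is hypothetical.

```python
def columns_at_scale(people, people_per_column):
    """Number of columns needed to show `people` at a 1:people_per_column scale.

    Assumes rounding to the nearest whole column.
    """
    return round(people / people_per_column)


biggest_group = 9_143  # the largest perpetrator/victim category in the text

columns = {scale: columns_at_scale(biggest_group, scale) for scale in (1, 10, 100)}
for scale, count in columns.items():
    print(f"1:{scale} -> {count} columns")
# 1:1 -> 9143 columns
# 1:10 -> 914 columns
# 1:100 -> 91 columns
```

Seen this way, the trade-off is easier to weigh: 9,143 columns in one group become visual noise on a single page, 91 begin to undersell the scale, and 914 are still individually visible.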
Eventually we decided to use the 1:10 scale and the single page static format. Ben tweaked the angle of the camera and the light source so that the shadows in the 3-D element matched more closely the shadows from the silhouettes in the design (the vanishing point for those was far off the top of the canvas!). After a few rounds of editing and proofreading, the following version appeared in the journal:
Part 5: When to disaggregate
We know that the “average person” doesn’t exist in reality, so instead of averaging your data beforehand and visualizing the result as “one bar” (or other type of visual element) it can be a really effective technique to present every person (or other entity) in the dataset visually, and aggregate them through their visual properties like positioning them in groups, or making them different colours or sizes. This helps to remind our audience of the people or other entities behind a big number, in a way that isn’t so easy with aggregated numbers like averages or even large blocks that represent many people together. When we are using this kind of technique, we can try and make our disaggregated people or other entities more individual, by introducing subtle visual variation through randomness, as Nadieh has explained.
However, Will’s experience with creating the sexual safety graphic, and the judgements made, won’t necessarily apply directly to a different dataset on a different topic. In this case, we chose to work with a 1:10 scale to show objects (representing people) that the viewer can relate to, while maintaining a sense of scale. Plenty of design decisions are project specific, and there are any number of paths that can do justice to the humanization of the visual. For instance, if the numbers had been 10x bigger, should we have used a 1:100 scale? And would that water down the impact we are trying to achieve? There are no hard rules, no correct answers. Constraints on time, budget, publication formats, and many other things will also influence the best way forward. A healthy dose of personal judgment is always needed to find the right balance. So let’s keep experimenting!