How Ethical Data Visualization Tells the Human Story

Data is more than just a series of abstract numbers; it holds the power to convey meaningful stories about individuals’ lives and their unique experiences. These narratives can wield significant influence when represented through effective visualisation techniques. This is why ethics and data visualization are inseparable partners in the quest for truth—failing to acknowledge the human behind the data risks telling an incomplete and misleading tale, devoid of crucial information.

The question is how to approach data with ethical consideration. It’s important to adopt a holistic perspective that encompasses data collection, analysis, and visualisation, focusing on the bigger picture of the story you are telling rather than just the numbers. This can be a challenge because we data visualizers tend to prioritise efficiency and easy understanding. As a result, we are prone to treating data as the sole arbiter of truth, believing that numbers don’t lie. However, we forget that data collection and analysis can be influenced by human biases.

The ethical considerations in data visualisation include thinking about the data process and collection. The critical questions to ask ourselves are:

What is the data collection process?
Are there nulls in the data?
What are you excluding or including in your data prep process?
How are you choosing to show all of this information?
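
As a rough illustration of the second and third questions, here is a minimal Python sketch that counts missing values per field and keeps a log of what a data-prep filter drops, so the exclusions can be reported alongside the chart. The helper names (`audit_missing`, `filter_with_log`), the field names, and the records are all made up for illustration; this is one possible approach, not a prescribed method.

```python
from collections import Counter


def audit_missing(rows, fields):
    """Count, per field, how many rows have no usable value (None or empty string)."""
    missing = Counter()
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is None or value == "":
                missing[field] += 1
    return {field: missing[field] for field in fields}


def filter_with_log(rows, keep, reason):
    """Apply a data-prep filter, but record how many rows it drops and why."""
    kept = [row for row in rows if keep(row)]
    log = {
        "reason": reason,
        "rows_in": len(rows),
        "rows_dropped": len(rows) - len(kept),
    }
    return kept, log


# Made-up placeholder records, for illustration only.
records = [
    {"school": "A", "test_score": 71, "mobility_rate": 0.12},
    {"school": "B", "test_score": None, "mobility_rate": 0.30},
    {"school": "C", "test_score": 64, "mobility_rate": None},
    {"school": "D", "test_score": 80, "mobility_rate": ""},
]

null_counts = audit_missing(records, ["test_score", "mobility_rate"])
scored, exclusion_log = filter_with_log(
    records,
    lambda r: r["test_score"] is not None,
    "no test score reported",
)
print(null_counts)
print(exclusion_log)
```

The point of returning a log rather than silently filtering is that the number of excluded rows, and the reason, can then be shown to the reader instead of disappearing in data prep.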

Missing Data:

Often data tells the story from one perspective, and there is missing information we don’t take into consideration. This information is just as essential to visualise as the information you have access to. Missing data is all around us and affects people’s real lives. Take educational performance reporting, which is consistently based only on standardised test scores and university acceptance. Certain schools consistently have significantly higher test scores than others, creating an apparent achievement gap between schools. But by focusing on standardised test scores, we miss a lot of other meaningful information needed for a complete picture.

Ethical concerns arise when the missing data or contextual factors that could explain the disparities are not adequately addressed in the data visualization. Some of the reasons for missing data or disparities in educational performance could include:

Socioeconomic disparities: Schools in wealthier neighbourhoods might have more resources, access to tutors, and a more supportive learning environment.
Special education inclusion: Some schools might have a higher proportion of students with special education needs, which can impact overall test scores and graduation rates.
Teacher quality and experience: Differences in the quality and experience of teachers can influence students’ academic performances.
Student mobility: Schools with higher rates of student mobility might face challenges in maintaining consistent academic progress.

The ethical implications of not addressing missing data or contextual factors are significant. By not providing a comprehensive understanding of the educational landscape, the data visualization might inadvertently perpetuate stigmatization or unfair judgments against schools with lower performance. If we only look at numbers and ignore missing contextual factors we end up telling a completely different story.

Ethical data visualization practices would involve acknowledging and addressing the complex factors that influence educational performance. This means providing contextual information about the schools, such as student demographics, teacher qualifications, and available resources. Additionally, the visualization could use comparative benchmarks or indicators to assess schools’ performances, considering the unique challenges they face. This can help users interpret the data in a more truthful way that focuses on the people rather than just the numbers.
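
One way a comparative benchmark could work, sketched in Python under loud assumptions: each school is compared with the median of schools in a similar context rather than with every school at once. The grouping field (`income_band`), the function name, and all the scores are hypothetical placeholders, and a real benchmark would need far more careful choices about what counts as a peer.

```python
from statistics import median


def gap_to_peer_median(schools, group_key, value_key):
    """Return each school's value minus the median value of its peer group."""
    groups = {}
    for school in schools:
        groups.setdefault(school[group_key], []).append(school[value_key])
    medians = {group: median(values) for group, values in groups.items()}
    return {
        school["name"]: school[value_key] - medians[school[group_key]]
        for school in schools
    }


# Made-up placeholder data, for illustration only.
schools = [
    {"name": "A", "income_band": "lower", "score": 60},
    {"name": "B", "income_band": "lower", "score": 70},
    {"name": "C", "income_band": "higher", "score": 80},
    {"name": "D", "income_band": "higher", "score": 90},
    {"name": "E", "income_band": "lower", "score": 65},
]

gaps = gap_to_peer_median(schools, "income_band", "score")
print(gaps)
```

In this toy data, school B scores below school C in absolute terms but sits above its own peer median, while C sits below its own. That is the kind of context a raw ranking hides.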

Ultimately, ethical data visualisation in education requires recognising the complexities and systemic factors that contribute to disparities in performance and presenting data in a way that promotes understanding and fairness. By doing so, educational stakeholders can make informed decisions and work collaboratively to address the root causes of performance gaps and improve educational outcomes for all students.

There are many more examples in the real world of how missing information on key topics can increase prejudice and stigma. This is why it is not enough to just look at the numbers you’re presented with. You also need to become an expert in the topic you are visualising so that you can take into account contextual factors that can have a significant impact on what you’re reporting.

Design:

Design helps visualise information in a way that we can understand and empathise with; it tells a story, tells us why we should care and helps us make better-informed decisions. This is where ethics comes into play. People’s decisions are often influenced by the data they’re viewing, and that data shapes public opinion on important topics. A popular example of how a biased data visualisation can mislead people is the map of county votes in the US presidential election.

A 2020 presidential election map showing a swath of red across the country, with blue only in certain pockets (namely, the counties that are predominantly urban). — Map of 2020 US presidential election by county.

This is an inaccurate representation that reinforces stereotypes about states and doesn’t take population into account, because “Land doesn’t vote. People do.” Karim Douïeb, a data scientist in Brussels, figured out a more accurate way of representing this data, one that humanises it and puts the focus on people.

An animated gif of the 2020 election map, showing the county version (almost all red) transition into a bubble map. The effect shows the populations of voters, where many of the red counties become small red dots and the blue counties become larger blue dots. — 2020 US presidential elections, showing the population of voters more accurately.
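
To make the idea behind a bubble map concrete, here is a small Python sketch of the usual scaling rule: the area of each circle, not its radius, is proportional to the number of voters, so the radius grows with the square root of the count. This is a generic illustration and not necessarily how the map above was built; the function name, the `max_radius` parameter, and the vote totals are all placeholders.

```python
import math


def bubble_radius(votes, max_votes, max_radius=40.0):
    """Radius chosen so that the circle's AREA is proportional to votes."""
    if max_votes <= 0:
        raise ValueError("max_votes must be positive")
    return max_radius * math.sqrt(votes / max_votes)


# Placeholder vote totals, for illustration only.
county_votes = {
    "Rural county": 10_000,
    "Mid-size county": 250_000,
    "Urban county": 1_000_000,
}

largest = max(county_votes.values())
radii = {name: bubble_radius(v, largest) for name, v in county_votes.items()}
print(radii)
```

Scaling the radius directly would exaggerate large counties, because area grows with the square of the radius. With the square root, a county with 100 times the voters gets 100 times the ink.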

It is important to think critically about how you are choosing to visualise your data and to be as unbiased as you can be. However, this is easier said than done. Being completely unbiased during the data visualization process is challenging, since we all naturally carry implicit biases that influence our perceptions and decisions. The best way to reduce our own biases is to be as objective and informed as possible. You can do this with several different techniques:

Define clear objectives: Establish the purpose of the visualization and the questions you’re aiming to answer. This will allow you to stay focused on presenting the data objectively without favouring specific outcomes.
Use multiple data sources: Where possible, rely on multiple data sources to cross-validate findings and avoid over-reliance on a single dataset. This reduces bias and paints a more accurate picture of the topic you are visualising.
Avoid cherry-picking data: Present the data in its entirety, and avoid selectively choosing data points that support a particular narrative.
Collaborate with diverse teams: Working with colleagues from diverse backgrounds can help identify and challenge biases, fostering a more balanced interpretation of the data.
Be transparent about methodology: Clearly explain the data collection, analysis, and visualization methodologies used, ensuring transparency for the audience. Most of the time this information can be presented behind an info button, since end users are often open and curious to learn more about the data collection process.
Question assumptions: Think critically, and continuously challenge assumptions and preconceived notions during the visualization process to avoid inadvertently reinforcing bias.
Seek feedback and peer review: Invite feedback and review from peers or experts in the field to identify and address potential biases.
Address potential contextual variables: Consider and account for contextual variables that can influence the results. This will make the visualization as accurate and unbiased as possible and paint a better picture of what’s happening.

Complete neutrality may be unattainable, but the goal is to be aware of our biases and actively reduce them to create data visualizations that are as unbiased and objective as possible. Regular self-reflection, openness to feedback, and a commitment to ethical practices are essential in promoting unbiased data visualization.

Ethical thinking in data visualisation isn’t just about the intentions behind your work; it’s also about the consequences of your work. A lot of the time, talk about design for visualisation is followed by a list of rules. (There are many rules you need to follow, and whole books have been written about them.) But designing a visualisation doesn’t consist of applying ‘rules’. It is instead based on making justifiable ethical and functional choices according to the goal of your viz, to help prevent reaffirming biases, prejudice and errors. Once you understand best practices, you should use them as guidelines and realize that the ethical component of design should take precedence. There are times when these rules are valid and times when they are not.

Disaggregate your data vs. big numbers

Data visualisation is also an outlet for creativity, a chance to think outside the box and use art to humanise data and amplify people’s voices. A fundamental way of creating more ethical data visualisation is to humanise it. This not only makes your design more relatable and engaging, but it also emphasises that people are more than just another number or stat. It can be unclear how to do this. The most impactful way to humanise data visualisation designs is through storytelling: stories help people emotionally connect to what you are portraying. You can think of your visualisation in the form of chapters: what is your beginning, middle and end? What actions do you want people to take?

Sometimes a big number isn’t enough to convey the story you want to tell. The example below is a visualisation I created about people who have died in the process of migrating to a safer country. I visualised each person or group of people that has been found as a circle. The aim was to disaggregate the data to highlight both the number of people who have gone missing and their characteristics. We have become desensitised to big numbers, and it can be hard to envision the tragedy behind them, so I often try to find disaggregated ways to visualise data about people.

An image of a big number with text. It says: 47,821 Total number of dead and missing since 2014, a 25% increase in death compared to 2020.

versus

A bubble chart of migrants, where each bubble is colored by place of origin and sized by the number of migrants.
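
The difference between the two versions can be sketched in a few lines of Python. The big number is a single sum. The disaggregated view keeps one mark per recorded incident, with the attributes needed to colour and size it. The field names and records below are invented placeholders and are not the data behind the visualisation above.

```python
def to_marks(incidents):
    """Keep one mark per recorded incident instead of collapsing to a total."""
    return [
        {
            "color_key": incident["origin"],   # drives the bubble colour
            "size_value": incident["people"],  # drives the bubble size
            "year": incident["year"],
        }
        for incident in incidents
    ]


# Made-up placeholder records, for illustration only.
incidents = [
    {"origin": "Region X", "year": 2019, "people": 1},
    {"origin": "Region Y", "year": 2020, "people": 12},
    {"origin": "Region X", "year": 2021, "people": 4},
]

big_number = sum(incident["people"] for incident in incidents)
marks = to_marks(incidents)

print(big_number)
print(marks)
```

Both views hold the same total, but only the second keeps who, where and when attached to every mark.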

In this example, I’m visualising journalists who have been killed in action whilst covering a story. Each dot in a country represents someone who died, and it’s coloured by how recent the death was (blue is more historical and red is more recent). The line between the dots shows whether the deaths were consecutive (a broken line means there was a gap in time). The user can hover over each dot to find out who the person was. I wanted the focus to be on the person instead of just one large number or a bunch of numbers aggregated into a bar chart. This was the first visualisation I built where I considered the people behind the numbers I was looking at. I ended up spending a lot of time thinking about how I could do these journalists justice, telling their stories in a way that would get people to care about the data.

A graphic with nine spirals, each for a different country: Afghanistan, Algeria, Angola, Argentina, Armenia, Azerbaijan, Bahrain, Bangladesh and Belarus. The spirals are made of dots, some connected, some not. Some are red hues, others are blue hues.

There are different genres of data stories. Some stories focus on explaining how things work, and others focus on exploring the data, giving the reader an interactive element so they can discover what they can or want to learn from the data set.

Colour

In a visualisation I built with Fred Najjar, the focus was on reconstructing an event, using data and visuals to explain and educate about the tragic explosion in Lebanon on August 4, 2020. These types of stories almost always follow a linear chronological narrative. Their distinguishing factor is that they focus on explaining a single event, and they are often a good method for uncovering information or correcting misconceptions.

An infographic showing "The Day Lebanon Changed." The graphic has concentric circles of different reds, showing the epicenter of destruction and the outer rings of destruction. The graphic also has key stats such as deaths, injuries, missing, homeless and damages. — Infographic about August 4, 2020: “The Day Lebanon Changed.”

Colour is an important way to humanise the data you are visualising, and it can make a huge impact. Before picking colours, think about how you want to use colour in your visualisation. You can use colour to highlight important information and draw attention to a specific point.

Here, I’m using colour to highlight a key comparison between the size of the explosion that happened in Beirut in 2020 and previous ammonium nitrate explosions. Beirut was the focus of the visualisation, so throughout we highlighted data pertaining to Beirut in red and used blue for everything else.

A graphic showing the extent of different explosions using the spiral treatment. The Beirut disaster is sixth in the graphic, out of a series of 11 other explosions. — The Port Beirut disaster in context: “How Big Were The Explosions?”
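
The highlight rule described here, one colour for the focus and another for everything else, is simple enough to sketch in Python. The hex values and the event labels are my own placeholders, not the palette or data used in the graphic.

```python
FOCUS_COLOR = "#c0392b"    # assumed red for the item the story is about
CONTEXT_COLOR = "#2e86c1"  # assumed blue for everything else


def highlight_colors(labels, focus):
    """Return one colour per label: the focus colour for `focus`, context colour otherwise."""
    return [FOCUS_COLOR if label == focus else CONTEXT_COLOR for label in labels]


# Placeholder labels, for illustration only.
events = ["Event A", "Beirut 2020", "Event B"]
colors = highlight_colors(events, "Beirut 2020")
print(colors)
```

The resulting list can be passed to whichever charting tool you use as the per-mark colour. Applying the same two colours across every chart in the piece keeps the reader’s eye on the focus topic.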

Another example of the use of colour is when I wanted to take a look at diversity in astronauts. Each dot is an astronaut, but I used purple to draw attention towards astronauts that were notable pioneers who were the first to do something in their field. Other astronauts who did not have this designation are white dots.

A graphic that looks like a dandelion that has gone to seed: the middle is a large dot and from that point are lines connected to other dots. The dots represent astronauts and there are callouts for specific people, spelling out their accomplishments.

More importantly, colour and emotion are closely linked, and the emotional triggers prompted by colour differ across cultures. Take the colour red, which has contrasting meanings across the globe: in China, red is considered a lucky colour that symbolises good fortune and happiness; in India, it is associated with purity, sensuality and fertility and is often used in bridal wear; in South Africa, it is associated with mourning; and in Western countries, it is associated with danger, passion, and anger. This is critical to keep in mind when creating visualisations. Consider both the audience’s cultural background and the topic of your data visualisation.

It’s important to remember that the visualization process involves several complex steps, and ethical procedures must be practised throughout so that the final result is as truthful as possible. During this process, you need to confront any biases you have that may skew how you perceive the data and how you intend to visualise it. Ask yourself why you intend to build this visualisation and what you hope to gain from it.

We need to take an uncomfortable, honest look at ourselves and ask whether we have unconscious biases about the data or the topic that may influence our work. This is especially true if you’re working with data that is political, racial, or about any class of people that faces discrimination.