Avoiding Data Pitfalls — an Interview With Ben Jones

We caught up with Ben Jones, the founder and CEO of Data Literacy, and spoke to him about his recent book, Avoiding Data Pitfalls. Ben’s book is described as a reputation-saving handbook for those who work with data, designed to help you avoid the all-too-common blunders that occur in data analysis, visualization, and presentation. The book explains that unless you truly understand how to work with data, there are many ways in which you can ultimately mislead and cause costly mistakes.

The questions came from Neil Richards, the Data Visualization Society’s Knowledge Director.

Neil Richards: I enjoyed the book and found it really easy to read, because I felt like I was having a conversation with you about data while I was reading it. I know this book has been a labour of love for you since you’ve been working for Tableau. Tell me why you wanted to write this book in particular?

Ben Jones: Really for two major reasons: Firstly, most of the conversation, at least in the data visualisation community right then, centered around chart type choice. There was a list of “no-no” chart types that people would get ridiculed for using and it occurred to me that you could use a completely compliant bar chart but mislead people for totally different reasons unrelated to the encodings of the chart itself. That really hit me because the conversation wasn’t really about that, it was just about the last part in the process, which chart to use, when there had been so much done already up to that point: gathering the data, transforming it, creating calculations, blending, joining, all of these steps that could be completely full of errors that would lead you to create a nice innocent-looking bar chart that completely misled someone.

Secondly, realising that the tools were becoming so powerful (like Tableau and many others) that I sometimes didn’t feel I had the knowledge and skills to put that to good use without “driving a Ferrari into a brick wall,” so I wanted to give people a warning about things that were pretty easy to do wrong. Also I was teaching at the University of Washington at the time — some of those students were making the exact same mistakes I used to make, there was a lot of commonality there.

NR: You make it quite clear that in seven chapters, the ones we think of as the chart and design elements are really just chapters six and seven, so you make the important point that five-sevenths of the stuff you really ought to be thinking about has already happened by this stage.

BJ: I felt that for those last two points, a lot of really good books have already been written about them. I refer to Alberto Cairo’s new book How Charts Lie in that chapter itself — really recognising I didn’t want to write another book about that, but for those unfamiliar with the concepts I really couldn’t leave them out, so that’s why they are included but not given the lion’s share of attention.

NR: I would have expected your first book to feature the wording and branding of Data Literacy more, and it’s interesting that it doesn’t. How does Avoiding Data Pitfalls fit in with your Data Literacy company?

BJ: Good question. At the very end, my publisher suggested we might as well put my Data Literacy connection on the cover, but I think it’s because this project started well before I woke up to the idea that another way of affecting and improving those same issues was through the positive message of Data Literacy. At the same time, if you think about it, consider grammar: take the sentence “Let’s eat, Grandpa.” If you pull the comma out, that completely changes the meaning. Even in literature, writing and grammar, there are very small mistakes you can make that have disastrous consequences on the overall meaning. I think the same is true in data. To me, how they fit together is that data literacy is effectively speaking the language of data. A part of effectively speaking the language of data is knowing what simple, common errors to watch out for and avoid, similar to grammatical errors.

NR: So Data Literacy is the positive and Avoiding Data Pitfalls is the negative — the carrot and the stick?

BJ: Yes — what to do versus what not to do.

NR: When I picked the book up, before I’d even opened it, my first question would have been “how many of these errors have you made yourself?” but you make it quite clear right from the start that you have made them all. I know I certainly have too, so I guess it was important to make it clear that everyone does?

BJ: We have this culture where mistakes are dragged out into the open — I don’t know if we have a climate right now where people can raise their hand and say “mea culpa” but with data now it’s so common and it’s dangerous to have that culture because now we can’t help others learn, we have to cover those mistakes up. I think that would get in the way of helping others become data literate if they felt like every time they made a mistake they weren’t allowed to say they’d done something wrong (even if they were an author, an expert, or whoever) — but yes, if I say I’ve made many of these mistakes before, maybe it’ll help people in some small way to think yes it’s ok to say that. That’s part of the goal to recognise that: data’s not perfect, but really neither are we.

NR: I love the way it’s set into seven chapters (I particularly liked Epistemic Errors, as director of knowledge I learned the new word Epistemology, I might use a new title!)

They are all alliterative, obviously that was deliberate, and we’re quite well known in the Tableau community for our alliterative projects such as Makeover Monday and Sports Viz Sunday. Were there some that missed the cut?

BJ: Oh my gosh — when you have as many as are in there, you get shackled — more than one time I wondered why I went down this path and thought now I have to figure out a decent alliteration.

NR: Alliteration in subsections as well as chapters, you were committed!

BJ: There was more than one that was really stretching. I thought people would really love or hate it!

NR: It really helps from a mnemonic point of view — you talk about having a checklist at the end of a book, perhaps you’re more likely to remember “Graphical Gaffes” and “Analytical Aberrations.”

BJ: Hopefully it sticks a little bit, exactly.

NR: Were there any messages that you feel are the strongest, or that people have paid the least attention to? We mentioned how much attention has typically been paid to the topics of the last two chapters, but were there any that you felt were more important, or more overlooked, than others?

BJ: I would have loved to have had more time to develop Chapter 2: Technical Trespass. If I ever did a revision, that might have some more content, for sure. As for the ones that people think of the least: we’re all painfully aware of Statistical Slip Ups. That’s well known: p-values, hypothesis testing. People generally have that experience in “Stats 101” in college where they go “this is complicated.” I don’t think that was brand new, and many other books have been written that expand on this. Graphical Gaffes was around chart choice, but I tried to change the focus so it was less around chart choice and more around this idea of how we consider the alternatives we have in this arsenal of visuals and encodings, and how we think about making use of all of those, or the right ones.

This idea of “optimising vs. satisficing” is one that I pulled out from operations research and tried to bring to the dataviz world in a blog post a few years ago, and it’s really in response to people saying, “You need to make the best chart choice” versus saying “I’ve got one that’s good enough,” and realising that most fields recognise there is room for both of those approaches. That’s not a choice that you necessarily have to take exclusively. We live in a world where there are some people figuring out the very best ways to do things and all nuances of what works well and pros and cons, but we also need a world where practitioners are able and free to go with what works and also have some ability to not be shackled by those constraints. That is an idea I was really excited about when I came across it because I think it came across really well.

Also the last one, the eighth one, that didn’t even make the lists, that’s something we’re only really beginning to wake up to, that there’s so much bias in the world we live in. Many voices are not heard and that can affect our data. That even affected my data in trying to write the book — trying to look for quotes and not encountering that many by women.

NR: My eyes are really opened in a good way by that — you and I and many others are passionate about this, but I read that on the very same day that I went to a user group that was 90% male and I was thinking “Why am I wearing my Data Plus Women shirt here — the women aren’t here to read that message?” It feels like at least in smaller pockets there’s still so much more for us to do.

BJ: I think it’s a long journey to chip away at those walls and barriers that are keeping a more diverse group of people from participating in the data dialogue. I at least try to say in the spirit of the book that “Hey, wow, I got it wrong myself right here in the writing of the book” so that was something I was shocked to find out. You have to look at this and go “how did this happen?” We are not as far along as we think we are.

NR: We get influenced by the past, don’t we? If books from two years ago only have texts and influencing quotes by male people then it’s hard to find those things, we need to break out.

I feel like it was a really personal journey for you — you put in quite a lot of personal and professional anecdotes through the book. One example which put you even higher in my own personal estimation was that you once had to get toilet paper for Hans Rosling. I would have that on a T-shirt if that was me!

BJ: I took a screenshot of the text! The stressful thing was I forgot my charger at home, so I had the message saying “Ben do you have the toilet paper for Hans?” and the battery on the screenshot is down to 2%! I literally ran to the drug store through the streets of Seattle with a roll of toilet paper and back to the conference so it was a high moment for me for sure, a great honour to be a part of that presentation.

NR: I won’t put more spoilers on but that’s definitely something I’d recommend reading in the book right there!

It was one of those books that was such an easy read that I was able to go right through in the knowledge that there would be bits I wanted to go back and re-focus on, and in this case I wanted to go back to the bit about intuition again. That’s the bit I wanted to understand — intuition versus analysis. I think you were breaking new ground, offering a different opinion that said we are not replacing intuition with analysis, we need to use them side by side. Can you explain that a bit more?

BJ: I feel strongly about this — I feel we do a disservice to data and analytics when we have this idea that somehow the data is going to tell us what to do. That’s overselling the value of data, it doesn’t just make the decisions for you, there’s a human in the loop. That’s a really important message especially for people who maybe are dataphobic — we’re not saying that data needs to replace you, your experience, your ideas of what should happen, your notion, your gut — those are not bad things. Those are things we need to incorporate into a multifaceted view along with data. Sometimes the data is going to tell you your intuition is dead wrong, that definitely happens. But sometimes the opposite happens, you just have the intuition that the data is wrong, something is not right, you can’t really describe what it is, you look into it further and, lo and behold, you do find out what it is describing is pretty much the opposite of reality because of one of the issues/possible errors arising.

It’s not just that intuition is going to tell you what’s wrong with the data; that’s one part of it. The other part is telling you what to do next, where to go. What do we do now we know these facts? Maybe we need to look somewhere else, think about what this means in a different way? Even implying there is meaning there, assessing and assigning value to routes and goals, these are all human endeavours. They are intuitive in the sense they are not always fully thought out logically with a prescribed process; it might just be the sense we have.

NR: “Sucker Punches?”

BJ: Yes, like Jonas Salk, discoverer of the polio vaccine: “Intuition told me where to look next.” Data doesn’t always tell you that. It might raise a question you didn’t have, but maybe the next question is outside of the data. Maybe I need to talk to someone. These are all valuable parts of the process that allow us to steer a ship in the right direction (a company, family, city). Recognising that data is not replacing human intuition is something I think it’s really important to do.

 
