Getting to Know… Lawrence Gray, PhD

The dataviz community is broad and encompasses practitioners from a range of backgrounds, professions, and interests. In an effort to get to know the community, Nightingale periodically features interviews with dataviz practitioners to showcase what working in dataviz looks like for them.

Today, we meet Lawrence Gray, PhD, Head of Data Science at KPMG Spark, an adjunct faculty member teaching analytics, including visualization, and an entrepreneur who has described himself as a “chill guy with interests from Reggae to surfing.” He also contributes to open source communities through NumFOCUS, which supports innovative scientific computing, and through the Python Software Foundation and its annual PyCon conference. His path to machine learning with data visualization shows why doing your best work can mean trying something new this year. You can find him on LinkedIn.

Want to nominate someone (including yourself!) for an interview? Get in touch!


Kathryn for Nightingale: We first met when I was one of your students, and I’m really glad to share your perspective with the Data Visualization Society (DVS) community. Would you like our readers to know you as Dr. Gray or Larry?

LG: Larry.

Nightingale: Would you put what you do in your own words, and describe how dataviz fits in?

LG: Dataviz is essential to what I do. To boil it down to what I do every day, visual analysis is required for me to better understand the analytical aspect of my work. I know that if I were to do statistical descriptions of data, without visual representation, my interpretation could be completely off. 

That idea comes from Anscombe’s quartet, where the statistician Francis Anscombe looked at the descriptive statistics of four different sets of data, and the descriptive statistics were all the same: the same mean, same standard deviation. It wasn’t until you visually looked at the datasets that you saw they were different. I’ve seen an example called Datasaurus, where one dataset gives you a picture of a dinosaur but still has the same descriptive statistics as another. 

Data visualization is essential for the work that I do as a data scientist, and the visual analysis is what allows me to move quickly and efficiently through the models that I build. 

Image of a 2x2 grid of graphs showing common pictographs made up of the dots on each graph, from top left: a blue Tyrannosaurus Rex dinosaur head, to two nested green circles in the top right, to a red star in the lower left, and nine magenta dots spread out evenly in the lower right-hand graph.
Always Larry’s student, the author Kathryn Hurchla used the Yellowbrick library to plot four Datasaurus Dozen datasets created by Justin Matejka and George Fitzmaurice.
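The identical-statistics point Larry makes is easy to verify. Below is a minimal sketch, using only the Python standard library and two y-series from the published Anscombe quartet (sets I and II, which share the same x values):

```python
# Two y-series from Anscombe's quartet: same mean and variance
# (to two decimal places), completely different shapes when plotted.
from statistics import mean, variance

y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

print(round(mean(y1), 2), round(mean(y2), 2))          # identical means
print(round(variance(y1), 2), round(variance(y2), 2))  # matching variances
```

Only plotting the two series against their shared x values reveals the difference: set I is roughly linear with noise, while set II traces a clean curve.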

I use programming libraries such as Yellowbrick, which is a Python machine learning library built on visual analysis. You create visual charts such as ROC curves, which represent how your machine learning models are performing. The visual aspect of it is extremely important because, instead of looking at numbers, I look at images and am able to quickly determine the performance and quality of the models that I’m building. That’s solely a visual experience, and being able to create those visualizations becomes extremely important. 
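As a rough illustration of what an ROC curve actually plots (this is not Yellowbrick’s implementation; the function name, toy labels, and scores below are invented for the sketch), each point on the curve is a (false positive rate, true positive rate) pair obtained by sweeping a decision threshold down through the model’s scores:

```python
# Minimal from-scratch sketch of the points behind an ROC curve.
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs,
    sweeping the decision threshold from the highest score downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)  # best-scored first
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Toy output: a perfect ranker puts all positives ahead of all negatives,
# so the curve climbs straight up to (0, 1) before moving right.
print(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```

In practice you would let a library such as Yellowbrick draw this; the point of the sketch is just what the picture encodes.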

Nightingale: That’s one of the things you teach in your classes, that data visualization should happen throughout the analytical process.

LG: Without question, I could not do the work that I do without dataviz. It actually affects the efficiency and the quality of the work that I do. If I know a machine learning model is not doing well, I can identify that the model is performing horribly, right? I can make changes to see if it improves. If it doesn’t, I can move on to something else. 

I’m a core contributor for Yellowbrick. I guess more or less, I am a dataviz person. I actually maintain a programming language package that’s dataviz focused!

Example of side-by-side learning curve visualizations titled as Gaussian Naive Bayes (NB) and Support Vector Classifier (SVC) respectively, with Training Score plotted in blue and Cross Validation Score in green on each graph.
A learning curve shows the relationship between an estimator’s training score and its cross-validated test score over the machine learning process, i.e., how much the estimator benefits from more data, and whether it is more sensitive to error due to variance or to error due to bias. Credit: Lawrence Gray, PhD

Nightingale: You began your career as an academic scientist, with research in physiology and computational biology. Do you follow a similar scientific approach in your work today? Are there parallels in data science with the more established fields of biology and chemistry? 

LG: Yeah, there definitely are. It all starts with the problem statement, and that’s how I approach all my work—clearly stating the problem at hand and coming up with a hypothesis to test. That’s the scientific method. It is foundational in how I tackle problems in data science.

Nightingale: You’ve brought that up sometimes with how you work with your own team at KPMG, always getting back to what the problem is that you’re trying to address.

LG: Yeah. I wrote a playbook that describes how we do data science at Spark. It begins with the problem statement, two to three pages dedicated to defining the problem. Throughout the book, it always comes back to your problem statement: does this really address the problem statement that you began with, and are you going to have to refine that question?

Nightingale: Are there data visualization standards or practices in your playbook? 

LG: Yeah, dataviz comes into play in a couple places. In data exploration, you’re doing descriptive statistics and exploratory analysis. You’re creating histograms, you’re creating box plots, all these different descriptive visualizations to get a better feel for your data, right?
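As an aside on what sits behind one of those exploratory charts: a box plot is drawn from a five-number summary plus an outlier rule. A minimal standard-library sketch, with made-up data:

```python
# The five-number summary behind a box plot (toy data; real
# exploration would typically use pandas/matplotlib to draw it).
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 11, 12, 15, 50]
q1, median, q3 = quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
# Points beyond 1.5 * IQR from the box edges are drawn as outlier dots.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(min(data), q1, median, q3, max(data), outliers)
```

Here the extreme value 50 falls outside the upper fence, so the box plot would flag it as an outlier at a glance, exactly the kind of thing descriptive statistics alone can bury.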

Then we’re also doing dataviz at the other end, where we’re developing machine learning models and we want to look at how the models are performing. There are different visualizations that we construct that measure performance. For example, in our models, we’re changing all these different aspects; we record every time we change something, and we look at the output. One of the outputs is a dataviz that shows performance or quality metrics. I can go back later and say, “Hey, you said that model was the best. But according to this visualization, you can see that this model is actually performing better.” 

Another aspect of how we use dataviz is in conveying, outside of our team, that from a business perspective we’ve built something that outperforms our current model. We have to convince business leaders that we need to have this in production and that we’ll benefit from it. So how do you convince them? These data visualizations are great in that regard. You can match them up and say, “Hey, look at this performance in these areas where we were not performing well before. We’re now performing a lot better.”

We’re not creating fancy plots, but visualizations are the way that I like to see information. I look at numbers and my eyes glaze over. Let me see a picture.

Nightingale: When you’re presenting a case like that to another business leader, do they tend to have a strong technical background?

LG: No. Most of the people I work with don’t have technical backgrounds. Part of my job is to help our organization move to be more data-driven. When I first joined, I ran a data literacy program; part of that program was dataviz. Like, how do you interpret graphs?

The business leaders are not technical leaders, so the first thing you’re trying to accomplish is to make sure that what you’re trying to explain is very, very clear. Most of the time we don’t get a meeting where we can describe everything. We’re generally sending an email and describing what’s going on. This person has to be able to pick this up and come to the conclusion that you expect. Accomplishing that is a talent—I’ve been doing this for close to 20 years now.  

Nightingale: It’s reassuring that it takes time! Could you imagine achieving the goals of your teaching without visualizing data? 

LG: Without question, I couldn’t. I couldn’t do my job—my jobs—without dataviz. There’s a huge data visualization module in what I teach for Georgetown University, and even the advanced Python course that I teach at Maryland Institute College of Art (MICA) has dataviz elements embedded. Dataviz is essential. I would feel like I was doing a disservice to my students if I didn’t teach them how to create visualizations related, in my case, to data analysis. They would be short-changed.

Nightingale: Your career path has included different industries, startups, and a Ph.D. How did you find your way to analytics?

LG: The path I took went from my discovery of loving technology, to loving problem-solving, to loving problem-solving and technology together, and to working quickly. I realized those were the things that needed to happen for me to be my best.

Early on I traveled the world, got the startup bug, and realized I liked to code more than I liked being out in the field. Working in biomedical research and computational biology allowed me to be extremely creative, but the pace at which discoveries could be made was slow. When I got hooked on machine learning, it had everything I liked and I could get answers really quickly. I need to have that immediate feedback.

Nightingale: What’s it like being part of the growing tech scene in Salt Lake City, Utah? You’ve also traveled to every continent except Antarctica—if you could work anywhere, where would you be? 

LG: If I could work remotely from anywhere, it would probably be in Tasmania, in Australia. The weather’s awesome there. The people are awesome. 

I’ve integrated myself a lot into the tech scene in Salt Lake City as a co-organizer of the PyData Meetup, bringing in speakers such as a fellow Yellowbrick board member and the author of top-selling Python books. I’ve met as many people in the tech community as I could, to try to get them involved. Outside of New York, Salt Lake City’s one of the top financial technology centers in the world. 

Nightingale: DVS is really about both the expansion and refinement of data visualization knowledge. How does machine learning factor into this, as the two disciplines mature? 

LG: What’s happening right now is we have C-level executives needing to better understand machine learning models because of incidents involving implicit bias, whether in justice reform or in models biased against women, resulting, for example, in women not being able to get certain loans. That’s where dataviz can play a part, in being able to communicate algorithmic bias to stakeholders. 

There’s this inherent link between data scientists’ work and the production of data visualizations. If we start building hundreds of models, our ability to interpret those models, to make them as efficient as possible, is linked to our production of data visualizations.

Nightingale: You’ve written blog posts about teaching and described holding generous office hours with students as actually selfish—for you! What’s in it for you? 

LG: I have a mentor, and early on in my education she spent countless hours just listening to me. We got excited about questions and went deeper into explanations of things. It dawned on me that she truly believed that students possessed the ability to understand things much more deeply than many gave them credit for and that it was her responsibility as a mentor to help develop that type of thinking and curiosity. In doing that, you naturally are going to learn a lot yourself.

How do I grow? I try to replicate that. The great thing about teaching is that you get to understand what you don’t know quickly.

Nightingale: Thank you, Larry. Do you have any other reassuring or cautionary words to share?

LG: One of the things I want to say is for those who do not come from a hard-core technical background but like data visualization and want to explore the more technical side. I believe, from my experience teaching at MICA, that these dataviz practitioners are more than capable of learning these skills. It goes back to a blog post that I wrote on LinkedIn about what I see in humanities and art students. These students possess the ability to be creative in how they approach technical problems. Those who have a dataviz background but might otherwise steer clear of Python or other technologies are more than capable of learning them. I want to encourage them to take up a new challenge this year in programming or technology. It’s possible.

Author profile

Kathryn is a data developer and designer at home shaping human experiences as an Analytics Lead with F Λ N T Λ S Y, a design agency like no other. A note from Data Design Dimension is a sign she has a crush on playing with your data or telling your story. She has a master’s degree in data analytics and visualization and learned the art of getting things done with technology in world-renowned organizations.

She writes about visual data science in the real world and contributes to uplifting data communities as a Plotly Dash Ambassador and an author liaison with our own Nightingale Editorial Committee. Find her in conversations ranging from affordable homeownership in cities to forestry or just being in nature.