How a DVS Mentorship Changed My Approach to Data Journalism

As an editor of Reed College’s student newspaper, The Reed College Quest, my days look a lot like those of student journalists across the country. I interview professors, walk and talk with concerned students between classes, and send lots of emails to college administrators — many of which go unanswered. I’ve even done my fair share of sprinting across campus to cover unfolding protests, or calling lawyers from the nonprofit Student Press Law Center to figure out if my latest scoop is even legal to publish.

But my days also include a lot of data science work that would be unfamiliar to many student journalists. In the spring of 2023, I was the first in the newsroom to discover a second tab in an Excel file accidentally released by the college, named “Exempt Ranges – Hidden,” which became the backbone of our explanatory coverage of staff protests against Reed’s proposed changes to employee compensation. More recently, I led a series of investigative stories that brought to light a database vulnerability that had exposed the campus ID numbers of thousands of students, faculty, staff, and alumni, one that IT had long known about but left unfixed for months.

I like to say that most of my best stories have been found in the developer console, not in a reporter’s notebook, and these days my first instinct when chasing a scoop is to open a new RStudio environment and start collecting data.

Yet that can be a difficult tightrope to walk. I think of myself as a writer, but my education is in computer science, and, to hear most of my peers in either discipline tell it, the two could not be more different. For much of my life, it was difficult for me to envision a path that would allow me to explore my passion for writing and my skill with data without sacrificing one for the other.

Nowhere has that tension been more apparent than in my work at the Quest. As a student journalist, I’ve been trained to always write in a way that’s accessible to the reader. But as a student of computer and data science, I have experience in data analysis that most of my readers simply don’t.

For me, data visualization can be a bridge between the worlds of data science and journalism: a way to weave hard-won insight and reporting into otherwise esoteric facts and figures.

Yet I’ve known for a long time that the balance necessary for such data-driven reporting is difficult to learn, and my work in student publications can only go so far in preparing me for the rigors of an investigative data journalism career. So, in the summer of 2023, I turned to the Data Visualization Society to improve my education.

***

The DVS summer mentorship program, which matches students with experienced professionals, was my dream come true. My mentor, Julia Wolfe, Americas Graphics Editor at Reuters, was an expert in exactly the kind of data-driven reporting I hope to pursue, and I will always be grateful to her for taking the time to advise a student journalist like me.

Throughout our ten weeks together, Julia and I collaborated on a project I’d been envisioning for months. I’ve always considered myself a writer and lover of languages first and foremost — I plan to minor in Spanish and Latin American literature, and I’ve often approached my computer science coursework in Python, C, and other programming languages just as I would a foreign language class (an approach aided by the fact that, at Reed, Introductory CS carries a foreign language credit).

I’m fascinated by the data of language: the structures and patterns we use to express complex, abstract ideas in formal writing. I wanted to find a way to map rhetoric, to turn words on a page into numbers and then back into art, to see the why and how of speech laid bare and in vivid color.

In retrospect, I could have chosen any kind of speech to study. But I chose political speech, mostly because of the upcoming presidential election. Using a Reuters dataset provided by Julia, I began studying presidential campaign speeches in light of the 13 key issues that American voters ranked as their highest priorities in recent weeks. My early versions, built using Flourish, were clear but blunt, conveying little more than the number of times certain words, which I grouped thematically, were mentioned in each candidate’s speech.

My approach was based on simple word groupings: if a candidate mentioned the words “jobs,” “companies,” or “inflation,” it counted toward a mention of the economy; “border,” “immigrants,” or “aliens” counted toward immigration; and so on. I wanted to make that strategy clearer, to be more transparent in my design and give the reader more of an opportunity to see the judgment calls that went into deciding which words fell under each issue. My next version incorporated those ideas by visually grouping the words together.
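As a rough illustration, the word-grouping approach could be sketched like this in Python. This is not the code used in the project: the function name `count_issue_mentions`, the sample sentence, and the two-issue dictionary are assumptions made for the example, and only the six keywords come from the description above.

```python
import re
from collections import Counter

# Hypothetical issue-to-keyword mapping. Only these six example words
# come from the article; a real mapping would cover all 13 issues.
ISSUE_KEYWORDS = {
    "economy": {"jobs", "companies", "inflation"},
    "immigration": {"border", "immigrants", "aliens"},
}


def count_issue_mentions(speech: str, issue_keywords: dict[str, set[str]]) -> Counter:
    """Count how many words in a speech fall under each issue."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", speech.lower())
    # Start every issue at zero so unmentioned issues still appear.
    counts = Counter({issue: 0 for issue in issue_keywords})
    for word in words:
        for issue, keywords in issue_keywords.items():
            if word in keywords:
                counts[issue] += 1
    return counts


# Invented sample sentence, for illustration only.
sample = "We will bring back jobs, fight inflation, and secure the border."
mentions = count_issue_mentions(sample, ISSUE_KEYWORDS)
```

Every judgment call lives in the dictionary: moving a word from one issue to another changes the counts, which is why showing the groupings to the reader matters.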

But this prototype felt messy to me. It’s a data scientist’s chart, not a journalist’s: it makes the maximum amount of data visible, but doesn’t have the rhetorical structure necessary to make that data clear or informative to a general audience.

After several more attempts, I thought I had found the perfect prototype. Rather than try to pack more information into a single layout, I decided to parse the information into three axes: position, color, and size. That way I could encode three variables — party alignment (color), candidate ranking (position), and voter ranking (size) — without sacrificing a minimalist layout or expanding into multiple charts. Finally, I was satisfied with my work. I zipped a folder of prototypes, sent it off to Julia in preparation for our next meeting, and closed my laptop for the night.

Respondents to a recent Reuters poll identified 13 key issues that concern them, but their rankings were not always in line with those of their candidates. Issues are ranked by their emphasis in candidates’ kickoff speeches for the 2024 campaign, but sized by the priority given to them by Democratic and Republican voters.

Then, disaster struck.

As an aspiring data journalist, I consider myself a dedicated follower of The Washington Post’s Department of Data. When I idly pulled up the Post’s homepage that Monday morning, I saw something that instantly set my heart racing: a new column titled, “The Words GOP Presidential Hopefuls Use To Stand Out In a Crowded Field.”

My heart in my throat, I clicked.

And there they were. Page after page of carefully crafted bubble charts, sized by word frequency: more beautiful and more polished than my own, but in intent and structure almost identical to some of my earlier prototypes. My idea, it seemed, was not that original after all, and The Washington Post had beaten me to it.

A minimalist packed circles chart visualizing the words used by presidential candidate Donald Trump in his November 22 campaign kickoff speech. Words are represented as gray circles sized by their frequency, with some key terms like “great” and “Biden” highlighted in yellow. — *The Washington Post*’s Department of Data published visualizations of political rhetoric similar to ones I had considered.

Had I given up then — quit in a moment of dejection — my project would have been over. And I’ll admit, I considered it. I didn’t want to be seen as simply copying The Post’s work, and it would be hard to explain that I really had coincidentally developed a very similar piece at around the same time.

Julia, luckily, talked me out of abandoning it. She reassured me that it isn’t that unusual for multiple journalists to pursue similar pitches around the same time, especially when it comes to significant topics like the presidential race.

To my surprise, however, she offered significant constructive criticism of the final prototype I had become so fond of. Size, she said, was frowned upon as an axis for important information, since relative sizes can be very difficult for the human eye to compare. That essentially gutted the layout of my final chart.

At first I dug my heels in. For lack of any better reasoning, I just liked my final draft. I found it visually pleasing, and the data scientist in me liked the idea of using size, color, and position to encode information in unexpected ways.

But then I realized something. In designing my charts to maximize the amount of information presented across every axis, I was thinking in the terms of computer science, where encoding information in an efficient and elegant way is key. But there were other kinds of efficiency and elegance to consider. Efficiency of communication, for one, and the elegance of explanations that bring clarity and understanding to the reader. To tell an effective data-driven story, I needed to put aside my personal pride and write to the reader — to prioritize efficiency of communication over efficiency of design.

I went back to the drawing board. If I wanted to design a more approachable and more accessible chart, I’d have to do it from the ground up. I asked myself what my lede (my topline finding) was, what information I wanted to communicate most clearly, and found, to my surprise, that it wasn’t the word rankings themselves but how the issues ranked among the Republican candidate, Republican voters, Democratic candidate, and Democratic voters. Republican voters, for example, placed less emphasis on environmental issues than Democratic voters did, and Democratic voters placed less emphasis on them than Joe Biden did in his speeches. I found those differences, not only between voters but also between voters and their leading candidates, particularly compelling.
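To make the idea of comparing rankings concrete, here is a minimal Python sketch that turns raw emphasis scores into ranks for each group and measures how far each issue moves between two groups. The helper name `rank_issues`, the group labels, and every number below are invented placeholders, not figures from the Reuters data or from the project.

```python
def rank_issues(scores: dict[str, float]) -> dict[str, int]:
    """Return each issue's rank within one group (1 = most emphasized)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {issue: position for position, issue in enumerate(ordered, start=1)}


# Invented placeholder scores, for illustration only.
groups = {
    "candidate": {"economy": 12, "environment": 9, "immigration": 3},
    "voters": {"economy": 30, "environment": 5, "immigration": 18},
}

# One ranking per group, analogous to one column of the chart.
ranks = {group: rank_issues(scores) for group, scores in groups.items()}

# Positive gap: voters rank the issue lower than the candidate does.
gaps = {
    issue: ranks["voters"][issue] - ranks["candidate"][issue]
    for issue in ranks["candidate"]
}
```

Working with ranks rather than raw counts discards magnitude, but it puts speech mentions and poll responses on a common scale so they can be compared side by side.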

If that was my focus, then the guiding flow of the chart should be the issue, not the party or the candidate. I looked through a folder of old FiveThirtyEight charts Julia had shown me, and found one that fit the bill: an analysis of 2016 Eurovision results that used solid lines to connect single countries between the new and old ranking systems.

A FiveThirtyEight chart depicting differences in possible outcomes for the 2016 Eurovision competition under two different ranking systems. Two sets of rankings are shown side by side, with dark lines connecting each contestant to its equivalent under the other system. — Julia showed me a *FiveThirtyEight* chart that inspired my next design.

It was a good design, and I liked the idea of representing key issues as single lines that the eye could follow from one side of the chart to the other. I’d simply have to expand it to include four rankings (one each for the two candidates and the two groups of voters) instead of two. That was doable. I opened my laptop and sketched late into the night.

From there, most of the remaining challenges were matters of design. Julia helped me work through several aesthetic issues to craft a clearer and subtler design, which ultimately dropped the use of color. Meanwhile, I set myself the challenge of building the final chart as a full HTML document with JavaScript interactivity. That also gave me a good reason to teach myself JavaScript for web development, something I’d been meaning to pursue for a while.

Finally the day came. My final draft was ready.

In my final visualization, both candidates and both groups of voters are given a column dedicated to the emphasis they give to the thirteen key issues. Individual issues are connected across the chart by dark lines that change as the user hovers. At right, each issue is given its own caption that appears as its issue is hovered. — My final draft made use of interactivity to emphasize only one issue at a time for the sake of clarity. I chose the environment as the default focus because I found its vast swings between groups interesting.

The final chart bore almost no resemblance to my original bar chart of associations. The rankings remained, but any sense of magnitude was gone. Instead, the JavaScript running in the background welcomed the reader into the story by highlighting a single dedicated issue. From there, the reader could hover over any issue to highlight its path between interest groups, and my annotations of context for each issue would appear in the sidebar.

The design logic of this final piece had also changed significantly since I began the project. Gone were the bold colors, the axis lines, and the attempts to cram layers of significance and implication into every available axis of meaning. My final piece is, in a way, less informative than my early packed circles charts. But it is also clearer, and, as a result, the reader learns more from it.

And that, I think, is the real lesson I took from my DVS mentorship experience. For years, I felt myself caught between two worlds: unable to reconcile my abilities in computer science and my passion for writing. Now I see, for the first time, that they are not separate skills, but one.

When George Orwell laid out his famous rules of writing (“Never use a long word where a short one will do,” “If it is possible to cut a word out, always cut it out,” etc.), he could just as easily have been describing the principles of efficient computation as the principles of good rhetoric.

Principles of good design and good data science are not unlike the principles of good writing: all boil down to simplicity, elegance, and clarity. Only once I realized that was I able to see past the trappings of more complex charts and visualize the data in a way that would make the insight, and the story, accessible to the reader. And now, finally, I can picture a way forward: a way to be a computer scientist and a writer, and a way to tell stories with data that leverage the skills of both disciplines. That was the real value of my DVS mentorship experience, and I have a feeling it will stay with me for a long time.

Declan Bradley

Programmer, writer, and aspiring data journalist, Declan Bradley is an undergraduate student at Reed College and an editor of the campus paper, the Reed College Quest. His work has been recognized by both the National Scholastic Press Association and the Associated Collegiate Press, and in 2023 he gave a talk on data journalism at the ACP’s spring conference. When not chasing his next story, he can be found lost in the stacks of the campus library contemplating obscure volumes of Spanish literature and Soviet science fiction.
