This article originally appeared in Issue 1 of Nightingale Magazine.
The Challenge: We’re a community of data visualizers, so let’s do what we do best: visualize data! Explore the selected dataset, find an interesting angle or insight, and create a visualization using the tool of your choice. Infographics, data stories, data art… you have permission to get creative.
Submission: There are no prizes here (this is simply an opportunity to practice our craft), but we would love to see what you create! Send a high-resolution image (at least 300 dpi) of your visualization (or an excerpt from it) along with a brief description of no more than 100 words to nightingale@datavisualizationsociety.org by September 16, 2022. Submissions will be featured on NightingaleDVS.com or in future issues of Nightingale Magazine.
About the dataset, from Jeremy Singer-Vine:
Jeremy Singer-Vine is the publisher of Data Is Plural, a weekly newsletter that highlights useful and interesting datasets, and, until recently, the longtime data editor for BuzzFeed News.
I’ve chosen the London Stage Database as the source for Nightingale’s inaugural dataset challenge. I first encountered the dataset a couple of years ago and it continues to impress me. The material itself is compelling, of course: it describes 100,000+ performances at 50,000+ theatrical events in London from 1660 to 1800, often supplemented with detailed notes and cast lists. But the project, led by the University of Oregon’s Mattie Burkert, is also a great example of making data usable for others.
At a technical level, it’s easy to use: You can search and explore the data online, or download it in a variety of widely used formats, including CSV and JSON. Its documentation also goes the extra mile. The “user guide” defines the key terms, provides interpretative context, and highlights a few caveats.
And the “about” page describes the project’s (fascinating) provenance, recounting Burkert’s and her team’s efforts to recover a damaged, nearly forgotten database from the 1970s, itself derived from an 8,000-page series of reference books, itself ultimately drawn from centuries-old playbills, newspaper notices, theater reviews, and other primary sources.
As the documentation foreshadows, the dataset is slightly messy, especially in the transcriptions of the original text. But don’t let that discourage you; it’s a great opportunity to practice your data-cleaning and quality-checking skills. And, personally, I’d trade a clean-looking, documentation-less dataset for a messy, well-documented one any day. As the project’s maintainers write, “We hope that visitors to the site will find this frank acknowledgment and foregrounding of the dataset’s history and limitations refreshing rather than frustrating.” I couldn’t agree more.
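If you want a starting point for that cleaning and quality-checking, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a description of the London Stage Database's actual export: the column names (`event_date`, `theatre`, `title`) and the sample rows are made up for this example, so check the project's user guide for the real field names and swap in the CSV you download. The sketch flags rows whose date does not parse and groups spellings that differ only in case, spacing, or trailing punctuation.

```python
import csv
import io
from collections import Counter
from datetime import date

# Made-up placeholder rows standing in for a downloaded CSV export.
# The column names are hypothetical, not the database's real schema.
SAMPLE_CSV = """event_date,theatre,title
1700-01-01,Example Theatre,  Example Play
1700-01-02,example theatre,Example Play.
,Example Theatre,Another Play
not-a-date,Other House,Another  Play
"""


def normalize(text):
    """Collapse runs of whitespace, drop trailing punctuation, ignore case."""
    return " ".join(text.split()).rstrip(".,;:").casefold()


def is_iso_date(text):
    """Return True if text parses as a YYYY-MM-DD date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def audit(rows):
    """Return rows with unparseable dates, title counts, and theatre spelling variants."""
    bad_dates = [r for r in rows if not is_iso_date(r["event_date"].strip())]
    title_counts = Counter(normalize(r["title"]) for r in rows)

    # Map each normalized theatre name to the distinct raw spellings seen.
    variants = {}
    for r in rows:
        variants.setdefault(normalize(r["theatre"]), set()).add(r["theatre"].strip())
    inconsistent = {name: seen for name, seen in variants.items() if len(seen) > 1}

    return bad_dates, title_counts, inconsistent


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bad_dates, title_counts, inconsistent = audit(rows)

print("rows read:", len(rows))
print("rows with missing or unparseable dates:", len(bad_dates))
print("titles after normalization:", dict(title_counts))
print("theatres with more than one spelling:", inconsistent)
```

To run this against a real download, replace `io.StringIO(SAMPLE_CSV)` with an opened file and adjust the column names. The normalization here is deliberately crude; for historical transcriptions you will likely want to review each group of variants by hand before merging them.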