Step 7 in the Data Exploration Journey: Spin-Off Projects

This article is part 8 in a series on data exploration and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I began this series while serving as the Director of Education for the Data Visualization Society, because so many people were asking to hear more about the process of data exploration and analysis. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year Career Portraits project that produced the 2023 “Career Paths in Data Visualization” report (DVS member login required). This series illustrates how I approach a new project, and shows that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

In the last article, Jenn Schilling and I refocused my initial data exploration to frame a broader Career Portraits project based on the DVS “State of the Industry Survey” data. We trimmed our scope aggressively to reflect the time and resources that we had on hand, and re-envisioned some core parts of the project. At the end of that focus-and-consolidate phase, we had a clear, tight focus for the project.

Successfully navigating the focus phase has many advantages. First, it allows you to put your energy into the most important things. It also creates the seeds for lots of new ideas and projects that can spin off of the core work; very often, almost everything you cut can be considered a future upgrade or a new project in its own right.

If your time and resources are fixed, focusing a project can mean postponing parts that you’re excited about. (I suspect that this is why most people struggle to make the cuts.) In many cases, this also creates an opportunity to share the work or to structure it in new ways. The end of the focus phase is the perfect time to start looking for collaborations that can help to move your project ahead. The guidelines from our previous article on collaboration still apply! In the case of spin-off projects, it’s particularly important to remember that a collaboration adds back to the scope that you just reduced. You need to account for that effort, and should never use spin-offs simply to avoid making cuts.

When working on initiative-level collaboration for something as large as the Career Portraits, it’s important to:

Set clear boundaries between projects. Everyone needs to know what they’re working on, who’s doing what, how it’s different from what others are doing, and what’s needed for it all to come together. Mixed messages lead to missed goals, duplicated work and frustration.
Look for common goals and mutual wins. The best collaborators have interests that sit slightly outside of your scope and needs that align with yours. We’d worked on aspects of both projects below during our initial Career Portraits work, but pursuing them as collaborations generated significant contributions that supported and extended the core work beyond what we could have done alone.
Work to align timelines in advance. In some cases, collaboration like this creates dependencies. It’s important to be clear about when you need things done, and to be willing to flex if the schedule doesn’t work out as you hoped.
Be realistic about what you can take on. Collaborations take a lot of work and require support to succeed. Collaboration is not delegation, and you need to be available to fully participate in any project that you spin off. If you can’t realistically support it from start to finish, don’t start.

As Jenn and I started the heads-down build phase for the core Career Portraits work, I was able to identify and spin off two projects in collaboration with other teams:

Collaboration #1: Career Interviews Series

The first project that we spun off was a series of career interviews with people working in data viz. I knew from the beginning that I wanted to include qualitative stories alongside the quantitative data for the Career Portraits, to give the data a more human face and to illustrate how much variation there is within even a few of the individual “data points” (a.k.a. responses) from the survey. It’s always important to check your insights against reality whenever you are building a data story, and connecting with people from the community was one way to help us do that. While Jenn began re-working the core data analysis in December 2021, I started a series of research interviews with people working in the field. I put out a call for participants in the newsletter, on Slack and in a couple of articles, and we got a core set of interviews scheduled in January to start the research.

[Image: YouTube video cover with a title and photo of Elijah Meeks. Caption: Cover page for the Careers in Data Visualization interview series, hosted by Elijah Meeks.]

Around March of 2022, Amanda Makulec started conversations with Elijah Meeks about hosting a series of career conversations to spotlight paths into data viz. This aligned well with the work that I was already doing for the Career Portraits project, so we joined efforts and I worked with Elijah to brainstorm some questions and visualizations to inform his series. Amanda organized the calls to be released over the summer, and Josephine Dru and I compiled transcripts for each one as they came out. My early research calls had given us a good sense of where we wanted the project to go, so I was also able to compile a pre-survey for all participants to take. The questions were similar to the ones we were working on in the Career Portraits project, but they went into more detail and depth on a few points that we wanted to learn more about. Because we were working in a short format with a patient and supportive audience, we were able to ask much more focused (and sometimes repetitive) questions than we would publish in the general survey. As the results came in, I visualized the data and wrote a series of summary profiles for each interview, creating the “Career Profiles” section of the final report.

Adding profile interviews to the newly reduced scope of the “Career Paths in Data Visualization” report more than doubled the work and required pushing our initial deadline back by a couple of months, but it gave the final project a much richer view into career paths in data viz. If we hadn’t cut deeply during the focus phase for the core project, we wouldn’t have had the time budget to take advantage of this opportunity when it arose. Collaborating on the profile interviews made both projects stronger, and made the lift much smaller than if the Career Portraits team had tried to do it all alone.

Collaboration #2: Automated Tagging of Free-Text Job Titles

The second project we spun off was much larger in scope. When she first joined the Career Portraits project, Jenn had completed a quick clustering analysis to look for trends in the kinds of tools needed across different careers. By looking at where specific titles intersected or crossed over between career areas, we thought we’d be able to pull out a lot more detail about specific roles. Our early results were very intriguing, but we quickly realized that this analysis was a project all on its own, and it wasn’t realistic to pursue it as part of the core Career Portraits work. Quite reluctantly, we put the clustering work aside so that Jenn and I could focus on re-running her initial core careers analysis on the updated 2021 dataset, which ensured that we were using data from the most recent survey collection year.

[Image: Graph diagram showing 7 color-coded clusters with connected nodes. Caption: Clustering analysis from Lukas’ thesis project.]

In April of 2022, a graduate student named Lukas Geisseler posted a query in the DVS Slack looking for organizational partners for his master’s thesis in Applied Information and Data Science at Lucerne University in Switzerland. His program required a project with a well-defined topic that would contribute to an organization. I reached out in response, and we started discussing whether he might be able to use the survey dataset to support his thesis project. The clustering analysis was where our discussions began, but we soon settled on a much larger task that would be a fantastic addition to our dataset and a better candidate for the depth and scope required of a master’s thesis. To understand the importance of his contribution, it’s helpful to know a little bit more about the limitations of the survey dataset.

There are two columns related to careers in the survey dataset. The first one is a fixed career category that users select from a dropdown (analyst, designer, etc.), and the second is a free-text entry field for job title, where people input their actual job title. When we first started the Career Portraits project, I manually (and inconsistently) tagged a couple thousand survey responses to compare the fixed bins to the job titles, and found many interesting threads to pursue. My exploration was fast and dirty, but it gave me a better sense of the dataset, and it pointed out some important aspects of the data that helped us to contextualize the results for our core Career Portraits work.

In the fixed career category question, people often categorize themselves differently than you’d expect based on their job title: someone categorizes themselves as an analyst in the fixed careers list but their title is data visualization developer, or an engineer has the title of data visualization UI designer. This in itself is a fascinating statement about job searching and careers in general, but it’s particularly relevant in a career where roles tend to be highly variable between companies, and are often conglomerates of multiple roles and responsibilities.
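To make this kind of mismatch concrete, here is a minimal Python sketch of how one might compare the two columns. It is not the method we used in the project: the sample rows, the KEYWORDS mapping, and the function names are all invented for illustration, and a real mapping would need far more care than a handful of keywords.

```python
from collections import Counter

# Hypothetical rows: (self-selected category, free-text job title).
# These values are invented for illustration, not taken from the survey.
responses = [
    ("analyst", "Data Visualization Developer"),
    ("engineer", "Data Visualization UI Designer"),
    ("analyst", "Senior Data Analyst"),
    ("designer", "Information Designer"),
]

# Assumed keyword-to-category rules (a deliberately tiny, hypothetical list).
KEYWORDS = {
    "developer": "developer",
    "designer": "designer",
    "analyst": "analyst",
    "engineer": "engineer",
}


def category_from_title(title):
    """Guess a category from the last role keyword in the title, or None."""
    # Scan from the end, because the role noun usually closes a job title.
    for word in reversed(title.lower().split()):
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None


def cross_tabulate(rows):
    """Count (self-selected category, title-derived category) pairs."""
    return Counter(
        (selected, category_from_title(title)) for selected, title in rows
    )


def mismatches(rows):
    """Return rows whose title suggests a different category than selected."""
    return [
        (selected, title)
        for selected, title in rows
        if category_from_title(title) not in (None, selected)
    ]


for selected, title in mismatches(responses):
    print(f"selected {selected!r}, but title is {title!r}")
```

Even a crude rule like this surfaces the rows worth reading by hand, which is roughly what my manual tagging pass was doing.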

The definition of the fixed categories themselves also posed some challenges. First, it wasn’t clear when someone should switch from calling themselves an “analyst” to “leadership”: if your title is director of analysis, where do you put yourself? Some people listed themselves as “leadership” when they got to a team lead or director level position, and some were still listing themselves as “analysts” when their title stated VP or CEO. Second, it could be hard to understand the definition of some of the buckets: Do data scientists belong in the analyst, developer, or scientist bucket? Representatives show up in all three! We knew that these response variations would muddy our results (for instance, counting VP and CEO salaries in the analyst bucket would pull that career’s median salary upward), but there wasn’t a good way for us to consistently and efficiently tag the titles with the resources and the project team that we had. Using the free-text question could have helped to clarify some of these more complex cases, but we reluctantly chose to rely only on the fixed buckets for our project because they were the simplest to use and the most direct reflection of the user’s input.
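A small worked example shows how much a few mislabeled responses can move a salary summary. This is only a sketch with invented numbers (in thousands), using Python’s standard statistics module; none of these figures come from the survey.

```python
from statistics import mean, median

# Invented salaries (in thousands) for respondents who chose "analyst".
analyst_salaries = [60, 65, 70, 75, 80]

# Invented salaries for respondents who also chose "analyst"
# but whose free-text titles say VP or CEO.
executive_salaries = [200, 250]


def summarize(values):
    """Return the median and the mean (rounded to one decimal place)."""
    return {"median": median(values), "mean": round(mean(values), 1)}


without_execs = summarize(analyst_salaries)
with_execs = summarize(analyst_salaries + executive_salaries)

print("analysts only:      ", without_execs)
print("analysts + VP/CEO:  ", with_execs)
```

In this toy case the median shifts by a few thousand while the mean jumps far more, so the size of the distortion depends on which summary you report and on how many responses are mislabeled.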

The free-text job title question also contains a lot of implicit information about job seniority, role progression, and experience. It would be very interesting to compare job seniority level by title (junior analyst, senior analyst, director of analysis, etc.) with years of experience in the base survey dataset. For example, it would be interesting to compare the range of years of professional experience typical for a junior vs a senior role. Unfortunately, we couldn’t easily extract the job level information from the free-text entries without more advanced methods. In order to focus on our core deliverable, we made several painful cuts and put these more nuanced analyses aside, hoping to come back to them another day.
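As a rough illustration of the idea, the sketch below pulls a seniority level out of a title with a few regular expressions and then reports the range of experience for each level. The patterns, level names, and sample rows are assumptions made up for this example. Simple rules like these break down quickly on real free-text entries, which is why we needed more advanced methods.

```python
import re
from collections import defaultdict

# Assumed seniority patterns, checked in order; the first match wins.
SENIORITY_PATTERNS = [
    ("executive", r"\b(vp|vice president|ceo|chief)\b"),
    ("director", r"\b(director|head of)\b"),
    ("lead", r"\b(lead|manager|principal)\b"),
    ("senior", r"\b(senior|sr)\b"),
    ("junior", r"\b(junior|jr)\b"),
]


def seniority_level(title):
    """Return the first matching seniority level, or 'unspecified'."""
    text = title.lower()
    for level, pattern in SENIORITY_PATTERNS:
        if re.search(pattern, text):
            return level
    return "unspecified"


# Invented (job title, years of professional experience) pairs.
rows = [
    ("Junior Analyst", 1),
    ("Jr. Data Analyst", 2),
    ("Senior Analyst", 6),
    ("Sr Data Visualization Engineer", 9),
    ("Director of Analysis", 12),
    ("Data Analyst", 4),
]


def experience_range_by_level(rows):
    """Map each seniority level to its (min, max) years of experience."""
    years = defaultdict(list)
    for title, years_experience in rows:
        years[seniority_level(title)].append(years_experience)
    return {level: (min(v), max(v)) for level, v in years.items()}


for level, (low, high) in experience_range_by_level(rows).items():
    print(f"{level}: {low} to {high} years")
```

Titles with no explicit level (“Data Analyst”) land in an “unspecified” group here, and in real data that group would likely be large.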

We didn’t know it at the time, but we didn’t have long to wait. Auto-tagging free text responses was exactly the kind of problem that Lukas was interested in studying, and he developed a pipeline to analyze the job titles as part of his thesis project. The full pipeline includes creation of neural networks, machine learning to train the algorithm, and graph representation to help interpret and quality-check the dataset.

Once built, this pipeline automated the analysis of the job titles data, removing a tedious and manual task that is hard to do consistently over large datasets. However, the initial process of training the algorithm required that Lukas manually validate his machine learning tags. He published an early version of his results in the DVS Slack with a call for participation, and some dedicated community members helped him to error-check and validate the initial coding results, lightening the load on one of the more time-consuming parts of his thesis work. This validation process also helped Lukas to fine-tune his approach, and made the final outputs more robust. Once the initial text tagging was validated, he carried out a clustering analysis to look at how jobs were distributed within the results, and used the resulting graph to detect patterns and intriguing details in the dataset.
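I can’t reproduce Lukas’ validation workflow here, but the basic bookkeeping behind this kind of check can be sketched in a few lines of Python. Everything below is hypothetical: the reviewed rows, the tag names, and the helper functions are invented to show the idea of comparing model tags against reviewer tags, not taken from his thesis.

```python
from collections import Counter

# Invented review records: (job title, model tag, reviewer tag).
reviewed = [
    ("Data Analyst", "analyst", "analyst"),
    ("BI Developer", "developer", "developer"),
    ("Head of Insights", "analyst", "leadership"),
    ("Dataviz Designer", "designer", "designer"),
]


def agreement_rate(rows):
    """Fraction of titles where the model tag matches the reviewer tag."""
    matches = sum(1 for _, model, human in rows if model == human)
    return matches / len(rows)


def disagreements(rows):
    """Count (model tag, reviewer tag) pairs where the two differ."""
    return Counter(
        (model, human) for _, model, human in rows if model != human
    )


print(f"agreement: {agreement_rate(reviewed):.0%}")
for (model, human), count in disagreements(reviewed).items():
    print(f"model said {model!r}, reviewer said {human!r}: {count}")
```

Counting the disagreement pairs, and not just the overall rate, shows which categories are being confused, and that is the information you need to adjust the approach.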

Lukas’ thesis was submitted in June of 2023. In all, he contributed more than 500 hours of advanced data science and programming time to the DVS. This is far more than we’d expect from a typical volunteer, but he was able to benefit from our dataset, experience and use case as a core part of his thesis, and we certainly benefited from the results of this in-depth academic project. Lukas has already tagged the 2020 and 2021 datasets, and the outputs of his model can be used to tag future datasets as they are collected. Leveraging the algorithm’s consistency and speed will also allow us to enrich datasets from previous years. This enrichment might support more advanced longitudinal analyses in future, if we can overcome the complexities of changing definitions and survey context from year to year. Lukas’ analysis also created a more robust version of the initial clustering that Jenn had worked on in the core project. Producing similar results with a completely different method added confidence to some trends, and highlighted some potential differences or artifacts in others. Comparing the initial results from the two analyses helped us to identify many interesting stories within the dataset, and to outline potential next steps to pursue.

What makes a spin-off project work?

There were several ingredients working together that made these spin-off projects successful. Here are some things to look for when evaluating a side project:

Opportunities to provide value to both sides

For Elijah’s interview series, we were able to help brainstorm questions and prep materials, and we folded the results of the interviews back into our work to give them additional impact. When the portraits were published, the interviewees got a document that they can share to highlight their work. We got support from Amanda in scheduling and running the calls, the benefit of Elijah’s standing in the community and his perspective on what’s interesting to talk about, additional material to support the Career Portraits effort, and a collection of generous insights from the people that he interviewed.

In Lukas’ thesis project, we were able to provide an interesting dataset and a tangible problem, as well as some basic explorations to accelerate the start of his work. It’s nice to work on a thesis project that will continue to be valuable after the work itself is complete, and a tangible outcome can help to make an academic thesis more approachable to potential employers. We got almost a year of focused, highly specialized work that enriched our dataset and will help us to continue creating value for the DVS community.

Clear separation between projects

While both projects contributed directly to the Career Portraits work, neither was required for it to succeed. We wanted the projects to feed and encourage one another, but not to introduce unnecessary pressure or risks. The projects were structured in a way that allowed others to take ownership and carry an initiative forward without a lot of input from the core Career Portraits team, but we also set up regular communication between the projects to learn from one another, to seek additional opportunities for alignment, and to highlight the contribution and impact of each team.

A critical aspect of planning was to remove or reduce timeline dependencies between projects, so that if one project fell behind or changed direction it wouldn’t break the others. We did need to complete the interview series before the report could be published, but the profiles were written and visualized independently from the analysis work that I was doing with Jenn. I handed off a completed document at the end of my board tenure in January of 2023, ensuring that the core project reached completion before it changed hands. We didn’t include Lukas’ results in the core document because we knew that his project wouldn’t finish until at least six months later, but his results will help to extend and inform the clustering analyses that we’d started before he joined. Having seen his preliminary results, I think his work could even become the seed for Career Portraits V2, if the DVS decides to pursue that project in future.

Honor each contribution

Each collaboration is a significant commitment of effort and time. It’s important to honor each person’s contribution to the bigger effort, and to take the time to make sure that work is rewarded. There is a difference between collaboration (where both sides contribute) and delegation (where one side assigns work to someone else and expects an outcome). I find that people often confuse the two. To be a good collaborator is to commit to doing work to raise your collaborators up, even if it’s outside of the focus of your core project. For this reason, you need to be very careful about assessing your own ability to support the work involved in a side collaboration. The focus stage of a project is necessary to evaluate whether you have the time budget to do that successfully.

In the profile interviews series, our commitment took the shape of additional preparation for the interviews and nearly doubling the scope of the Career Portraits deliverable. For the thesis work, I chose to continue working with Lukas past my tenure as DVS Education Director, to make sure that he had the support he needed to see his project through to the end of his thesis work. Collaboration is a giving economy, and it’s important that you commit to your collaborators as deeply as you ask them to contribute to you.

In the end, both of these collaborations were highly successful, and I believe that they created significant value for the organization. Both leveraged the early groundwork that Jenn and I had done, but each project took things to a completely different level and contributed far beyond what we were able to do on our own. Because we were disciplined about the focus phase of our project, we were able to identify and act on these opportunities when they came up, allowing us to collaborate in ways that we couldn’t have anticipated at the time we made the cuts.

You won’t always be able to spin projects off immediately with these kinds of results, but a disciplined, focused approach to project management helps to ensure that you’ll be ready to jump on opportunities when they come. Returning later to the things you cut means that you’ll always have another project ready if your initial inspiration runs dry, or your project hits a wall.

The core Career Portraits project was published this summer in the DVS member space (member login required). We’ll continue discussing the actual project build in the next article!

Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate
Step 6 in the Data Exploration Journey: Cut to Realistic Scope

Related links:

Early Sketches for Career Portraits in Data Visualization, by Jenn Schilling
DVS Careers in Data Visualization, YouTube Playlist for interview series by Amanda Makulec and Elijah Meeks
Career Portraits project (DVS Member space login required)

Erica Gunn

Erica Gunn is a data visualization designer at one of the largest clinical trial data companies in the world. She creates information ecosystems that help clients to understand their data better and to access it in more intuitive and useful ways. She received her MFA in information design from Northeastern University in 2017. In a previous life, Erica was a research scientist and college chemistry professor. You can connect with her on Twitter @EricaGunn.