Lessons Learned From Installing a Data Physicalization

In June 2023, Pratt Institute hosted the HASTAC 2023 Conference on the theme of “Critical Making and Social Justice.” This three-day conference included traditional academic papers and presentations, as well as a full-scale exhibition of creative work. (Pratt News has an overview of the conference.)

As part of the conference planning team, we, Claudia Berger and Chris Alen Sula, wanted to produce a visualization of the conference work itself while also reflecting the themes of the event. Claudia is the Digital Humanities Librarian at Sarah Lawrence College and a Visiting Assistant Professor at Pratt Institute's School of Information, teaching digital humanities. Their research centers on how crafts can be integrated into digital humanities projects. Chris is Associate Provost for Academic Affairs at Pratt Institute and Associate Professor in the School of Information. His research is on digital humanities, information visualization, and the ethics of data/technology. He also served as conference chair for HASTAC 2023. Because our conference-planning responsibilities left us no time to present our own work, we wanted to find another way to take part in the content of the conference through our interest in data visualization.

Data physicalization felt like an appropriate method because it built on the work of Shubhangi Singh, a graduate student in the MS Program in Information Experience Design who designed the HASTAC 2023 logo. Her design evoked networks, and the logo was then recreated in yarn at the conference. We decided to pair that installation with a network of session data reflecting the keywords and tags presenters submitted as part of their conference proposals. Neither of us had created an installation like this before. We knew it would be a lot of work, but we learned firsthand just how much harder it was than we had anticipated.

This article explores our process of making the physical data viz, from initial conceptualization through installation. While some of the challenges we discuss here are particular to our network, others likely apply to other data physicalization projects. Perhaps most surprising is that we did not even finish the installation as planned, and yet the result still accomplished many of the goals we set for the piece. We hope some of the lessons we learned will be helpful to others looking to undertake similarly scaled data physicalization projects in the future.

A series of nodes connected by green string. “Arts,” “design,” “community,” “participatory methods,” and “digital humanities” are the largest nodes, easily readable at a distance. — Final version of the network installation. Photo by Claudia Berger.

Concept

Our concept for “Crafting Connections: Creating a Network of HASTAC 2023” is reflected in the following statement, which was displayed alongside the network diagram at the conference:

Networks express connections, associations, and communities. Here, topics of the conference (via author-selected and author-generated keywords) have been carefully detangled, placed into conversation with each other, and made physical in form. This social object can be viewed and explored together with others, making the conference community tangible.

Each thread represents one work in the conference, its color reflecting format: papers, panels, and roundtables (green); workshops (orange); exhibitions (red); performances (purple); activities (blue); and multiple formats braided with their respective constituent colors. This design reflects some of our earliest thinking about the conference, which is also embodied in the HASTAC 2023 logo.

We invite you to identify and locate different work from the conference in this social object and to trace its lines in connection with others.

Data and Network Design

We used ConfTool, a conference and event management system, to manage submissions to the conference. Authors were asked to provide keywords for their abstracts and to select, from a list of academic disciplines, those that best described their work. We then downloaded a CSV file of the metadata for all accepted submissions. Using OpenRefine, we merged and normalized keywords for accepted submissions (e.g., combining "map," "maps," and "mapping" into a single term). We whittled the list of 900+ tags/keywords down to 347, which created more meaningful groupings even if it flattened some nuance in how those terms were used.
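OpenRefine handled this merging for us. For readers who would rather script the step, it can be sketched in Python. This is a minimal sketch, not our actual workflow: the `ALIASES` table and the function names are hypothetical, and in practice the alias table would be built by reviewing the full keyword list by hand.

```python
import re

# Hypothetical alias table: variant keyword -> canonical keyword.
# Entries are illustrative; a real table comes from reviewing every keyword.
ALIASES = {
    "map": "mapping",
    "maps": "mapping",
}


def normalize_keyword(raw: str) -> str:
    """Trim, lowercase, collapse internal whitespace, then apply aliases."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return ALIASES.get(key, key)


def normalize_keywords(raw_keywords: list[str]) -> list[str]:
    """Normalize one submission's keywords, dropping blanks and duplicates
    while preserving the order in which the author entered them."""
    seen = set()
    result = []
    for raw in raw_keywords:
        key = normalize_keyword(raw)
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


print(normalize_keywords(["Mapping", " maps", "Digital  Humanities", "map", ""]))
# ['mapping', 'digital humanities']
```

Preserving the author's order matters here because, as described later, the first keyword ended up carrying extra weight in the installation.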

Simultaneously, we had to decide how our network would be physically constructed: would each session be a node, with keywords serving as the edges that connected them, or vice versa? We took a small sample of data (four sessions that shared four keywords) and laid out the network both ways. To make a quick mock-up, we used the three-digit ID numbers ConfTool assigned to each session in place of the full session names.

Two diagrams, each with four circles connected by lines. On the left, the circles read "102," "107," "117," and "150," connected variously by lines labeled "Activism," "Arts," "Design," and "Participatory Methods." On the right, the circles read "Activism," "Arts," "Design," and "Participatory Methods," connected variously by lines labeled "102," "107," "117," and "150." — Two approaches to the network layout. On the left, session IDs, as assigned by ConfTool, are connected by lines representing shared keywords. On the right, keywords are connected by lines representing sessions associated with them.

Ultimately, we decided to have the keywords serve as nodes. We felt viewers would recognize topics more quickly than session IDs, and we liked how quickly this network conveyed the scope of the conference. It showed how interdisciplinary the work was, and making each session an edge reinforced the metaphor of conference work as holding the network (and our community) together.

Based on this approach, we processed the session data using an R script to pair each keyword with other keywords for each session, creating an edgelist for the sessions as a whole. Using Gephi, we generated a network layout and filtered out nodes with fewer connections (i.e., lower degree), ultimately landing on 87 keywords in our network. This number balanced detail with legibility and would still be feasible for us to construct. We re-processed this data several times as session data changed based on conference registrations and withdrawals.
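We did the pairing in R and the filtering in Gephi. The same logic can be sketched in Python (our substitution, not the script we used): each session contributes one edge for every pair of its keywords, an edge's weight counts the sessions in which the pair co-occurs, and only the highest-degree keywords are kept. The function names and the small `sessions` example are hypothetical, and a fixed top-N cut-off simplifies the interactive filtering we did in Gephi.

```python
from collections import Counter
from itertools import combinations


def build_edgelist(sessions: dict[str, list[str]]) -> Counter:
    """Pair every keyword with every other keyword in the same session.
    Returns a Counter mapping (keyword_a, keyword_b) -> number of sessions
    in which that pair co-occurs (the edge weight)."""
    edges = Counter()
    for keywords in sessions.values():
        # sorted(set(...)) drops repeats and gives each pair a stable order
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges: Counter) -> Counter:
    """Degree of each keyword = number of distinct keywords it connects to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def filter_top_nodes(edges: Counter, top_n: int) -> Counter:
    """Keep the top_n highest-degree keywords and only the edges among them."""
    keep = {kw for kw, _ in degree(edges).most_common(top_n)}
    return Counter({pair: w for pair, w in edges.items()
                    if pair[0] in keep and pair[1] in keep})


# Hypothetical sample: three sessions and their keywords.
sessions = {
    "s1": ["arts", "design", "community"],
    "s2": ["arts", "community"],
    "s3": ["design", "activism"],
}
edges = build_edgelist(sessions)
print(degree(edges).most_common(1))       # [('design', 3)]
print(sorted(filter_top_nodes(edges, 3)))
# [('arts', 'community'), ('arts', 'design'), ('community', 'design')]
```

Re-running a script like this each time registrations or withdrawals change the session list is cheap, which is one argument for scripting the pairing rather than doing it by hand.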

A network diagram of the keywords. The largest nodes read "arts" in pink; "participatory methods," "community," and "activism" in orange; "digital humanities," "data," and "media studies" in light green; "education" and "pedagogy" in dark green; and "design" and "design-based approaches" in blue. — Keyword network layout, filtered to the 87 most used keywords.

In addition to layout, we needed to decide how color would work in this network. We knew we wanted to use the colors from the conference's visual identity, but how they would function in the network was not yet clear. In the sample networks we generated, color highlighted clusters of nodes based on related topics. There were many other options, and we weren't sure how to proceed: color could represent where presenters were located, or presentation theme, field, format, and so on. Eventually, to further represent the range of work and formats at the conference, we used color to represent the different types of sessions (papers/panels/roundtables, workshops, exhibits, performances, and activities), leaving the nodes as black text on a white background to enhance readability. Later, in setting up the conference program in Sched, we used the same color coding on the calendar of events to help attendees distinguish among the different offerings.

Install

Our installation was located on a set of bulletin board walls in the campus's Student Union. This backdrop limited the materials we could use to create the network: we could not install the hooks or nails that other data physicalizations have used. Instead, we glued pushpins to the nodes, which were printed on bristol board and cut out before being attached to the wall.

Working space with a glue gun, clear push pins, an upside down node with a few pushpins attached, and a stack of completed nodes. — Gluing pushpins to the back of printed nodes. Photo by Claudia Berger.

At this point, we noticed that the proportions of our digital layout did not match the width of the wall available to us. Had we kept the original square layout, some nodes would have been too high or too low for us to work with, or for viewers to see and interact with. To adjust the layout, we repositioned some of the more central nodes at the top and bottom of the network and moved other nodes horizontally to spread them across the space.
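We made these adjustments by hand. Matching the proportions up front could instead be scripted: the hypothetical sketch below linearly rescales layout coordinates (for example, node positions exported from a tool like Gephi) so they span the full wall width while staying inside a comfortably reachable height band. The function name, the centimeter units, and the wall dimensions in the usage line are assumptions for illustration, not measurements of our wall.

```python
def fit_to_wall(
    positions: dict[str, tuple[float, float]],
    wall_width: float,
    min_height: float,
    max_height: float,
) -> dict[str, tuple[float, float]]:
    """Rescale (x, y) layout coordinates onto a wall.

    x is mapped onto [0, wall_width]; y is mapped onto the reachable band
    [min_height, max_height] (heights above the floor). The two axes are
    scaled independently, so a square layout is stretched horizontally
    to fit a wide, short wall.
    """
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero span when all nodes share one coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    band = max_height - min_height
    return {
        node: ((x - x_min) / x_span * wall_width,
               min_height + (y - y_min) / y_span * band)
        for node, (x, y) in positions.items()
    }


# Hypothetical layout coordinates and a hypothetical 600 cm wide wall,
# keeping nodes between 60 cm and 200 cm off the floor.
layout = {"arts": (0, 0), "design": (-100, 100), "community": (100, -100)}
print(fit_to_wall(layout, wall_width=600, min_height=60, max_height=200))
# {'arts': (300.0, 130.0), 'design': (0.0, 200.0), 'community': (600.0, 60.0)}
```

Independent scaling distorts the distances the layout algorithm produced; that trade-off is the same one we made by spreading nodes horizontally by hand.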

Nodes attached to the wall with a few strings installed between them. — Starting installation of the yarn. Photo by Claudia Berger.

Instead of making a true keyword network (connecting keywords each time they co-occur in a single session), we used one piece of yarn for each session and created more of a circuit, traveling to a particular keyword each time an author used it. Initially, we followed the order of the tags as the authors entered them, preserving any importance the authors may have attached to early tags, but this sometimes required crossing the entire installation multiple times when nodes were on opposite ends. Early in the process, we pivoted to reordering the tags to create a more efficient route, keeping each author's first tag in place in case it was a primary or important label. Going through the lists took a little time, but it made installation easier and faster and used less yarn.
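We reordered the tags by eye. One way to automate a similar reordering is a greedy nearest-neighbor route, sketched below under stated assumptions: node positions on the wall are known as (x, y) coordinates, the author's first keyword is pinned as the starting point, and each later stop is the closest node not yet visited. This heuristic usually shortens the path but does not guarantee the shortest possible one. The positions and keywords in the example are hypothetical.

```python
import math


def order_route(keywords: list[str],
                positions: dict[str, tuple[float, float]]) -> list[str]:
    """Reorder one session's keywords into a shorter yarn path.

    The first keyword stays first; every following stop is the nearest
    keyword not yet visited (greedy nearest-neighbor, not guaranteed optimal).
    """
    if not keywords:
        return []
    route = [keywords[0]]
    remaining = list(keywords[1:])
    while remaining:
        here = positions[route[-1]]
        nearest = min(remaining, key=lambda kw: math.dist(here, positions[kw]))
        route.append(nearest)
        remaining.remove(nearest)
    return route


def route_length(route: list[str],
                 positions: dict[str, tuple[float, float]]) -> float:
    """Total straight-line distance the yarn travels along the route."""
    return sum(math.dist(positions[a], positions[b])
               for a, b in zip(route, route[1:]))


# Hypothetical node positions along one row of the wall.
positions = {"arts": (0, 0), "design": (10, 0),
             "community": (1, 0), "activism": (9, 0)}
as_entered = ["arts", "design", "community", "activism"]
reordered = order_route(as_entered, positions)
print(reordered)                           # ['arts', 'community', 'activism', 'design']
print(route_length(as_entered, positions)) # 27.0
print(route_length(reordered, positions))  # 10.0
```

Pinning the first keyword mirrors our choice to respect the label an author may have considered primary, at the cost of sometimes starting the route far from the rest of its nodes.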

To make the installation interactive and explorable for conference attendees, we added a tail to the start or end of each circuit and attached a label with the session title. This design allowed participants to navigate the network and find related sessions, and it gave visual weight to keywords that authors more frequently listed first. Nodes with multiple labels hanging from them stood out from other nodes of equal or similar size. Whether or not authors intended their first keyword to be the most central to their work, this highlighted patterns; for instance, when "digital humanities" was one of the listed keywords, it was usually the first one.
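The visual weight described above amounts to counting how often each keyword is listed first. A minimal sketch, assuming each session's keywords are stored in the order the author entered them and that each label hangs from the session's first keyword; the function names and session data are made up for illustration.

```python
from collections import Counter


def label_counts(sessions: dict[str, list[str]]) -> Counter:
    """Number of session labels hanging from each keyword node,
    assuming each label is attached at the session's first keyword."""
    return Counter(kws[0] for kws in sessions.values() if kws)


def first_keyword_share(sessions: dict[str, list[str]], keyword: str) -> float:
    """Of the sessions that list `keyword`, the fraction that list it first."""
    listing = [kws for kws in sessions.values() if keyword in kws]
    if not listing:
        return 0.0
    return sum(kws[0] == keyword for kws in listing) / len(listing)


# Hypothetical sessions with keywords in author-entered order.
sessions = {
    "s1": ["digital humanities", "arts"],
    "s2": ["digital humanities", "data"],
    "s3": ["pedagogy", "digital humanities"],
    "s4": ["arts"],
}
print(label_counts(sessions).most_common(1))                     # [('digital humanities', 2)]
print(round(first_keyword_share(sessions, "digital humanities"), 2))  # 0.67
```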

Close up of a single term, “digital humanities” with around a dozen labels hanging from it. — Labels draw attention to the many sessions associated with “digital humanities.” Photo by Claudia Berger.

Lessons Learned

If we were to do this style of data physicalization again, there are a few things that we would do differently.

Layout

Even with the changes we made to the layout, some of the nodes were too high to work on comfortably. It was hard to firmly secure them to the wall, and some fell off as we tried to install the yarn. Even the tallest member of our team had difficulty reattaching them securely.

There are a few ways this could be addressed in future installations. First, when designing the initial network, we could ensure that the digital layout proportions match the actual space being used. That way, fewer changes would be necessary. Second, when we attached the nodes, we started with the center of the network (arts, community, design, and participatory methods) and then worked our way out. This pushed the periphery too far out and created a lot of empty space in the center of the network. We could’ve had those four nodes closer together, but we didn’t notice that until much later. To counteract this, we could start by placing outer nodes first and work inwards. This would ensure the upper and lower bounds were reachable and legible while minimizing the wasted space in the center. While there is a risk that this would cause the center of the visualization to be cramped, it prioritizes making the installation physically less taxing on those working on the project.

Construction

In addition to our difficulty reaching the nodes, it was sometimes hard to attach the yarn to them. We started by wrapping the yarn around an individual pushpin on a node, but quickly found that some pushpins were too close together to wrap effectively. After a while, we started wrapping the yarn around the circumference of the node, which took more yarn and sometimes led to awkward connections between nodes. Our process could be improved by using fewer pushpins, leaving more space between each pin. Alternatively, instead of gluing the pins around the outer edge of the node, we could glue them in a smaller circle closer to its center, which would make it easier to wrap the yarn around the entire node. Both methods would also make it easier to attach the nodes to the wall, as there would be fewer pins to line up and push in.

Time versus labor

The biggest thing we would change is the installation of the piece itself. Part of the reason we were unable to finish was that this was our first physicalization project, and we didn't budget enough time for installation alongside our other conference responsibilities. In the future, we would start earlier to allow more working time.

More workers wouldn’t necessarily speed up this project. Having even two people attach yarn at the same time created maypole-esque tangles, and installation worked best when one person read off the order of the nodes and the other attached the yarn. Still, the installation took a physical toll—reaching, stretching, manipulating yarn with fine detail—and having multiple workers in shifts might lessen the burden on individual bodies.

Another way to address installation, now that we have a sense of the labor involved, might be to simplify the design of the network as a whole. We could have used a random sample of sessions instead of trying to represent every single one. Or, recognizing that we would not finish all of the sessions, we could have cycled through the colors so that the overall look we wanted was achieved even if the data was left incomplete. By aiming for completeness within a single category, representing all of the papers/panels/roundtables in one color, we ultimately produced a flatter visualization than we had intended.

Unfinished, and yet…

This project was about representing the community of the conference, and in the end it both represented and helped foster community at the conference. Attendees explored the installation together, as it was located in a key area of the event that served as both registration and the location of food and coffee throughout the three days. This was a place where people gathered, and there were couches nearby where people could rest between sessions. While the plan had been to complete the installation before the conference began, installing it during the first day of sessions actually helped encourage conversation about the piece. We were present to answer questions and talk about it while it was under construction, and attendees enjoyed seeing the process and checking in over the course of the day on how the work was progressing.

People standing and seated around the Student Union. Claudia is assembling the network in the background while people watch and point to it. — Assembling the network during the first afternoon of the conference. Photo by Favour Ritaro.

While the final result was not a technically accurate network of the data, that wasn't the goal of the piece. We used data physicalization, and brought art into our network-making process, to help us tell a story about the data. Some of the "inaccuracies," like having to move the nodes, emphasized the human element of literally having our hands on the data, and in some ways the installation turned into a performance of the labor of datawork itself.

Claudia Berger

Claudia Berger is the Digital Humanities Librarian at Sarah Lawrence College and Visiting Assistant Professor at Pratt Institute, teaching digital humanities. Their research centers on new approaches to digital humanities research, such as physical data visualizations (including quilts as data visualizations) and digital environmental humanities. They also serve as an Editor of dh+lib, working on the biweekly dh+lib Review and editing special issues.

Chris Alen Sula

Chris Alen Sula is Associate Provost for Academic Affairs at Pratt Institute, with primary responsibilities for curriculum, assessment, and accreditation. He is also tenured Associate Professor in the School of Information, where he founded the MS Program in Data Analytics & Visualization and Advanced Certificate in Digital Humanities. His research publications focus on digital humanities, information visualization, and the ethics of data/technology.

Chris is co-editor of Lateral, the peer-reviewed open access journal of the Cultural Studies Association; the edited volume Cultural Studies in the Interregnum (Temple University Press, forthcoming); and the open-access series Emergent Ideas: Lateral Books in Cultural Studies (Amherst College Press). He served as president of Pratt’s Academic Senate from 2016–2022 and as chair of the HASTAC 2023 Conference “Critical Making & Social Justice” hosted at Pratt.