Emotions, like a wise compass, provide us with invaluable insights into our inner world. As my therapist often reminds me, they carry information that guides us. We need not be ruled by them, but ironically, ignoring them sentences us to their whims. By acknowledging and naming our emotions, we gain the power to choose our response. Emotions exist, whether we want them or not, playing a dominant role in our human lives.
For someone like me, emotions are as complex as a Gordian knot, but I’m certainly not a psychologist; I am a fellow human who experiences them in the typical human way. I’m learning to identify and embrace them, allowing them to shape my existence. It’s no easy task, but the rewards are immeasurable.
Emotions can be both overwhelming and vital. Without them, life would be devoid of color, a monotonous existence. When I think about emotions, I am reminded of color and temperature. This led me to explore how emotions are represented from the perspective of data science, the perspective I’m most familiar with.
My first question was: how do we capture something as complex as emotions? I searched the web for datasets on emotions. I thought that emotion could be detected from facial expressions, but also from text. I also realized that one data science task shares some similarities with emotion detection: sentiment analysis. Sentiment analysis could be seen as an extremely coarse-grained emotion detection task with only three categories: neutral, positive, and negative.
After perusing the popular data science platform Kaggle, I unearthed an intriguing dataset, “Emotions Dataset for NLP,” along with an accompanying article detailing its collection. The task was to classify sentences by six emotions: sadness, anger, surprise, fear, love, and joy, much like the adorable world of Pixar’s animation, “Inside Out.” Although these few emotions also seemed very coarse-grained, I decided to give the dataset a try and see how they look.
The journey begins
I became eager to “visualize these emotions” and gain a deeper understanding by organizing the data based on multidimensional representations of their corresponding textual descriptions. Thus began my journey across the emotion space.
How does one truly delve into the depths of this dataset? I was curious about the authors’ decision to select only six emotions out of the countless possibilities. Upon loading the data into a DataFrame, I set out to truly grasp what lay within. The dataset presented me with two primary aspects: textual descriptions (coming from tweets) and corresponding labels reflecting emotions. For instance, I stumbled upon an entry stating, “I feel strong and good overall,” labeled as “joy.”
Rather than embarking on a conventional classification task, where I would train a model to predict emotions based on labeled sentences, I sought to explore the dataset itself and its representation. Leveraging a language model with universal sentence embedding, I transformed each sentence into a long vector of numbers, positioning them within a latent space according to the language model. Employing a dimensionality reduction technique, I extracted the three most informative components from the extensive vector and plotted them. Admittedly, it didn’t offer any groundbreaking revelations. Dimensionality reduction techniques are commonly employed to gain insights into datasets, but even with the added labels, I wasn’t convinced that discernible patterns were emerging.
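To make the "three most informative components" step concrete, here is a minimal sketch. It assumes PCA (computed via NumPy's SVD) as the dimensionality reduction technique, which the text above does not specify, and it uses random vectors as stand-ins for real sentence embeddings. The function name `top_components` and the 512-dimensional size are illustrative choices, not details from the project.

```python
import numpy as np


def top_components(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project each row onto the first n principal components (PCA via SVD)."""
    # Center the data so the components describe variance around the mean.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder data: 100 random "sentence embeddings" of assumed size 512.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 512))

points_3d = top_components(fake_embeddings)
print(points_3d.shape)
```

Each row of `points_3d` is one sentence placed in a three-dimensional space, ready to be plotted and colored by its emotion label.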
One lovely tool I really enjoy is TensorBoard, which lets the user visualize the latent space of a model. With its help, I loaded the vector-based representations of the descriptions (also called “embeddings”) into TensorBoard and applied UMAP dimensionality reduction out of the box. You can see it below.
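One simple way to get embeddings into TensorBoard's Embedding Projector is to export them as tab-separated files: one file with a vector per row, and one metadata file with the labels and hover-over text. The sketch below assumes that route; the helper `write_projector_files`, the file names, and the toy vectors are all hypothetical, and the second sentence is a made-up placeholder rather than a row from the dataset.

```python
import csv
import tempfile
from pathlib import Path


def write_projector_files(vectors, labels, texts, out_dir):
    """Write vectors.tsv and metadata.tsv for loading into the Embedding Projector."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vec_path = out / "vectors.tsv"
    meta_path = out / "metadata.tsv"

    # One embedding per row, values separated by tabs, no header row.
    with vec_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(vectors)

    # Metadata with more than one column needs a header row.
    with meta_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["label", "text"])
        writer.writerows(zip(labels, texts))

    return vec_path, meta_path


# Toy example with made-up 3-dimensional vectors.
demo_dir = tempfile.mkdtemp()
vec_path, meta_path = write_projector_files(
    vectors=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    labels=["joy", "sadness"],
    texts=["I feel strong and good overall", "a made-up second sentence"],
    out_dir=demo_dir,
)
print(vec_path, meta_path)
```

The row order has to match between the two files, since the projector pairs vectors and metadata by position.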
Creating emotion cubes
I assigned labels and hover-over descriptions to each data point. Upon closer inspection, I noticed numerous clusters of points huddled closely together. Intriguingly, instead of running a cluster analysis, I became captivated by these individual data points and their neighborhoods. A thought struck me—if I could encapsulate neighboring points within cubes, it might yield fascinating insights when analyzed cube by cube or at least interesting mixtures of “color-emotions.” I wondered how to cover the neighboring points in cubes. Because I worked in a three-dimensional space, I imagined it as a huge storage room that I would fill with small cardboard boxes. Something like in the picture below. In each box we could place emotions that lie together. Of course there won’t be any gravity force to keep the full boxes on the ground but the idea seemed interesting enough to try it out.
Thus, I divided the latent space into small equally sized cubes and assigned colors to represent the emotions they contained. With a touch of transparency, the cubes came to life — how exhilarating it was. It took me a while to realize that the majority of cubes that filled the space were empty and thus they were covering the emotion cubes, so I decided to remove “empty boxes” and leave only boxes with emotions.
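The cardboard-box idea can be sketched in a few lines. This is an illustration under my own assumptions, not the project's actual code: the function name `assign_to_cubes`, the cube size, and the four toy points are invented. Because cubes are stored in a dictionary keyed by their integer grid position, empty boxes never appear in the first place.

```python
import math
from collections import Counter, defaultdict


def assign_to_cubes(points, cube_size):
    """Group point indices by the integer grid cell (cube) that contains them."""
    # Shift the grid so it starts at the smallest coordinate on each axis.
    mins = [min(p[axis] for p in points) for axis in range(3)]
    cubes = defaultdict(list)
    for index, p in enumerate(points):
        key = tuple(
            math.floor((p[axis] - mins[axis]) / cube_size) for axis in range(3)
        )
        cubes[key].append(index)
    return dict(cubes)


# Toy 3D points with made-up emotion labels.
points = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.3), (1.5, 0.2, 0.1), (2.9, 2.9, 2.9)]
labels = ["joy", "love", "anger", "sadness"]

cubes = assign_to_cubes(points, cube_size=1.0)

# Emotion composition of every occupied cube.
composition = {
    key: Counter(labels[i] for i in indices) for key, indices in cubes.items()
}
for key, counts in sorted(composition.items()):
    print(key, dict(counts))
```

In this toy example the points span a 3 x 3 x 3 grid of 27 possible cubes, yet only three of them are occupied, which mirrors the observation that most boxes in the storage room stay empty.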
Initially, I contemplated coloring each cube with the most dominant emotion it contained. However, as I immersed myself in the project, I realized that certain emotions are more intricate, often comprising a mix of several emotions. I thought, why don’t we look at the six labels as basic building blocks and see what will happen when we blend the basic emotion colors together? Perhaps this blend could offer us unique insights into the emotional landscape. Consequently, I experimented with blending colors in proportion to the composition of emotions within each cube, settling on translating colors into CIELAB space and mixing the dimensions there.
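Blending in CIELAB can be sketched as follows: convert each emotion's sRGB color to Lab, take the average weighted by how many sentences of each emotion sit in the cube, and convert back. The conversion uses the standard sRGB and D65 formulas; the emotion colors, the counts, and the function names are hypothetical choices for illustration, not the palette used in the project.

```python
WHITE_D65 = (0.95047, 1.0, 1.08883)  # reference white for the Lab conversion
DELTA = 6 / 29


def rgb_to_lab(rgb):
    """Convert an sRGB color with channels in [0, 1] to CIELAB (D65)."""
    r, g, b = (
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb
    )
    xyz = (
        0.4124564 * r + 0.3575761 * g + 0.1804375 * b,
        0.2126729 * r + 0.7151522 * g + 0.0721750 * b,
        0.0193339 * r + 0.1191920 * g + 0.9503041 * b,
    )
    fx, fy, fz = (
        t ** (1 / 3) if t > DELTA**3 else t / (3 * DELTA**2) + 4 / 29
        for t in (v / w for v, w in zip(xyz, WHITE_D65))
    )
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_rgb(lab):
    """Convert CIELAB (D65) back to sRGB, clamping to the displayable range."""
    lightness, a, b = lab
    fy = (lightness + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    x, y, z = (
        w * (t**3 if t > DELTA else 3 * DELTA**2 * (t - 4 / 29))
        for t, w in zip((fx, fy, fz), WHITE_D65)
    )
    linear = (
        3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
        -0.9692660 * x + 1.8760108 * y + 0.0415560 * z,
        0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
    )
    # Clamp before the gamma step so out-of-gamut values stay real numbers.
    clamped = (min(1.0, max(0.0, c)) for c in linear)
    return tuple(
        12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
        for c in clamped
    )


def blend_in_lab(colors_rgb, weights):
    """Weighted average of sRGB colors, computed in CIELAB space."""
    total = sum(weights)
    labs = [rgb_to_lab(c) for c in colors_rgb]
    mixed = tuple(
        sum(w * lab[i] for w, lab in zip(weights, labs)) / total for i in range(3)
    )
    return lab_to_rgb(mixed)


# Hypothetical emotion colors and a cube holding 3 "joy" and 1 "sadness" sentence.
EMOTION_COLORS = {"joy": (1.0, 0.85, 0.0), "sadness": (0.1, 0.3, 0.8)}
cube_counts = {"joy": 3, "sadness": 1}

cube_color = blend_in_lab(
    [EMOTION_COLORS[e] for e in cube_counts], list(cube_counts.values())
)
print(tuple(round(c, 3) for c in cube_color))
```

Averaging in Lab rather than directly in RGB is meant to keep the mix closer to how we perceive color differences, so a cube dominated by one emotion stays recognizably that emotion's hue.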
And there we stood, surrounded by multiple little cubes brimming with colors—anger, sadness, joy, love, and everything in between. Inside these cubes, I placed the corresponding data points, allowing the chart to rotate and facilitating exploration of each cube and its contents. You can access it here.
I searched for other data as well and came across the GoEmotions dataset. It covers more complex emotions, listing 27 of them plus a neutral state. We can see below how mixing the colors by emotion makes each cube like no other.
Going a step further, I thought: what if we use these emotion datasets and the basic building-block emotions as the initialization, let the data mix the emotion-colors in the cubes, and extract an emotionally inspired color palette at the end? See below for the extracted colors.
After thinking about color palettes, I thought about the sound of emotions; one way to expand this project would be to add a layer of sonification to the cubes.
Challenges and reflections
Working on this emotion problem made me wonder about all the different aspects of capturing the data, and especially about how simplified our models are. I explored individual data points (following Edward Tufte’s advice, “above all else show the data”) and sometimes wondered how someone had arrived at the emotion label assigned to them. Also, how can you capture such complexity by flattening an emotion into a single sentence, when you cannot hear the voice or perceive the “emotional state” someone was in while uttering the sentence or a sound? I suppose George Box was right again when he said that “all models are wrong, but some are useful,” and we should always keep that in mind when looking at models.
As an experienced Machine Learning Researcher with over 6 years of R&D expertise, I’ve devoted my career to harnessing the power of data for informed decision-making. Currently, I’m at the forefront of technology at Accenture Labs in Dublin, where my focus lies in Explainable AI and Graph Machine Learning, specifically within the Life Sciences domain. I hold a Bachelor’s degree in Control Engineering and Robotics, complemented by dual Master’s degrees in Data Science and Entrepreneurship, earned through EIT Digital. My journey includes valuable experiences from both industry and academia, which have enabled me to excel in bridging the gap between science and business. But my journey doesn’t end there; I’m also an EMBA Candidate at Trinity College Dublin, steadily expanding my business acumen. Throughout my professional path, while data visualization may not be the primary focus according to my job description, it remains a cornerstone of my work. Its ability to effectively convey complex information is pivotal, significantly enhancing the overall impact of our projects.