I love Twitter. Well, I love my community on Twitter. I love learning about new art, new events, scientific discoveries, and social movements new and old. But when I’m scrolling through my feed, all I see is a certain style of post: engagement-bait. The content I want to see never makes it to my feed.
I’ve curated a pretty good list of those interesting people that I follow. When I go to their individual profiles, I see a wealth of enriching information and creativity around art, AI, political philosophy, new technology, etc. But I rarely see any of that content on my main feed. It all gets drowned out by the sure-fire, dopamine-rush tweets that get thousands of likes and retweets, or the “Main Character” being ratioed to oblivion by pilers-on. There’s no way my friends’ deep discussions and beautiful creative posts–with their paltry dozens of likes and maybe a retweet or two–could compete.
Which is exactly what Twitter’s algorithm (“The Algorithm”) is optimized to do. Those kinds of engagement-bait tweets keep us scrolling, keep us engaging, keep our brains trapped in an attention cage.
I began to really get fed up with The Algorithm hiding the content I wanted to see. I felt like I wasn’t ever seeing tweets from people I actually follow, but rather from randos The Algorithm thought would keep me hooked. Many others, it seems, feel the same way. But to know for sure, I needed to collect data to test my hypothesis, and so I began this experiment.
A system of self-surveillance
The first step was to record and measure what The Algorithm was shoving into my eyeballs. So I wrote an app to surveil my screen every time I opened the Twitter app.
My app appropriated Android’s Accessibility Service system, which is normally used to give people with disabilities alternate ways to experience an app–like screen readers that read the text to blind people. (This technique has been used in the past to automate tasks, fix broken or unsupported abandonware, even help exploited workers fight back against gig economy corporations trying to surveil and control them. It’s a testament to Android’s open and extensible nature.)
My app scans all the text on my Twitter main feed, parses it into individual Tweets, and then saves it to a database. It does this constantly, every time I scroll. I collected thousands of tweets over dozens of hours, scrolling through Twitter over a month-long period. This way, I gathered the raw data to test my hypothesis.
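The parsing step looks roughly like this. A minimal sketch: the block layout and stats-line format below are invented for illustration (the real accessibility dump is messier and changes with every app update), but the shape of the problem is the same: turn runs of screen text into structured tweet records.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tweet:
    author: str
    text: str
    replies: int
    retweets: int
    likes: int

# Hypothetical format: the last line of each scraped block carries the counts.
STATS = re.compile(r"(\d+) Replies, (\d+) Retweets, (\d+) Likes")

def parse_block(block: str) -> Optional[Tweet]:
    """Turn one scraped block of screen text into a Tweet record."""
    lines = block.strip().splitlines()
    if len(lines) < 3:
        return None  # not enough text to be a tweet; skip it
    m = STATS.search(lines[-1])
    if not m:
        return None  # no stats line, so probably an ad fragment or UI chrome
    replies, retweets, likes = map(int, m.groups())
    return Tweet(lines[0], "\n".join(lines[1:-1]), replies, retweets, likes)
```

Everything that fails to parse gets dropped on the floor, which matters later: some of the feed simply can’t be classified this way.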
(Why didn’t I use the Twitter API, you ask? Well, Twitter rejected my request to use their API after I told them my intent to write this article. I guess they don’t want people digging in and revealing how their system is rigged. 🤷‍♂️)
Chewing the feed data
Once I got the raw data from my app, I parsed it into relevant portions, and then imported it into Observable to start visualizing the data. All my data and code is in this Observable notebook if you want to explore more deeply.
First I came up with a primitive metric I call “total engagement”: the sum of the likes, retweets, and replies a tweet gets. The following graphs use this metric as a general way of measuring how many people have interacted with the content (which I’m guessing is close to how The Algorithm decides what to show you anyway).
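In code, the metric is as blunt as it sounds (the feed records here are toy examples, not real data):

```python
# One record per tweet; toy numbers for illustration.
feed = [
    {"author": "friend_a",   "likes": 24,     "retweets": 2,     "replies": 5},
    {"author": "stranger_b", "likes": 41_000, "retweets": 9_800, "replies": 1_200},
]

def total_engagement(tweet: dict) -> int:
    # A blunt sum: likes, retweets, and replies weighted equally.
    return tweet["likes"] + tweet["retweets"] + tweet["replies"]

# Rank the feed the way an engagement-driven algorithm might.
feed.sort(key=total_engagement, reverse=True)
```

Weighting all three interactions equally is a simplification on my part; the real Algorithm almost certainly weights them differently, but for eyeballing the data this is good enough.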
Let’s start by just comparing how many tweets in my feed are from people I follow (hereafter labeled “friend”) versus strangers.
Looks like around one third of my feed is tweets from friends, and two thirds are from strangers and ads. But let’s go a bit deeper into why.
We start to see that the less popular tweets are from people I follow, whereas the really popular tweets with loads of retweets and likes are mainly strangers.
Maybe I just don’t follow enough popular people who tweet out bangers with hundreds of thousands of likes? I’m curious how these numbers compare for someone who primarily follows large accounts, and whether it’s still roughly a one-third/two-thirds split between friends and strangers, or whether the division reflects popularity.
Even at this log scale, it’s clear that not many of my friends post popular tweets, and thus don’t show up in my feed. Without the log scale, it’s even more dire:
Who is in my feed?
Let’s see how many of the people I follow actually show up in my feed.
I am following over 2,000 people, so seeing tweets from only 10 percent of them is disconcerting; 90 percent of the people I intentionally follow, and want to hear from, are effectively hidden from me. When we dig deeper, it gets even worse.
Here’s a breakdown of repeated posters, who appear in my feed multiple times.
You can see a good portion of them aren’t even people I follow. And even of the people I do follow, a small percentage of them take up a large amount of my feed.
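The counting behind this breakdown is simple. A sketch with made-up handles, assuming each feed entry has already been tagged with whether I follow the author:

```python
from collections import Counter

# Hypothetical sample of my feed: (author, do_i_follow_them) pairs.
feed = [
    ("alice", True), ("bob", False), ("alice", True),
    ("carol", True), ("bob", False), ("alice", True),
]

appearances = Counter(author for author, _ in feed)
followed = {author for author, follow in feed if follow}

# Repeat posters, split by whether I actually follow them.
repeats = {a: n for a, n in appearances.items() if n > 1}
repeat_friends   = {a: n for a, n in repeats.items() if a in followed}
repeat_strangers = {a: n for a, n in repeats.items() if a not in followed}
```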
Here is the same graph, color coded by how popular each tweet is:
It’s interesting to see that some of the top repeat tweeters have content with only a moderate amount of engagement. I’m guessing these people are shown to me more often because I interact with them more often. And because I see them a disproportionate amount of the time, I interact with them more, which only compounds the problem, shrinking my “bubble” through recursive self-selection. To borrow a phrase from machine learning, you could call this “overfitting”: The Algorithm focuses too much on a narrow band of people because that narrow band is all I’m shown, and thus all I can interact with.
Why are they in my feed?
Tweets shown by friends’ recommendations:
It looks like the majority of tweets are from strangers, recommended to me because someone I follow liked or retweeted them, and another 13 percent are ads:
Then there’s a whole 11 percent of tweets that I cannot classify with my parser (labeled “Stranger”). These are probably “viral” tweets or tweets on topics that Twitter thinks I would like, even though they have no connection to me or people I follow.
We also see that a large portion of strangers’ tweets appear in my feed because they were recommended by the same small subset of people I follow. I call these “The Tastemakers” since apparently these few people get to dictate what I see in my feed.
How current is this content?
Analyzing the relative posting times, most of the tweets from people I follow are recent (within a day), whereas tweets older than a day are mostly from strangers. I guess Twitter wants me to catch up on the drama happening in other areas I’m not actively following. (Or maybe it takes longer for strangers’ tweets to accumulate enough engagement to propagate to my feed.)
Unfortunately, this means that the majority of tweets I see are relatively old and outdated. Conversely, if a friend posts infrequently, I’ll miss their posts entirely whenever more than a day passes between my logins (with my level of Twitter addiction, though, that’s not likely, haha).
What about tweet quality (sentiment analysis)?
I then tried some sentiment analysis using VADER, to see if The Algorithm was feeding me overly negative or positive content.
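VADER produces a “compound” score from -1 (most negative) to +1 (most positive) for each piece of text. Here’s a toy stdlib-only stand-in for the idea, so you can see the shape of the output; the mini-lexicon is invented, and the real library uses a large human-tuned lexicon plus heuristics for negation, punctuation, and capitalization:

```python
import math

# Invented mini-lexicon; real VADER ships thousands of scored words.
LEXICON = {"love": 2.0, "beautiful": 1.5, "good": 1.0,
           "bad": -1.0, "awful": -2.0, "hate": -2.5}

def toy_compound(text: str) -> float:
    """Score text in [-1, 1]; positive means positive sentiment."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not scores:
        return 0.0  # nothing scorable: treat as neutral
    total = sum(scores)
    # Squash into [-1, 1], loosely mimicking VADER's normalization (alpha=15).
    return total / math.sqrt(total * total + 15)
```

I ran the real analyzer over every tweet’s text and bucketed the compound scores for the chart below.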
Interestingly, it seems to be pretty evenly distributed, maybe a little heavier on the positive side. I wonder if the Twitter algorithm takes sentiment into account when deciding what to show you, or if it’s just a natural distribution of popular tweets.
Who is today’s Main Character?
Then, I visualized the “Ratio”, i.e., the proportion of likes to replies, in order to spot controversial statements. Anything below the zero line has more replies than likes, which could be an indicator that it has fired people up enough to respond. (Hover over any dot below to read the tweet text.)
It seems like The Algorithm doesn’t optimize for showing me the deeply controversial tweets, but a few do sneak in there. Judging by the amount of orange below the line, apparently I follow some real rabble-rousers.
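The metric behind that chart is simple; a sketch, where the +1 smoothing is my own assumption to dodge division by zero on tweets with no replies:

```python
import math

def log_ratio(likes: int, replies: int) -> float:
    """log10 of the likes-to-replies ratio. Below 0 means more replies
    than likes: a rough 'getting ratioed' signal. The +1 smoothing
    avoids dividing by zero on tweets with no replies yet."""
    return math.log10((likes + 1) / (replies + 1))

def looks_like_main_character(likes: int, replies: int) -> bool:
    # Below the zero line on the chart: more replies than likes.
    return log_ratio(likes, replies) < 0
```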
How do we fight The Algorithm?
So how do I interact with those 90 percent of people I chose to follow but never see? How do we win back our attention from the clutches of The Algorithm?
The short-term workaround I found was to create Lists of people I follow, divided up by subject/category: AI/ML, dataviz, UX designers, philosophy, artists, etc. Then I add those Lists as Tabs at the top of the screen. Those tabs seem to be purely chronological, bypassing The Algorithm and culling a lot of the cruft from people I don’t follow. I end up seeing content from people who never show up in my main feed. Of course, this only works in the Twitter app, and only for now.
To solve this issue longer term, and for other social networks governed by similar attention-capturing Algorithms, we need to rethink how these platforms are designed and built. Stephen Wolfram proposed letting people choose their own Algorithm: the social platforms would still host your data and provide the interface, but the way the data is aggregated, sorted, and displayed to you would be customized by pluggable algorithms. If anyone were allowed to create and share algorithms, an entire marketplace of algorithms could emerge, allowing for competition and choice in how you consume your content.
What’s ironic is that I think Twitter’s CEO, Jack Dorsey, realized this, and split off part of Twitter’s brainpower to create a new social network called Bluesky, which encompasses some of these ideas in a novel type of decentralized social network.
The other interesting thing emerging from web3 and decentralization is a move away from these types of algorithms that incentivize inflammatory race-to-the-bottom-style content creation, towards a model that supports creators and communities through more socialized funding and discovery. Some emerging coalitions like Channel are exploring new modes of content creation, distribution, and ownership, through NFT subscriptions, patron-models, RSS, etc.
Another approach, proposed by The Center for Humane Technology, is from the centralized, regulatory side: leverage 12 pressure points to change social network incentives–from internal design changes and oversight boards, to lawsuits and regulations that shift power towards the users of the networks instead of the shareholders of the company.
The way I see it, the centralized path via government regulation is a short-term fix which may be necessary given the amount of power our current societal structures allot to social media corporations, but the long-term fix is to put the power into the hands of each user instead—especially considering that centralized power structures are how we got into this mess in the first place. I’m eager to see what this new world of decentralization will bring us, and how it could afford us more agency in how we donate our attention and how we manage our privacy.
At the very least, maybe I’ll finally be able to see what my friends are posting.