The Academy Awards, also known as the Oscars, have been one of the most prestigious awards in the film industry for nearly a century, acknowledging both individual and movie-level achievements, from Best Picture to Best Director, Best Documentary Feature, or Best Sound. While there are now more than twenty Oscar categories, the biggest names are probably the actresses and actors, who also appear to be strongly embedded in a network of stars. To capture how these potential network links could affect their Oscar success, I explored historical Oscar data covering the past 50 years, examining award winners’ biggest moments of success and investigating their collaboration patterns.
First, I collected the list of award nominees and winners since 1973 from the Internet Movie Database (IMDb) focusing on the following categories: Best Performance by an Actor in a Leading Role, Best Performance by an Actress in a Leading Role, Best Performance by an Actor in a Supporting Role, and Best Performance by an Actress in a Supporting Role. This way I arrived at a dataset of 1,200 nominations, 200 Oscar statues, and 548 artists.
Next, I collected the individual filmographies of each of these artists using the IMDbPY package in Python, which resulted in a set of nearly 40,000 titles. After filtering out generic titles and ‘best-of’ collections such as The Oscars, The Orange British Academy Film Awards, The EE British Academy Film Awards, or The 76th Annual Academy Awards, I arrived at a filmography dataset of about 35,000 titles.
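The title-filtering step could look roughly like the sketch below. The exact filter rules are not spelled out here, so the patterns in `GENERIC_PATTERNS` and the helper names are assumptions chosen only to match the example titles mentioned above; the download step through IMDbPY is left out.

```python
import re

# Hypothetical patterns for award shows and 'best-of' collections;
# they are illustrative guesses, not the rules used in the analysis.
GENERIC_PATTERNS = [
    r"\bacademy awards?\b",
    r"\bthe oscars\b",
    r"\bbritish academy film awards\b",
]
_GENERIC_RE = re.compile("|".join(GENERIC_PATTERNS), flags=re.IGNORECASE)


def is_generic(title):
    """Return True if the title looks like an award show or collection."""
    return _GENERIC_RE.search(title) is not None


def clean_filmography(titles):
    """Keep only the titles that match none of the generic patterns."""
    return [title for title in titles if not is_generic(title)]


titles = [
    "The Godfather",
    "The 76th Annual Academy Awards",
    "The Oscars",
    "The EE British Academy Film Awards",
    "Nomadland",
]
print(clean_filmography(titles))
```

A keyword filter like this only catches titles that share a recognizable naming pattern, which is one reason some generic collections can survive it.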
Once I had compiled my dataset of more than 500 artists and 35,000 titles, I started exploring the Oscar landscape to identify key figures and surprising facts. For instance, if I rank the individuals by the number of nominations and wins they have (Figure 1), Meryl Streep strikingly dominates the Oscars with her 21 nominations, which earned her three awards. Jack Nicholson reached the same height with three awards, but out of only ten nominations since 1973. Daniel Day-Lewis and Frances McDormand also won three Oscars each, from six nominations each for their individual work. Additionally, McDormand, who contributed to Nomadland (2020), also earned a Best Motion Picture Award (which is excluded from the current analysis as it is not an individual award).
The dataset contains a few clean double winners, who were each nominated twice and took home a statue on both occasions: Christoph Waltz, Hilary Swank, Kevin Spacey, and Mahershala Ali. Yet, winning multiple awards is not at all typical. In fact, the 200 Oscars are distributed among 174 artists, and only 22 of them have multiple statues. This also means that 68 percent of the nominees have never won. Some were far less successful than Streep or Nicholson. For instance, Glenn Close has been nominated eight times, while Amy Adams approached the red carpet six times, yet they both went home empty-handed each time. On the other hand, Al Pacino had nine nominations, including The Godfather and The Irishman, and finally won one award. Leonardo DiCaprio also earned six nominations and finally a single Oscar in 2015. A detailed chart of the top actors and actresses based on their number of nominations and wins appears in Figure 1.
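The tallies behind these statements can be sketched with a few counters. The records below are toy data with placeholder names, and the `(artist, won)` layout is an assumption about how the nomination list might be stored; only the final line uses figures quoted above.

```python
from collections import Counter

# Toy records (artist, won); illustrative only, not the real dataset.
nominations = [
    ("Artist A", True),
    ("Artist A", False),
    ("Artist A", True),
    ("Artist B", False),
    ("Artist B", False),
    ("Artist C", True),
    ("Artist D", False),
]

nominated = Counter(artist for artist, _ in nominations)
wins = Counter(artist for artist, won in nominations if won)

# Artists with at least one nomination but no win.
never_won = [artist for artist in nominated if wins[artist] == 0]
share_never_won = len(never_won) / len(nominated)

# Artists who converted every nomination into a win ("clean" winners).
clean_winners = [a for a in nominated if wins[a] == nominated[a]]

# Rank by nominations first, then by wins.
ranking = sorted(nominated, key=lambda a: (-nominated[a], -wins[a]))

print(ranking, share_never_won, clean_winners)

# The 68 percent figure follows from 174 winners among 548 artists.
print(round(1 - 174 / 548, 2))
```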
While the landscape of the individual awards certainly tells a story, one would suspect that collaborations, connections, and networking in general, could have an effect on who gets awarded and who does not, especially since the awards are determined by the roughly seven thousand members of the Academy of Motion Picture Arts and Sciences. Additionally, earlier research pinpointed several dimensions of network effects on success in film, so I decided to build and investigate the winners’ and nominees’ social networks.
In this social network, every node represents an Oscar nominee or winner (the 548 artists), and two nodes are linked if the artists appear in the cast of the same movie (they have overlapping filmographies). The more often this happens, the stronger their connection. After processing all the movies and filmographies, I arrived at a densely interconnected network of 546 nodes and 17,140 links, which signaled that further data cleaning was needed. Indeed, despite filtering out several titles such as The 76th Annual Academy Awards, there were still titles in the dataset that were generic collections rather than original works. These collections typically featured a large number of individuals, so they did not represent significant collaborations and should be discarded from the network. Instead of further cleaning and manipulating the original data source, I decided to conduct this data cleaning step on the level of the network itself. I did this by filtering out statistically insignificant edges while keeping as many of the nodes as possible, to minimize information loss. For this, I used a previously published method for cleaning noisy networks. This cleaning step left me with a graph of 526 connected nodes and 1,299 links, visualized in Figure 2.
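A minimal sketch of the two steps described above, on toy data: building weighted co-appearance edges, then pruning weak ones. The cleaning method is not named here, so the second function uses the disparity filter purely as a stand-in; it should not be read as the method actually applied. The function names, the `alpha` threshold, and all data are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coappearance_edges(filmographies):
    """Map each artist pair to the number of titles they share."""
    edges = Counter()
    for a, b in combinations(sorted(filmographies), 2):
        shared = len(filmographies[a] & filmographies[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def disparity_backbone(edges, alpha=0.05):
    """Keep edges that are significant for at least one endpoint.

    Stand-in method (disparity filter), NOT necessarily the one used
    in the analysis: for a node with degree k and strength s, an edge
    of weight w gets the score (1 - w / s) ** (k - 1); a lower score
    means the edge is less likely under a random weight split.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (a, b), w in edges.items():
        for node in (a, b):
            degree[node] += 1
            strength[node] += w

    kept = {}
    for (a, b), w in edges.items():
        scores = [
            (1 - w / strength[node]) ** (degree[node] - 1) for node in (a, b)
        ]
        if min(scores) < alpha:
            kept[(a, b)] = w
    return kept


# Toy filmographies: artist -> set of title identifiers (hypothetical).
filmographies = {
    "Artist A": {"t1", "t2", "t3"},
    "Artist B": {"t1", "t2"},
    "Artist C": {"t3", "t4"},
    "Artist D": {"t5"},
}
edges = build_coappearance_edges(filmographies)

# Toy weighted edges: one strong tie and four weak ties around a hub.
toy_edges = {("H", "X"): 10, ("H", "Y1"): 1, ("H", "Y2"): 1,
             ("H", "Y3"): 1, ("H", "Y4"): 1}
backbone = disparity_backbone(toy_edges)
print(dict(edges), backbone)
```

In the toy example, only the strong tie between H and X survives, because the four weight-one ties are what a random split of the hub's total weight would produce anyway. Pruning on the edge level like this drops links created by large, generic casts without having to identify the offending titles one by one.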
The first and most striking feature of this network is the lack of clear communities (unlike networks with well-defined, interpretable communities, such as The Witcher’s network). Instead, there are two major sides in the network: the left (with actors like Christoph Waltz) and the right (with, e.g., Jodie Foster) are denser than the sparse area in the middle, which contains just a few names. Additionally, the largest and brightest (most successful) nodes in Figure 2 are scattered rather evenly across the graph. In general, this observation implies that the prizes are fairly distributed rather than concentrated within certain cliques of people. This is further supported by the low values of the clustering coefficient (capturing how connected a node’s neighbors are to each other) and the rich-club coefficient (measuring the existence of a central core where top nodes connect to each other). Finally, I noticed an interesting trend in this left-right split when I colored the nodes by the year of each artist’s first Oscar nomination, as shown in Figure 3. The coloring transitions smoothly from dark (present) to bright (past) from left to right, over consecutive eras of filmmaking. This observation tells us that there are just a few all-time stars of the Oscars, and that it is more typical for new movies, trends, and names to emerge year after year as the pool of actors keeps expanding, presenting new opportunities (in contrast to a recent example of star DJs, where the phenomenon of all-time stars does exist).
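To make the clustering coefficient concrete, here is a small sketch of the standard unweighted definition on a toy graph. The adjacency dictionary and function names are assumptions for illustration; the reported values may have been computed with a different variant or library.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * linked / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Toy undirected graph: a triangle A-B-C plus a pendant node D.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(adjacency, "A"), average_clustering(adjacency))
```

A low average value means that an artist's co-stars rarely share movies with each other, which is what one would expect in the absence of tight cliques.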
Stars in the network
After considering the overall view of the network, it is worth zooming in to the individual level to find out who the top networkers are and whether networking intensity correlates with the number of prizes they acquired. To capture these in the data, I computed the Degree (number of connections) and the Weighted Degree (sum of the edge weights of a given node) for each artist. The top ten artists based on these metrics are shown in Table 1, with Robert De Niro, Diane Keaton, and Burgess Meredith topping the list. While their network metrics seem to show clear patterns, their numbers of wins and nominations vary greatly: from zero to two wins, and from one to eight nominations.
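Both metrics can be read directly off a weighted edge list, as in the sketch below. The edge dictionary is toy data with placeholder node names, and `degree_metrics` is a hypothetical helper rather than part of any library.

```python
from collections import defaultdict

# Toy weighted edges: (node, node) -> number of shared titles.
edges = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2, ("C", "D"): 1}


def degree_metrics(edges):
    """Return (degree, weighted degree) dictionaries for an edge list."""
    degree = defaultdict(int)
    weighted_degree = defaultdict(int)
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] += 1                # one more connection
            weighted_degree[node] += weight  # add the strength of the tie
    return dict(degree), dict(weighted_degree)


degree, weighted_degree = degree_metrics(edges)

# Rank by Weighted Degree, breaking ties alphabetically.
top = sorted(weighted_degree, key=lambda n: (-weighted_degree[n], n))
print(degree, weighted_degree, top)
```

The two rankings can differ: in the toy graph, C has the most connections, while B has the largest total edge weight.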
Do these network metrics signal any relation to Oscar performance? I quantified this with a simple correlation analysis. The short answer is: not really, as the correlation values are rather low (Table 2). Similarly, the average Weighted Degree barely changes between those who had zero, one, or two Oscars (ranging from 8.76 to 9.44) and only drops for the stars with three prizes (5.75), which is mostly explained by the small size of that group. While in this presentation I only compared two basic network metrics, additional computations showed similar trends when using more detailed network centrality measures, such as betweenness, clustering, or PageRank.
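A comparison of this kind might be sketched as follows. The type of correlation is not specified here, so Pearson's coefficient is an assumption, and the numbers are toy values rather than the figures from Table 2.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_by_group(keys, values):
    """Average of the values within each group defined by the keys."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Toy data: number of Oscars and Weighted Degree for six artists.
oscars = [0, 0, 1, 1, 2, 3]
weighted_degree = [8, 10, 9, 9, 10, 6]

print(pearson(oscars, weighted_degree))
print(mean_by_group(oscars, weighted_degree))
```

Group averages are a useful companion to a single correlation value, since a group with very few members, like the three-time winners, can shift its own average without saying much about the overall trend.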
Conclusion and limitations
In this short piece, I first looked at the top Oscar winners in numbers and then explored whether these big wins were the result of strong networking and access to Oscar-rich communities. Fortunately, my analysis revealed far fewer biases and less support for preconceptions than one might expect, as networking patterns showed surprisingly low correlation with Oscar success.
Of course, this analysis has many limitations. For instance, the picture would be much clearer if I compared Oscar winners and nominees not only to each other but also to actors and actresses who were never nominated, who would serve as a baseline. Additionally, I defined the network of collaborations based on shared movies, which does not guarantee that two actors are friends or collaborators, nor does it capture their networking in other ways, so including social media or news articles could enhance this dimension in future studies.
With a background in physics and biophysics, I earned my PhD in network and data science in 2020. I studied and researched at the Eötvös Loránd University and the Central European University in Budapest, at the Barabási Lab in Boston, and the Bell Labs in Cambridge. I am currently the chief data scientist of Datapolis, a research affiliate at the Central European University, a senior data scientist at Maven7, and a data science expert of the European Commission.