
What ChatGPT (and Humans) Say About Data Science Trends

“What will be the biggest data science trends in 2023?”

First of all, why would I ask ChatGPT, an artificial intelligence tool, a question about data science? Well, there are several personal reasons. Since my undergraduate years studying physics, I have been deeply fond of Isaac Asimov and his Foundation series. Later, during my PhD in data science, I realized how close data science is to Asimov’s psychohistory: both use quantitative tools to understand and forecast human behavior at scale!

This fascination led to a natural language processing research project in 2021, in which I treated Asimov’s books as a stream of textual data. Since Asimov also contributed significantly to the principles of robotics, it seemed fitting to incorporate a fancy new tool from OpenAI: GPT-3, a text completion engine. GPT-3 had already been trained on a vast amount of human-written text and performed reasonably well at completing entire paragraphs from as little as a single sentence. So I added a few paragraphs of GPT-3’s text to the research paper. Those paragraphs fit fairly well into the text but, honestly, did not add much novelty.

A few months later, as the new year of 2022 kicked in, I went back to the OpenAI platform and asked GPT-3 about the major data science trends for the new year. On the one hand, its predictions read quite well. On the other hand, they were vague and generic, closer to horoscopes than to actual forecasts: the points were brief sentences rather than elaborated paragraphs, and they were so generic that some were even slightly inconsistent with each other.

So after all this, I was very curious about the fuss around OpenAI’s latest tool, ChatGPT. At first glance, this chatbot really knows something! It has passed medical and law school exams, generated pretty color palettes, and been put to many more uses. In early 2023, I followed my tradition and asked the AI bot the big question again:

What will be the biggest data science trends in 2023?

1. Automation and AI-driven analytics: Automation and AI-driven analytics will continue to play a major role in data science trends in 2023. This will allow organizations to process and analyze large amounts of data quickly and accurately, enabling them to make better decisions and increase their efficiency.   

2. Augmented analytics: Augmented analytics is the combination of machine learning and natural language processing to help data scientists visualize data quickly and effectively. This technology will allow data scientists to quickly identify and explore patterns and trends in data, enabling them to make faster and more informed decisions.   

3. Explainable AI: Explainable AI is the ability to explain the results of a data science model to non-technical audiences. This technology will help bridge the gap between data scientists and non-technical decision makers, allowing them to understand the results of data science models and make better decisions.   

4. Edge computing: Edge computing will become increasingly important in 2023, as more and more devices will be connected to the internet and data will need to be processed in real-time. This will allow data scientists to quickly process and analyze data from devices at the edge of the network.   

5. Data privacy and security: As more and more data is collected, stored and analyzed, the importance of data privacy and security will become increasingly important. Data scientists will need to ensure that data is secure and that privacy laws are adhered to.

At this point it’s hard to judge how accurate ChatGPT’s data science predictions are, but they use many more technical terms and are much more elaborate than GPT-3’s forecasts from the previous year. This points to a clear technological upgrade.

That said, comparing GPT-3 and ChatGPT isn’t apples to apples. ChatGPT comes from the same line as GPT-3: both belong to the family of Generative Pre-trained Transformer (GPT) language models, deep learning models designed to produce human-like text. GPT-3 is a general-purpose system that performs well at a variety of tasks (from text generation to machine translation), whereas ChatGPT, which builds on GPT-3.5, was explicitly designed as a chatbot: conversational, able to hold a dialogue, and answering in a way that feels more natural to us.

While the chatbot offers only a few points about the future of data science, human knowledge on the topic is out there on numerous channels even today, such as Twitter. So in addition to asking ChatGPT what’s on the horizon for data science, I sampled what humans are saying about the topic by analyzing thousands of tweets and hashtags related to data science. Then I visualized them on a network map, shown below.

Figure: Hashtag network of about 10,000 tweets containing #datascience, posted over the last two weeks of 2022.
I built the hashtag network by collecting and processing approximately 10,000 tweets containing the hashtag #datascience during the last two weeks of 2022. After downloading the tweets with Tweepy, I extracted the hashtags from each tweet. Then I built the hashtag network, where each node represents a hashtag and two nodes are connected if the hashtags were co-tweeted, that is, appeared together in the same tweet. To keep only the backbone of the network, its most important nodes and links, I also applied a final edge filtering step. Additionally, I colored the nodes based on network communities, that is, groups of hashtags that are more densely connected to each other than to the rest of the network.
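The steps above can be sketched in Python, starting from tweet texts that have already been downloaded (the Tweepy download step is not shown). This is an illustrative sketch, not the exact code behind the figure. Several pieces are assumptions: the hashtag regex; a simple minimum co-tweet count (`min_weight`) standing in for the unspecified edge filtering step; and a basic weighted label propagation standing in for the unspecified community detection method. All function names and the example tweets are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

# Made-up example tweets for illustration only, not the author's data.
EXAMPLE_TWEETS = [
    "Learning #DataScience with #Python and #MachineLearning",
    "#datascience #python #pandas tips",
    "#MachineLearning meets #AI in #DataScience",
    "#AI #MachineLearning #DeepLearning",
    "#100DaysOfCode #Python #javascript",
    "#100daysofcode #javascript #webdev",
]


def extract_hashtags(text):
    """Return the set of lowercased hashtags in one tweet (repeats ignored)."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def build_hashtag_network(tweets):
    """Edge weight = number of tweets in which two hashtags appear together."""
    weights = Counter()
    for text in tweets:
        tags = sorted(extract_hashtags(text))
        # Every pair of hashtags in the same tweet counts as one co-tweet.
        for a, b in combinations(tags, 2):
            weights[(a, b)] += 1
    return weights


def filter_edges(weights, min_weight):
    """Keep edges co-tweeted at least min_weight times (hypothetical
    stand-in for the backbone filtering step)."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def label_propagation(edges, max_iter=100):
    """Weighted label propagation with deterministic tie-breaking.

    Returns a dict mapping each hashtag left in the backbone to a community label.
    """
    neighbours = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    labels = {node: node for node in neighbours}  # each node starts alone
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbours):
            # Sum edge weights per label among the node's neighbours.
            scores = Counter()
            for nb, w in neighbours[node].items():
                scores[labels[nb]] += w
            best = max(scores.values())
            candidates = {lab for lab, s in scores.items() if s == best}
            if labels[node] in candidates:
                continue  # keep the current label if it is already among the best
            labels[node] = min(candidates)  # break ties alphabetically
            changed = True
        if not changed:
            break
    return labels


def group_communities(labels):
    """Turn {node: label} into a sorted list of sorted member lists."""
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    backbone = filter_edges(build_hashtag_network(EXAMPLE_TWEETS), min_weight=2)
    print(group_communities(label_propagation(backbone)))
    # → [['100daysofcode', 'javascript'], ['ai', 'datascience', 'machinelearning', 'python']]
```

On these made-up tweets, the filtered backbone splits into two communities, one around #100daysofcode and #javascript and one around #datascience, #machinelearning, #ai and #python. Hashtags such as #pandas drop out because they are co-tweeted only once with each partner. For real data, a more principled backbone extraction (such as the disparity filter) and a dedicated community detection method (such as Louvain, available in NetworkX) are likely better choices than these simple stand-ins.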

This data science snapshot shows that big data analytics still rules the world, with AI and ML at the center. (The figure also tells us that the data collection overlapped with 2022’s #100daysofcode.) Interestingly, ChatGPT raised important topics, such as explainable AI and data privacy, that are absent from the major topics (network nodes) on the map, with the possible exception of cybersecurity.

ChatGPT’s prediction about the rise of augmented analytics may echo something on the network map, but it didn’t mention blockchain, which was a moderately buzzy hashtag on Twitter.

Of course, the real question is still whether ChatGPT is producing smart combinations of existing pieces of information or whether the machine has inferred something we humans haven’t even thought of. For that, we’ll just have to wait and see.

With a background in physics and biophysics, I earned my PhD in network and data science in 2020. I studied and researched at Eötvös Loránd University and Central European University in Budapest, at the Barabási Lab in Boston, and at Bell Labs in Cambridge. I am currently the chief data scientist of Datapolis, a research affiliate at Central European University, a senior data scientist at Maven7, and a data science expert for the European Commission.
