Unraveling the Games Among Us

DataRes at UCLA
Mar 23, 2021

By Rachel Li, Anish Dulla, Anika Chakrabarti, Zoey Meng, Ojas Bardiya

Mario Kart, Zelda, Minecraft, Super Smash Bros, Among Us… Even if you’re not a gamer, you have probably heard of these games. Nowadays, video games play a prominent role in many of our lives as we are forced to find other ways to stay connected and entertained during the COVID-19 lockdown. There’s no better way to destress after a long day than meeting new villagers in Animal Crossing or teaming up with friends to dominate a competitive match of Call of Duty. According to Positive Health Wellness, video games provide a medium for people to build meaningful social connections, develop problem-solving skills, and improve cognitive abilities.

To take a closer look at video games, team GameStop at UCLA DataRes investigated global video game sales. The data used in our analysis came from Kaggle: a dataset of 55.8K video game sales records scraped from VGChartz. For each video game, the dataset includes fields such as rank, platform, publisher, developer, scores, and sales.

Hey, You Are Popular!

The first question we investigated was who’s popular: which genres are in demand and which publishers are most active. This gives us valuable insight into how the video game market has evolved over time, and how different regions of the world have produced both similar and competing trends.

Best-Selling Genres

Video game popularity shifts with new trends over time. While some games are popular for only a few months, others remain classics for decades. But how can we determine which games have been the most successful? One way we investigated this was by plotting the total sales of each video game genre. Our data ranges from the 1970s to the present, so the totals in Figures 1 and 2 represent the sum of all video game sales over the past 40 years.

As the heat map and bar chart show, there are three clear winners: sports, action, and shooter. These three genres far surpass every other genre in total copies sold. While these visualizations do not show how sales changed over time, the totals still give an interesting picture of which genres have been the most successful.

Figure 1 & 2: Genre Sales by Region
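As a rough illustration of the aggregation behind figures like these, the sketch below sums sales per genre and region with pandas. It is not the team’s code: the column names (`Genre`, `NA_Sales`, `PAL_Sales`, `JP_Sales`, `Other_Sales`) are assumptions about the Kaggle export, and the rows are made-up toy values.

```python
import pandas as pd

# Toy stand-in for the VGChartz data: values are illustrative, not real sales.
# Column names are assumed, not taken from the team's code.
df = pd.DataFrame({
    "Genre": ["Sports", "Action", "Shooter", "Puzzle", "Sports", "Action"],
    "NA_Sales": [1.0, 0.8, 1.2, 0.1, 0.5, 0.4],
    "PAL_Sales": [0.9, 0.7, 0.6, 0.1, 0.3, 0.2],
    "JP_Sales": [0.1, 0.3, 0.0, 0.2, 0.1, 0.1],
    "Other_Sales": [0.2, 0.1, 0.2, 0.0, 0.1, 0.1],
})
regions = ["NA_Sales", "PAL_Sales", "JP_Sales", "Other_Sales"]

# Here Global_Sales is simply the sum of the regional columns for each game.
df["Global_Sales"] = df[regions].sum(axis=1)

# Sum every game's sales within each genre: one row per genre, one column per region,
# ordered from best- to worst-selling genre overall.
genre_sales = (
    df.groupby("Genre")[regions + ["Global_Sales"]]
    .sum()
    .sort_values("Global_Sales", ascending=False)
)
print(genre_sales)
```

The genre-by-region block is the kind of table a heat map would color, while the `Global_Sales` column would drive a bar chart.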

PC vs. Console

Longtime professional gamers and beginners alike face a tough decision when choosing between a gaming PC and a video game console. Buying a new console may be just as appealing as building a PC from scratch, and so is sticking with one’s reliable, nostalgic Nintendo DS. With so many options, and the long-standing debate over whether the gaming PC or the console is the superior platform, we wanted a better idea of which platform offers the most games and the greatest variety of genres.

To do this, we identified the 10 platforms with the most published games from 1970 to 2019. While you’ve probably heard of the powerful Xbox Series X or the much-desired PS5, many of these recent, technologically advanced consoles are not included because of their short time on the market and small game libraries. We then broke each platform down into the twelve most popular game genres, ranging from action to strategy. As the stacked bar graph shows, far more games are published for PC than for any other platform, with the PS2 and the DS coming in second and third, respectively. This wide gap between PC and the consoles may be partly because many developers release games on PC first before bringing them to consoles. The PS4 comes in last among the top 10, possibly because its library is still growing: PS4 games are still being released even after the PS5’s launch.

Additionally, the three most common genres for PC games are adventure, strategy, and miscellaneous, while the three most prominent genres on the other consoles are action, miscellaneous, and sports. Knowing which genres are available on each platform, and how many games each one offers, gives us a better sense of our options and which platform best suits our interests.
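The platform ranking and per-platform genre breakdown can be sketched the same way. This assumes one row per published game with `Platform` and `Genre` columns (hypothetical names) and uses toy rows; `TOP_N = 10` mirrors the post’s top-10 cut.

```python
import pandas as pd

# Toy rows, one per published game; column names are assumed.
games = pd.DataFrame({
    "Platform": ["PC", "PC", "PC", "PS2", "PS2", "DS", "PC", "DS", "PS4"],
    "Genre": ["Strategy", "Adventure", "Strategy", "Action", "Sports",
              "Misc", "Adventure", "Action", "Action"],
})

TOP_N = 10  # keep the platforms with the most published games
top_platforms = games["Platform"].value_counts().head(TOP_N).index

# Count games per (platform, genre), restricted to the top platforms.
subset = games[games["Platform"].isin(top_platforms)]
counts = pd.crosstab(subset["Platform"], subset["Genre"])
counts = counts.loc[top_platforms]  # order rows by total number of games

# The three most common genres on each platform.
top3 = {p: counts.loc[p].nlargest(3).index.tolist() for p in counts.index}
print(counts)
print(top3)
```

Calling `counts.plot.bar(stacked=True)` would turn the table into a stacked bar chart like the one described.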

Uncovering Critic Scores

After looking at the best-selling genres around the world and the genre variety within platforms, let’s unravel the history of critic scores. While a casual gamer may gravitate toward a game for its unique graphics or simply because it lets them play with friends, a game critic must examine specific factors, including aesthetics, marketing expectations, and plot execution. In the heat map above, we charted the average critic score given to games in the 13 most popular genre categories over the years.

With the lightest shade representing the highest score of 10 and the darkest shade the lowest score of 5, we can see the general trend of critic scores over time. Genres such as shooter, sports, and role-playing produce the most consistently positive scores over the course of 50 years. Since shooter and sports games generate the 2nd and 3rd most sales in North America, it’s no surprise that they’re relatively well received. Their consistent scores may be because new shooter and sports games inherit many traits from their predecessors. For example, the up-and-coming first-person shooter Valorant has been widely described as combining Counter-Strike: Global Offensive’s shooting mechanics with Overwatch’s ability usage.
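A genre-by-year heat map of average critic scores boils down to a pivot table. This is a minimal sketch, assuming `Genre`, `Year`, and `Critic_Score` columns with scores on a 0 to 10 scale; the scores below are invented.

```python
import pandas as pd

# Toy critic scores (0-10 scale); column names are assumed.
reviews = pd.DataFrame({
    "Genre": ["Shooter", "Shooter", "Sports", "Action", "Action", "Sports"],
    "Year": [2005, 2005, 2005, 2005, 2010, 2010],
    "Critic_Score": [8.5, 7.5, 8.0, 6.0, 7.0, 8.4],
})

# Rows = genres, columns = years, cells = mean critic score.
# Genre/year pairs with no reviewed games stay NaN (blank cells in a heat map).
heat = reviews.pivot_table(index="Genre", columns="Year",
                           values="Critic_Score", aggfunc="mean")
print(heat)
```

Passing `heat` to a plotting function such as seaborn’s `heatmap` with `vmin=5, vmax=10` would match the color range described above.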

On the other hand, games in the action and miscellaneous genres come in at the bottom of critic ratings. Despite action being one of the best-selling genres in North America and globally, it has struggled to gain solid footing among critics. This isn’t to say all action games are bad. The “action” label covers a wide variety of games, from the heart-wrenching The Last of Us to the risky Grand Theft Auto V, so it can be difficult to tell the good from the bad. Next time you see a new game on the shelf, you might want to take a closer look at what makes it special before picking it up.

Top Tier Publishers

In addition to looking at the popular genres, we were also interested in exploring popular publishers.

The following pie chart shows the top 10 publishers by number of games published from 1970 to 2020 (only the top 10 are kept for clarity). Among the most active individual publishers, the market share is roughly equal. Unknown publishers, taken together, hold the largest share, at about a third. This likely reflects the recent trend toward smaller game publishing as games have become easier to make.
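A pie chart of publisher activity starts from counting games per publisher. This sketch assumes one publisher value per game, with missing publishers labeled "Unknown" (an assumption about how the data marks them), and uses toy values. Shares are computed over the kept slices only, since that is what a pie chart of the top 10 would display.

```python
import pandas as pd

# Toy publisher column, one entry per game.
publishers = pd.Series(["Unknown", "Unknown", "Sega", "Sony", "Nintendo",
                        "Unknown", "Sega", "Activision", "Namco", "Unknown"])

TOP_N = 10  # keep only the most active publishers
top = publishers.value_counts().head(TOP_N)

# Fraction of the plotted pie taken by each publisher.
share = top / top.sum()
print(share.round(2))
```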

This next visualization dives deeper and illustrates the number of games released by each publisher since 1970. Initially, the number of games published across all companies was low, since access to game-making technology was limited. As that technology became more available, large companies began publishing their own games, as shown by the rise of Sega, Sony, and Nintendo. Finally, as game-making tech became accessible to a wider and wider audience, much smaller studios and publishers emerged, shown by the large spike in unknown publishers.
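The release-history chart can be approximated by counting games per year and publisher. This is a sketch with toy rows and assumed `Year` and `Publisher` columns; the post does not say whether its chart plots per-year counts or a running total, so both are computed.

```python
import pandas as pd

# Toy release records, one per game; column names are assumed.
releases = pd.DataFrame({
    "Year": [1985, 1985, 1995, 1995, 1995, 2010, 2010, 2010, 2010],
    "Publisher": ["Nintendo", "Sega", "Sony", "Nintendo", "Sega",
                  "Unknown", "Unknown", "Unknown", "Microsoft"],
})

# Games released per year by each publisher: rows = years, columns = publishers.
per_year = releases.groupby(["Year", "Publisher"]).size().unstack(fill_value=0)

# Running total of each publisher's releases since the first year in the data.
cumulative = per_year.cumsum()
print(per_year)
print(cumulative)
```

Per-year counts are what would make the spiky pattern of unknown publishers visible; a running total smooths it out.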

There are a few more interesting trends to note here. The first is the quick rise and fall of Microsoft’s publishing. This may be correlated with the release of a new Xbox, around which Microsoft published many smaller companies’ games. Microsoft may then have decided to stop doing this, possibly because it was not profitable.

The other interesting trend is the very spiky behavior of unknown publishers. Several factors may explain this; one theory is that many game studios were going in and out of business in a volatile market. Additionally, since game-making technology keeps advancing, the spikes may reflect developers quickly adopting emerging technologies and later abandoning them.

Can we predict critically-acclaimed video games?

With all our exploration of the video game industry, we found ourselves wondering one central question: what makes a video game critically acclaimed? Is it the publisher? The genre? The platform? To answer this question, we built several machine learning classification models to predict critic scores. Using IGN’s critic scale, games scoring above 8.0 are Excellent, above 5.0 are Favorable, above 2.0 are Poor, and those below are labeled Terrible.
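The IGN-style binning can be expressed with `pandas.cut`. This sketch reads "above 8.0" as strictly greater than 8.0, so a score of exactly 8.0 counts as Favorable; that boundary handling is an assumption, since the post does not say how ties are treated.

```python
import numpy as np
import pandas as pd

scores = pd.Series([9.1, 8.0, 6.5, 5.0, 3.2, 1.5])

# Right-closed bins: (-inf, 2] Terrible, (2, 5] Poor, (5, 8] Favorable, (8, inf) Excellent.
labels = pd.cut(scores, bins=[-np.inf, 2.0, 5.0, 8.0, np.inf],
                labels=["Terrible", "Poor", "Favorable", "Excellent"])
print(labels.tolist())
```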

To predict these classes, we used Logistic Regression, Random Forest, SVM, and KNN classifiers, with Genre, ESRB_Rating, Platform, Publisher, and Year as features. Perhaps unsurprisingly, the Random Forest model predicted critic scores best, at 70% accuracy. Let’s take a deeper look at each algorithm and why it achieved the accuracy it did.
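Here is a minimal end-to-end sketch of comparing the four classifiers in scikit-learn, assuming one-hot encoding for the categorical features and scaling for `Year`. The post does not describe its preprocessing or hyperparameters (apart from K = 11 for KNN), so those choices and the synthetic data are illustrative assumptions. The printed accuracies will not match the numbers reported here.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the cleaned data (NOT the real dataset); column names are assumed.
X = pd.DataFrame({
    "Genre": rng.choice(["Action", "Shooter", "Sports", "Misc"], n),
    "ESRB_Rating": rng.choice(["E", "T", "M"], n),
    "Platform": rng.choice(["PC", "PS2", "DS", "X360"], n),
    "Publisher": rng.choice(["Nintendo", "Sega", "Unknown"], n),
    "Year": rng.integers(1995, 2019, n),
})
# Toy labels: shooter/sports lean "Excellent", with 20% of labels flipped as noise.
base = np.where(X["Genre"].isin(["Shooter", "Sports"]), "Excellent", "Favorable")
flip = rng.random(n) < 0.2
y = np.where(flip, np.where(base == "Excellent", "Favorable", "Excellent"), base)

categorical = ["Genre", "ESRB_Rating", "Platform", "Publisher"]
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # categories -> 0/1 columns
    ("num", StandardScaler(), ["Year"]),                           # scale Year for distance-based models
])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=11),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

scores = {}
for name, model in models.items():
    # clone() gives each model its own fresh, unfitted copy of the preprocessing step.
    pipe = make_pipeline(clone(preprocess), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # accuracy on the held-out 25%

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```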

First, our least accurate model: Logistic Regression. Logistic Regression models the probability that an observation belongs to a certain class or that an event occurs, with the outcome typically encoded as 1 or 0. The model assumes little or no multicollinearity and no high-leverage values among the independent variables, and the dependent variable must have mutually exclusive categories. Because of its 1/0 formulation, it is best suited to binary classification problems. Python’s sklearn package does support multiclass Logistic Regression, but it is often less accurate than more flexible classifiers, which may explain why it had the lowest accuracy here, with only 61.6% of classifications correct.
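To make the probability view concrete, here is a small sketch of the logistic (sigmoid) function used for binary classification and the softmax function used by multinomial logistic regression. The weights and inputs are hypothetical. Depending on the solver and version, scikit-learn’s `LogisticRegression` handles multiple classes either with softmax or with one-vs-rest binary models.

```python
import math


def sigmoid(z: float) -> float:
    """Binary logistic regression: P(y = 1 | x) = 1 / (1 + e^(-z)), where z = w.x + b."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: list[float]) -> list[float]:
    """Multinomial extension: one linear score per class, mapped to probabilities summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights, bias, and feature vector, for illustration only.
w, b = [0.8, -0.4], 0.1
x = [1.0, 2.0]
z = sum(wi * xi for wi, xi in zip(w, x)) + b
print(round(sigmoid(z), 3))

# Four class scores (e.g. Excellent, Favorable, Poor, Terrible) -> four probabilities.
print([round(p, 3) for p in softmax([2.0, 1.0, 0.5, -1.0])])
```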

Next, our most accurate model: Random Forest. The Random Forest algorithm has become an industry favorite because of its simplicity and its accuracy on large datasets. It aggregates the votes of many decision trees, hence the “forest” in its name. While a single decision tree tends to overfit the training data, averaging many trees trained on different random samples of the data reduces that variance while keeping accuracy on the test data high. Using sklearn’s Random Forest model, 70% of our classifications were correct. The confusion matrix below shows the model’s strengths and shortcomings. For example, when our model classified a game’s critic score as ‘Excellent,’ it was correct 71% of the time, yet it mistakenly placed ‘Favorable’ games in the ‘Excellent’ class 24% of the time. On the other hand, when our model classified a game’s critic score as ‘Favorable,’ it was correct 69% of the time, yet it mistakenly placed ‘Excellent’ games in the ‘Favorable’ class 31% of the time.
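One way to read percentages like "when the model predicted Excellent, it was correct 71% of the time" is a confusion matrix normalized column by column: of all games predicted as a class, what fraction truly belonged to each class. The sketch below uses scikit-learn’s `confusion_matrix` with `normalize="pred"` on made-up labels; whether the team’s figure was normalized this way is an assumption.

```python
from sklearn.metrics import confusion_matrix

labels = ["Excellent", "Favorable", "Poor"]
# Made-up true and predicted classes for eight games.
y_true = ["Excellent", "Excellent", "Favorable", "Favorable",
          "Favorable", "Poor", "Excellent", "Favorable"]
y_pred = ["Excellent", "Favorable", "Favorable", "Excellent",
          "Favorable", "Poor", "Excellent", "Favorable"]

# Rows = true class, columns = predicted class.
# normalize="pred" divides each column by its total, so the diagonal answers:
# "when the model predicted this class, how often was it right?"
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="pred")
print(cm.round(2))
```

Switching to `normalize="true"` divides each row instead, answering "of the games that truly belong to this class, how many did the model catch?"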

Similarly, we used a Support Vector Machine (SVM) model to classify critic scores into the categories above. An SVM separates classes by finding the boundary with the largest possible margin between them. SVMs are highly flexible but difficult to interpret; since we only care about the final prediction in this classification problem, that trade-off is acceptable here. Because an SVM is inherently a binary classifier and our data has four classes, we used a one-versus-one approach. This algorithm correctly classified 65.6% of critic scores, making it the most accurate after the Random Forest model.
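In scikit-learn, `SVC` handles multiclass problems with one-versus-one internally: with k classes it trains k(k - 1)/2 binary classifiers, so four critic classes need six. The sketch below uses synthetic, well-separated 2-D clusters as stand-ins for the four classes; the RBF kernel is an assumption, not the team’s setting.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Four synthetic clusters standing in for the four critic classes.
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])
y = np.repeat(["Terrible", "Poor", "Favorable", "Excellent"], 25)

# decision_function_shape="ovo" exposes one score per pair of classes
# (4 * 3 / 2 = 6 columns) instead of aggregating them into one column per class.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
pairwise = clf.decision_function(X[:1])
print(pairwise.shape)
print(clf.predict([[5, 5]]))
```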

Lastly, we used a K-Nearest Neighbors (KNN) classifier. This approach assigns an observation to a class under the assumption that similar observations lie close to one another: a new game is labeled by a majority vote among the k most similar games in the training data. By varying k, the number of nearest neighbors, one can tune the KNN model to produce the most accurate results. With K = 11, this algorithm correctly classified 63.6% of critic scores, placing it third of the four models.
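The neighbor vote itself is simple enough to write from scratch. This plain-Python sketch uses Euclidean distance, and compares a few odd values of k with leave-one-out accuracy on toy points. The team presumably used sklearn’s `KNeighborsClassifier` and arrived at K = 11 on the real data; their exact tuning procedure is not described, so the helper names here are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Label `query` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Toy 2-D points: two well-separated groups.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]
y = ["Favorable"] * 4 + ["Excellent"] * 4

print(knn_predict(X, y, (1.5, 1.5), k=3))

# Odd k values avoid tied votes between two classes.
best_k = max(range(1, 6, 2), key=lambda k: loo_accuracy(X, y, k))
print(best_k, loo_accuracy(X, y, best_k))
```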

The accuracy of every model we used lies between 60% and 70%. There are a few possible reasons for this:

  • Large amount of missing data: Of the 55,792 observations in the dataset, 49,256 have no critic score. After cleaning the remaining data, we were left with only 1,663 observations to apply our classification models to.
  • High variance: Flexible classification methods like Random Forests and SVMs tend to reduce bias at the cost of higher variance. This may increase the error rate on the test data.
  • Large amount of noise in the data: Noisy or meaningless values cannot be learned correctly by any classification model. This is an issue with how the data was collected.
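The cleaning step can be sketched as dropping rows with no critic score or with missing model features. The post does not list its exact cleaning rules, so the filter below is an assumption, shown on toy rows with hypothetical values.

```python
import numpy as np
import pandas as pd

# Toy rows: some games lack a critic score or a feature value.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Genre": ["Action", "Sports", None, "Shooter", "Misc"],
    "Critic_Score": [8.2, np.nan, 7.0, np.nan, 6.1],
    "Year": [2005, 2008, 2010, np.nan, 2012],
})

missing = df["Critic_Score"].isna().sum()
print(f"{missing} of {len(df)} rows have no critic score")

# Keep only rows with a critic score AND every model feature present.
features = ["Genre", "Year"]  # stand-in for Genre, ESRB_Rating, Platform, Publisher, Year
clean = df.dropna(subset=["Critic_Score"] + features)
print(len(clean))
```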

Results

Our dataset provided many interesting insights into what makes a video game both popular among gamers and successful in the market. By plotting video game sales, top gaming platforms, critic scores, and top publishers, we found some common themes in our data. One trend that stood out was the rise in video game publishing across all genres starting in the early 2000s. This may be because technology at the time was advancing quickly, and access to video games suddenly became much easier. Looking at the best-selling genres and their critic scores, another theme that stood out was the popularity of sports and shooter games: both ranked among the highest in total sales, and both consistently received high critic scores. Our data also showed that publishers release more games for PC than for any other platform, giving PC the largest game library.

After exploring these observations, we used machine learning models to delve deeper into critic scores. We wanted to learn more about the traits of a critically acclaimed video game and why certain games are more likely to receive higher critic scores. These models achieved 60–70% accuracy, with the Random Forest model being the most accurate.

Conclusion

So what does the future of the gaming industry hold? As we welcome a new age of gaming with the rise of esports, cutting-edge virtual reality, and Twitch streamers growing their platforms every day, it’s important for casual and competitive gamers alike to know which emerging platforms and game genres are most worth their time. While the debate between the gaming PC and the video game console continues, can we expect mobile gaming to become the leading platform in the future? Beyond the top three genres, which other genres can we expect to gain more sales and better critic scores in the coming years? With so many possibilities in such a rapidly evolving industry, it pays to keep up with the ever-expanding variety of trends. But until then, sit back and relax with some Stardew Valley.

If you’re interested in seeing the code for our visualizations and machine learning models, you can visit our GitHub here!
