Spotify Trends Analysis

Dec 22, 2020

By Deepthi Gangiredla, Deana Moghaddas, Ovie Soman, Trina Nguyen, Zoeb Jamal

In today’s world, music helps fuel our daily lives and stimulate our minds. There are certain artists and songs that we know we cannot get through the day without listening to, but have you ever wondered what stylistic components of these songs make them so popular? As defined by Spotify, each song has a varying amount of acousticness, danceability, energy, duration, instrumentalness, valence, popularity, tempo, liveness, loudness, and speechiness. As a group, we decided to explore the 160K Spotify tracks dataset available on Kaggle to see if we could observe any correlation between these individual factors and the popularity of a song. Below, we have included statistical visualizations and plots to provide insight into what makes a hit song!

How Has Duration Changed over Time?

Are longer songs likely to have a smaller following? Are they too tiresome for our tastes, or do we love them just the same as their shorter counterparts? Our team wanted to explore how the duration of a song affects its popularity, and also how this relationship has changed through the decades. Have musical tastes with respect to song duration changed at all in the past 90 years? The plot above displays our findings. The range of years considered is ninety years wide because we were interested in how the relationship between song length and popularity developed over the decades (how different are our listening habits from those of, say, the generation of our great-grandparents?). The years were chosen roughly 20 years apart to allow a quick, sweeping look at the trends.

It does seem that for all the years considered (1930, 1950, 1970, 1990, 2010 and 2020), popularity peaks at around 3 minutes 20 seconds to 5 minutes of runtime. The popularity values taper off from this peak in either direction and are spread mostly between 0 minutes and 8 minutes 20 seconds of runtime. This makes sense, as most song durations fit into this interval. There are a few outliers, however, most conspicuously in 2010, when a couple of songs with more than about 66 minutes of runtime had a popularity rating of around 50. Another thing to keep in mind is that the popularity ratings were in general lower in 1930 and 1950 than in the following years. This could be attributed to the lower accessibility of songs during those years due to limitations in the availability of musical equipment. The ratings are highest in 2020, which makes sense given the plethora of music-listening options available to us today (including Spotify itself).

Thus, we come to the conclusion that there is a runtime for every year at which popularity peaks, and it tapers off in either direction from that peak. Based on the above visualisation, creators would need to target a runtime of 3 minutes 20 seconds to 5 minutes for maximum song popularity.
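As a rough sketch of how a duration-versus-popularity profile like this could be computed (not necessarily how the plot above was produced), the snippet below averages popularity within fixed runtime buckets and picks the bucket with the highest mean. The rows are made up for illustration, the millisecond duration field is an assumption about the dataset, and the 100-second bucket width is our own choice, picked because 3 minutes 20 seconds and 5 minutes fall on the 200- and 300-second boundaries.

```python
from collections import defaultdict

# Made-up (duration_ms, popularity) pairs for a single year.
# Assumption: durations are stored in milliseconds.
tracks = [
    (95_000, 20), (150_000, 35), (210_000, 60), (240_000, 70),
    (290_000, 64), (380_000, 40), (520_000, 15),
]

def mean_popularity_by_bucket(rows, bucket_seconds=100):
    """Average popularity per runtime bucket, keyed by bucket start in seconds."""
    buckets = defaultdict(list)
    for duration_ms, popularity in rows:
        seconds = duration_ms // 1000  # truncate to whole seconds
        start = seconds // bucket_seconds * bucket_seconds
        buckets[start].append(popularity)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}

profile = mean_popularity_by_bucket(tracks)
# Bucket whose tracks have the highest mean popularity.
peak_bucket = max(profile, key=profile.get)
print(profile)
print("peak bucket starts at", peak_bucket, "seconds")
```

With real data, a bucket containing only a handful of tracks (such as the hour-long outliers mentioned earlier) can produce a misleading mean, so it may be worth ignoring buckets below some minimum count.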

So how did the average song duration change over the decades? The above visualisation answers this, with durations truncated to whole seconds. We switched the unit from minutes to seconds to better showcase the differences in average song length over the decades. There was a steady increase in mean song duration from 1930 to 1990 (195 seconds to 259 seconds), followed by a steady decrease to 197 seconds in 2020. 1990 had the longest songs on average, while 2020 and 1930 sit at the other end of the spectrum with surprisingly similar average song durations. This is an interesting result: are we all so swept up in the twenty-first century tide that we have no time for longer songs? It is reasonable to guess that, at least in 1930, songs were shorter on average due to technological limitations.

Usage of Characteristics in Music Over Time

Spotify holds data not only on music released since the birth of its streaming platform, but also on music dating back to the early 1900s. Speechiness is the level of spoken word within a song: rap, for example, tends to be very “speechy”, while a Beethoven sonata is much less so. The graph below depicts the fluctuations of speechiness over time, with many changes attributable to the development of new technologies, instruments or trends. Values over 0.66 indicate tracks made up largely of spoken words, values between 0.33 and 0.66 indicate tracks containing both speech and music, like rap music, and values below 0.33 indicate tracks that are mostly non-speech music.
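The three speechiness bands described above can be written as a small helper. This is only an illustrative sketch: the function name and band labels are ours, and how the exact boundary values 0.33 and 0.66 are assigned is an assumption, since the description does not say which side they fall on.

```python
def speechiness_category(value):
    """Map a speechiness score in [0, 1] to one of three descriptive bands."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("speechiness is expected to lie between 0 and 1")
    if value > 0.66:
        return "mostly spoken word"
    if value >= 0.33:  # assumption: boundary values count as "both"
        return "speech and music"
    return "mostly music"

# Hypothetical scores for three tracks.
for score in (0.04, 0.45, 0.90):
    print(score, "->", speechiness_category(score))
```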

Highlighted in yellow on the above graph is the boom in music production that marked the transition from live-music recording to multitrack recording. It began to gain momentum in the 1960s, allowing music to be mixed and pieces to be added after the initial recording. This invention has taken pressure off live instrumentals and spoken word, with a lot of newer music and genres relying almost solely on a producer’s skill and creativity. Hence, as production has grown in popularity, speechiness in music has declined from its height in the 1930s to 1950s.

Genres Analysis

From our analysis, we noticed that for quite a few stylistic components there is little correlation between their relative usage and the popularity of a song. Music is incredibly diverse: although select characteristics appear frequently in popular songs, the overall usage of stylistic components fluctuates continuously, because popular music spans different genres with different characteristics and because of humanity’s aversion to repetition. Hence, clear trends are difficult to identify for many musical components; however, these trends become more prominent when analyzed within the context of genre.

Five genres (Alternative Rock, EDM, Indie Pop, R&B and Rap) were graphed against the per-genre averages of acousticness, danceability, valence, energy, speechiness, instrumentalness, duration and popularity.

Some notable trends: on average, EDM is the most popular of the five genres, rap is by far the most speechy while indie pop is the least, and EDM has the lowest valence, the level of “joyful” sound detected in the music. Additionally, Rap and R&B have the highest levels of danceability, and R&B has the longest average duration.

Artists and production teams can use these trends and information to incorporate stylistic components that appear to be more popular, especially within the genre they aim to categorize themselves in. For example, the EDM community appears to enjoy more melancholy music in general and artists can use this information to cater to what their audience will like.

Artist Trends

Every year, we see new collaborations between artists who come together to release chart-topping tracks. We wanted to see if there were any similarities between the stylistic features of these collaborative tracks and how they correspond with each other. Using the popularity feature in the dataset, we extracted the most popular 2020 songs with two or more artists. The results spanned a range of genres, with pop, R&B, hip hop, and Latin pop appearing most often.
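A minimal sketch of this extraction step might look like the following. The rows are made up, and the field names (`artists`, `year`, `popularity`) as well as the idea that `artists` is stored as a stringified Python list are assumptions about the dataset layout rather than something confirmed here.

```python
import ast

# Made-up rows. Assumption: `artists` holds a stringified list of names.
tracks = [
    {"name": "Song A", "artists": "['X']", "year": 2020, "popularity": 95},
    {"name": "Song B", "artists": "['X', 'Y']", "year": 2020, "popularity": 80},
    {"name": "Song C", "artists": "['Y', 'Z', 'W']", "year": 2020, "popularity": 90},
    {"name": "Song D", "artists": "['Z', 'W']", "year": 2019, "popularity": 99},
]

def top_collaborations(rows, year, n):
    """Return the n most popular tracks from `year` credited to 2+ artists."""
    collabs = [
        row for row in rows
        if row["year"] == year and len(ast.literal_eval(row["artists"])) >= 2
    ]
    return sorted(collabs, key=lambda row: row["popularity"], reverse=True)[:n]

top = top_collaborations(tracks, year=2020, n=16)
print([row["name"] for row in top])  # ['Song C', 'Song B']
```

`ast.literal_eval` is used instead of `eval` so that only plain Python literals are parsed from the string.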

Top 16 Collaborative Song Features

Using the top collaborations of 2020, we ran PCA on all the features of each of the songs (acousticness, danceability, energy, duration, instrumentalness, valence, popularity, tempo, and speechiness) and noticed that the features of the songs did group together. In fact, some features correlated with each other more than others. The PCA results indicate that valence, energy, duration, and danceability were all similar among popular collaboration songs. Further analysis of each of these factors showed that these songs had high danceability (mean: 0.737), an average length of about 3 minutes, high energy (mean: 0.72), and moderate valence (mean: 0.546). In addition, the tempo of these songs centered on a mean of 118.39 BPM, which is also on the higher side of the range. These results suggest that popular collaborations are usually more energetic and upbeat songs.

Given the number of columns present in the dataset and how many factors can influence a track, we decided that performing a principal component analysis, or PCA, might be beneficial and make the data easier to visualize. Rather than creating many scatterplots, each showing the correlation between one quality and popularity, or making a graph with more axes than humans can comprehend, we can use PCA, a method of dimension reduction through which a large matrix can be reduced to 2 axes, or principal components. By doing this, we can see how songs with similar characteristics cluster together. By adding a size parameter or by cross-referencing with popularity line graphs, we can easily see how popularity is influenced. Initially, we naively performed PCA on the whole dataset, expecting to see clear clusters that would overlap nicely with genres. However, as mentioned before, music is very diverse, and the PCA plot produced was incomprehensible. We then decided to perform PCA on the top 10 artists instead, and see if we could identify the different clusters.
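To make the idea of reducing many feature columns to 2 axes concrete, here is a small sketch of PCA written with plain NumPy (standardize the columns, then take a singular value decomposition). This is our own illustration and not necessarily the library or preprocessing used for the plots in this article; the feature values are made up, and standardizing before PCA is an assumption we make so that tempo, which is on a much larger scale, does not dominate.

```python
import numpy as np

def pca_2d(features):
    """Project rows of `features` onto their first two principal components.

    Returns the 2-D coordinates and the share of variance each axis explains.
    Assumes no column is constant (otherwise its standard deviation is zero).
    """
    X = np.asarray(features, dtype=float)
    # Standardize each column to zero mean and unit variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:2].T
    explained = S**2 / np.sum(S**2)
    return coords, explained[:2]

# Made-up rows: danceability, energy, valence, tempo.
songs = np.array([
    [0.80, 0.75, 0.60, 120.0],
    [0.78, 0.70, 0.55, 118.0],
    [0.30, 0.20, 0.25, 80.0],
    [0.35, 0.25, 0.30, 85.0],
    [0.60, 0.90, 0.40, 140.0],
])
coords, explained = pca_2d(songs)
print(coords.round(2))
print(explained.round(2))
```

Each row of `coords` is one song's position on the two principal components, which is what would be scattered on a plot and colored by artist or sized by popularity.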

While there is a lot of overlap, we see clusters for certain artists very clearly. For example, Frank Sinatra’s music (orange) is very different from Eminem’s (yellow), but quite similar to Elvis Presley’s (green).

We also wanted to see if there was any way we could showcase the year and popularity of tracks in our principal component analysis. We decided to focus more specifically on The Beatles, since they were the most popular and had a large track count. First, we used line graphs to observe how acousticness, instrumentalness, and speechiness have changed in their music as time has progressed. The figures show the mean value as well as the deviation. From this, we can see that there were some spikes and dramatic changes in the early years, as well as one later spike for acousticness.

We also looked at the median values to see if they differed from the graphs of the means. They showed similar results, with some variation in the early years for acousticness and energy, as well as later changes for instrumentalness. The PCA plot also supports this conclusion. We can clearly see that their music has changed over time, with post-2000s music all clustering together. We also tried adding size as a parameter, so we could see the popularity of a track as well as where it clusters. Most of the popular tracks are clustered in the big group, which suggests that The Beatles found a sort of ‘formula’ for making a song that everyone enjoys!

We also tried performing PCA on some of our own playlists and found that there was a surprising amount of overlap between a rap playlist and a lofi playlist.

However, upon remembering that most lofi beats are just edited hip hop beats, we realized the reason for this overlap. This sort of analysis has many applications and could be used as a way to build a recommendation engine for new music. You could run PCA on a set of candidate tracks to see how much they overlap with your existing library and then have the closest ones recommended to you, much like Spotify’s own Discover Weekly playlist.

Conclusion

Overall, our analysis reveals that music is indeed incredibly diverse, with countless stylistic components whose usage is constantly changing over time. We found it really interesting that an art form like music can be so well described by the metadata associated with it. One surprising thing we found (even though it should be obvious) is that there is a significant amount of overlap between lofi music and hip hop tracks, most probably because lofi beats are actually sampled from these tracks. What’s popular a decade from now may be currently unheard of, and our palette of popular stylistic components may be completely different. This is what makes music so universal and enticing to everyone who comes across it; the abundance of variety and creativity never ends.

Check out our GitHub for more detailed analysis!