Repurposing Our Liked Songs

Amateurs’ attempt at rediscovering musical gems on Spotify

Apr 30, 2021

When I think about my life, I picture it like a movie. There might not be lights and a camera crew, nor a cast of famous Hollywood actors, but there are scenes and there is music, which, in my mind, are the most important parts. Music defines different phases, moods, and memories of my life. Fall Out Boy defines middle school bus rides home. Still Woozy brings me back to freshman fall nights with friends. And upbeat French pop energizes me to start my day.

Since downloading Spotify seven years ago and curating this soundtrack to my life, I have become a playlist connoisseur, organizing my music into the aforementioned phases, moods, and memories. However, not every song fits into my designed categories. Those songs end up in the unknown: my Liked Songs.

My Liked Songs has become uncharted territory, an amalgamation of any songs I’d heard before and hadn’t immediately hated. I’d hear a song one day, add it to my Liked Songs, but then a week later, I’d completely forget it existed. The song was left in the abyss, never to be seen again.

With the open-ended final project for our Big Data Analytics class, my partners Jonathan Chen, Hannah Mickelson, and I were all eager to put our data science skills to the test. Throughout the semester, we had been challenged to think about applications for data wrangling, machine learning, and data visualization. When we were prompted to find a real-world connection, Spotify came to mind. However, without the inside knowledge of Spotify’s data engineers, we had to rely on simplifications and intuition to fill in the gaps.

We set a goal: build a clustering algorithm that organizes our Liked Songs into playlists, so we can rediscover our own music. The crux of our organization algorithm would be song similarity, a ‘nearest neighbors’ approach. Similarity would be based on:

Musical Features: key, mode, rhythm, tempo.
Song Features: genre, title name, album name.
Spotify Features: acousticness, danceability, instrumentalness, energy, liveness, loudness, speechiness.

To accomplish our goal, we created the following plan:

Set Up: Let’s understand playlist structure.
Create Model: How do we reorganize the music?
Analysis: How are our playlists?
Spotify vs. Us: How do we compare?

Step 1: Set Up

To get an understanding of playlist structure, we used the data from Spotify’s Million Playlist Dataset Challenge. The dataset contains one million randomly selected playlists in the following structure:

Source: Million Playlist Dataset Challenge

We wanted to lay the foundation for our organization system by extracting the trends across existing playlists and applying them to our automated playlist creation. We randomly selected 25,000 playlists from the million-playlist set and found the following:

Compiled averages across 25,000 randomly selected playlists.

Potential correlation between song length and playlist duration.

We tested whether there was any relationship between song length and playlist duration. There seems to be a linear relationship between the two, albeit a weak one.
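A minimal sketch of that check, assuming the Million Playlist Dataset layout (each playlist has a `duration_ms` total and a `tracks` list whose entries carry their own `duration_ms`) and assuming "song length" means a playlist’s mean track length. The helper names are hypothetical, not taken from our notebook.

```python
import math
import random
from statistics import mean

def sample_playlists(playlists, n=25_000, seed=0):
    """Draw a reproducible random subset of playlists (without replacement)."""
    return random.Random(seed).sample(playlists, n)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def length_vs_duration(playlists):
    """Correlate each playlist's mean track length with its total duration.

    Assumes MPD-style dicts: a playlist-level "duration_ms" plus a "tracks"
    list whose entries also carry "duration_ms".
    """
    avg_len = [mean(t["duration_ms"] for t in p["tracks"]) for p in playlists]
    total = [p["duration_ms"] for p in playlists]
    return pearson(avg_len, total)
```

An r close to 0 means no linear relationship; a small positive value would match the "weak" relationship we saw.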

To extract the musical and Spotify features, we connected to the Spotify API. Then, focusing on another random subset of 10,000 playlists, we analyzed acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, speechiness, tempo, and valence. We found the following trends:
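The feature extraction can be sketched as a batched loop. The audio-features endpoint accepts up to 100 track IDs per request, so we chunk the IDs. Here `fetch` is a pluggable function: in a live run it would be spotipy’s `sp.audio_features`, which returns one dict per ID (or `None` when Spotify has no analysis). The helper names are our own, and note that as of late 2024 Spotify restricts this endpoint for newly registered apps, so a live run may not work today.

```python
from typing import Callable, Dict, Iterable, List, Optional

FEATURES = ["acousticness", "danceability", "duration_ms", "energy",
            "instrumentalness", "key", "liveness", "loudness",
            "speechiness", "tempo", "valence"]

def chunks(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def fetch_features(track_ids: List[str],
                   fetch: Callable[[List[str]], List[Optional[dict]]],
                   batch_size: int = 100) -> Dict[str, dict]:
    """Collect the 11 features we used for every track, one request per batch.

    `fetch` stands in for spotipy's `sp.audio_features`; tracks it returns
    None for are skipped.
    """
    out: Dict[str, dict] = {}
    for batch in chunks(track_ids, batch_size):
        for tid, feats in zip(batch, fetch(batch)):
            if feats is not None:
                out[tid] = {name: feats[name] for name in FEATURES}
    return out
```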

Shows correlation between all Spotify features.

We found that loudness and energy, valence and danceability, and valence and energy have the strongest positive correlations. Acousticness has the strongest negative correlation with loudness, energy, and valence.
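One way to surface the strongest pairs is to compute Pearson r for every pair of features and sort. This is a minimal pure-Python sketch, assuming each track is a dict of feature values like the API responses; `ranked_pairs` is a hypothetical helper, and a heatmap like ours would typically come from a full correlation matrix instead.

```python
import math
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranked_pairs(rows, features):
    """Pearson r for every feature pair, sorted from most positive to most negative.

    `rows` is a list of per-track dicts such as the audio-features responses.
    """
    cols = {f: [row[f] for row in rows] for f in features}
    pairs = [(pearson(cols[a], cols[b]), a, b) for a, b in combinations(features, 2)]
    return sorted(pairs, reverse=True)
```

The first entries correspond to pairs like loudness and energy; the last ones to pairs involving acousticness.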

We also examined how playlists are typically named. Are the names usually neutral, positive, or negative? We used AFINN for this sentiment analysis and found that playlist names are overwhelmingly neutral.
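AFINN works by summing word-level valence scores (integers from -5 to +5). The sketch below mimics that with a tiny illustrative lexicon so it runs on its own; `TINY_LEXICON` and its values are a hypothetical stand-in, and in practice one would load the full AFINN list (for example via the `afinn` Python package and its `Afinn().score(text)` method).

```python
import re
from collections import Counter

# Hypothetical stand-in for the AFINN word list: words mapped to integer
# valences on AFINN's -5..+5 scale. The real list has thousands of entries.
TINY_LEXICON = {"love": 3, "happy": 3, "good": 3, "sad": -2, "hate": -3, "angry": -3}

def afinn_style_score(text, lexicon=TINY_LEXICON):
    """Sum the valences of every known word; unknown words contribute 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

def sentiment_bucket(name):
    """Map a playlist name to one of the three categories we used."""
    score = afinn_style_score(name)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def tally(names):
    """Count how many playlist names land in each category."""
    return Counter(sentiment_bucket(n) for n in names)
```

Names like "Gym" or "chill" contain no scored words, which is one reason so many playlist names come out neutral.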

We also found that people, by a 3:2 ratio, prefer major to minor keys. In particular, the most popular keys are C major and G major, with D major, F major, and A major close behind.

Key: 0 = C, 1 = C#/Db, 2 = D, 3 = D#/Eb, … 11 = B (Spotify’s pitch-class encoding; major vs. minor is stored separately in the mode feature).
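Since Spotify reports pitch class (`key`, with -1 for "not detected") and modality (`mode`, 1 = major, 0 = minor) as separate fields, both are needed to name a key and compute the major-to-minor ratio. A minimal sketch; `key_name` and `key_stats` are hypothetical helpers.

```python
from collections import Counter

PITCHES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
           "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def key_name(key, mode):
    """Combine Spotify's pitch class (0-11, -1 if undetected) with mode (1 major, 0 minor)."""
    if key == -1:
        return "unknown"
    return f"{PITCHES[key]} {'major' if mode == 1 else 'minor'}"

def key_stats(tracks):
    """Return the major:minor ratio and the keys ranked by frequency."""
    modes = Counter(t["mode"] for t in tracks)
    major_to_minor = modes[1] / modes[0] if modes[0] else float("inf")
    keys = Counter(key_name(t["key"], t["mode"]) for t in tracks)
    return major_to_minor, keys.most_common()
```

A ratio of 1.5 corresponds to the 3:2 preference for major keys.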

Step 2: Create the Model

With these features in mind, we could now create our models. First, we built a larger sample: a random selection of 10,000 songs. Let’s call this music set Sample. Then, we ran the same set of steps on each of our individual Liked Songs to reorganize our music. The steps were as follows:

1. Connect to Spotify Developer. This allows us to access our Spotify accounts from our Google Colab notebook and query for playlists.
2. Read in our song sample. For our first reorganization, we gathered 10,000 songs from a massive pool of music; for our later runs, we used our own Liked Songs. We used the following to generate the first playlist:
3. Extract audio, musical, and Spotify features, namely acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, speechiness, tempo, and valence. We queried the Spotify API for each track to get these features.

4. Rescale the values to the [0, 1] range via MinMaxScaler.

5. Find the best number of playlists using the elbow method.

Elbow method for Sample. We concluded that 40 playlists would be optimal.

6. Run K-Means clustering to make our playlists. Sample used 40 clusters; Jonathan and I used 20, and Hannah used 10.

7. Create empty playlists. Loop over the desired number of playlists, creating a new playlist named cluster_i at each step.

8. Get all the new playlist IDs. Fill each playlist with one of the k clusters.

9. Enjoy! :)
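Steps 4 through 8 can be sketched end to end in plain Python. This is a minimal illustration, not our notebook code: we used scikit-learn’s MinMaxScaler and K-Means, while this sketch implements min-max scaling and Lloyd’s algorithm by hand, with a deterministic farthest-first seeding (a swapped-in choice that keeps the example reproducible). Inertia, the sum of squared distances to each point’s centroid, is what the elbow method plots against k. In a live run, each `cluster_i` list would then be pushed to Spotify, for example with spotipy’s `user_playlist_create` and `playlist_add_items`; those calls are left out here.

```python
def min_max_scale(rows):
    """Rescale each column to [0, 1] via (x - min) / (max - min); constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(r, lo, hi)]
            for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Seed with points[0], then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def elbow_curve(points, k_max):
    """Inertia for k = 1..k_max; the 'elbow' is where extra clusters stop helping much."""
    return [kmeans(points, k)[2] for k in range(1, k_max + 1)]

def to_playlists(track_ids, labels):
    """Group track IDs into cluster_0, cluster_1, ... (steps 7 and 8)."""
    playlists = {}
    for tid, lab in zip(track_ids, labels):
        playlists.setdefault(f"cluster_{lab}", []).append(tid)
    return playlists
```

Scaling first matters because raw features live on very different ranges (tempo in BPM, loudness in dB, most others already between 0 and 1), and K-Means distances would otherwise be dominated by the largest-valued features.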

Step 3: Analyzing Our Playlists

Let’s look at Sample’s playlists. We charted all 40 clusters against five features: acousticness, liveness, danceability, energy, and valence.

Violin plot of features across all 40 playlists.

For a less overwhelming view, let’s focus on 7 playlists.

Recall that we previously found positive correlations between loudness, energy, valence, and danceability. All four of these features are negatively correlated with acousticness. Clusters 0, 3, and 6 are high in danceability, valence, and energy but very low in acousticness. Cluster 1, on the other hand, is high in acousticness and low in the other three features. These playlists reinforce our earlier observations.
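Reading cluster profiles like this off a chart can also be done numerically: average each scaled feature per cluster and label it high or low. A minimal sketch; the 0.6 and 0.4 cutoffs are arbitrary illustrative thresholds, and both helpers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

FEATURES = ["acousticness", "danceability", "energy", "valence"]

def cluster_profiles(rows, labels, features=FEATURES):
    """Mean of each (already 0-1 scaled) feature for every cluster label."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    return {lab: {f: mean(r[f] for r in members) for f in features}
            for lab, members in sorted(groups.items())}

def describe(profile, high=0.6, low=0.4):
    """Label each feature mean 'high', 'low', or 'mid' using illustrative cutoffs."""
    return {f: "high" if v >= high else "low" if v <= low else "mid"
            for f, v in profile.items()}
```

A cluster described as low acousticness with high danceability, energy, and valence would resemble Clusters 0, 3, and 6; the opposite pattern would resemble Cluster 1.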

Next, we charted the tempo, speechiness, loudness, and instrumentalness of each of the playlists. Since we trained the model to group based on similarities of these features, we hoped to see variation across different playlists.

Even within these four clusters, a more digestible subset, we can see variation between the playlists. Quantitatively, we seem to have successfully organized them into similar groupings.

Qualitatively, we listened to our new, personalized cluster playlists. As a group, we rated our playlists a 7/10. A few of our playlists were successes. The majority of the songs made sense when they were grouped together. Some of us even saved those playlists and continued to listen to them. We got to rediscover great songs, as initially intended.

However, for each of us, there were one or two playlists that seemed like ‘overflow’: a hodgepodge of songs with little in common. Also, every playlist had a handful of outliers. This could be attributed to a range of factors: we had not accounted for language, rhythm variation within a song, common pairings of songs, etc.

Step 4: So… are we getting hired?

Spoiler alert: probably not. As expected, Spotify’s organization and recommendation systems are much more complex and effective than ours.

In fact, our initial goal with this assignment was to recreate Discover Weekly, Spotify’s weekly playlist of new music curated for each user. We tried to recreate a simplified version of this recommendation algorithm, but we struggled with accuracy.

Our initial approach relied on ‘nearest neighbors’, like our organizing algorithm. We aimed to use the Million Playlist Dataset to find similar playlists and user music preferences. We set out to bucket together playlists that were ‘very similar’, which we defined as containing at least 30% of the same artists. We would then compile the songs in each bucket, score each song by the number of times it occurs in the bucket, and recommend the highest-ranked songs to the playlists that did not already contain them.
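A minimal sketch of that bucket-and-score idea. The text does not pin down "30% of the same artists" relative to which playlist, so this assumes the share is measured against the target playlist’s distinct artists; the track dicts with `artist` and `song` keys and both helper names are hypothetical.

```python
from collections import Counter

def artist_overlap(a, b):
    """Share of playlist a's distinct artists that also appear in playlist b (assumed definition)."""
    artists_a = {t["artist"] for t in a}
    artists_b = {t["artist"] for t in b}
    return len(artists_a & artists_b) / len(artists_a) if artists_a else 0.0

def recommend(target, others, threshold=0.30, top_n=5):
    """Bucket playlists sharing >= 30% of the target's artists, then rank unseen songs by frequency."""
    bucket = [p for p in others if artist_overlap(target, p) >= threshold]
    have = {t["song"] for t in target}
    counts = Counter(t["song"] for p in bucket for t in p if t["song"] not in have)
    return [song for song, _ in counts.most_common(top_n)]
```

Songs that show up in many "very similar" playlists rise to the top, which also means this approach favors already popular songs.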

However, we found that Spotify uses a similar technique to create song recommendations. So, in addition to creating a playlist organization algorithm, we decided to learn a bit more about how Spotify does what it does best: Discover Weekly.

Recall that Spotify has access to all user interactions with its platform. Beyond the dataset it released, Spotify stores this behavior in matrices. For example, one key boolean matrix is the User x Songs matrix (pictured below), where each row represents one of the 345 million users and each column represents one of the 70 million songs. Each entry [row][col] records whether user ‘row’ has listened to song ‘col’.

Source: How Does Spotify Know You So Well?
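Some quick arithmetic shows why such a matrix cannot be stored densely: 345 million × 70 million is about 2.4 × 10^16 cells, or roughly 3 petabytes even at one bit per cell. Since any one user has heard only a tiny fraction of the catalog, a sparse representation that keeps only the "listened" cells is the natural choice. We do not know how Spotify actually stores it; the dict-of-sets below is just one common sparse layout, and the helper names are hypothetical.

```python
from collections import defaultdict

USERS, SONGS = 345_000_000, 70_000_000
# One bit per cell of a dense boolean matrix: about 3.0e15 bytes (~3 PB).
DENSE_BYTES = USERS * SONGS // 8

def build_listen_matrix(events):
    """Keep only the True cells of the User x Songs matrix: user -> set of songs heard."""
    matrix = defaultdict(set)
    for user, song in events:
        matrix[user].add(song)
    return matrix

def listened(matrix, user, song):
    """Equivalent of reading matrix[user][song] in the dense picture."""
    return song in matrix.get(user, ())
```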

With data on all user behavior, Spotify’s recommendation system can be much more fine-tuned than ours. In fact, their system has three parts.

Source: How Spotify Recommends Your New Favorite Artist

1. Collaborative Filtering: “Uses your behavior and that of similar users.”

This is the most similar to our intuitive playlist recommendation approach. Just like our planned model, Spotify uses ‘nearest neighbors’ to predict a user’s behavior and preferred music. Spotify uses this in tandem with user interactions with the app: What is a user listening to? When are they listening to it? How often do they listen to it?
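The textbook form of user-based collaborative filtering fits in a few lines: find the users whose listening history overlaps most with yours, then suggest what they heard that you have not, weighted by how similar they are. This sketch uses Jaccard similarity on listen sets as an assumed similarity measure; it illustrates the general technique, not Spotify’s production system.

```python
from collections import Counter

def jaccard(a, b):
    """Overlap of two listen sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_for(user, listens, k=2, top_n=3):
    """Score songs heard by the k most similar users, weighted by similarity."""
    mine = listens[user]
    neighbors = sorted((u for u in listens if u != user),
                       key=lambda u: jaccard(mine, listens[u]), reverse=True)[:k]
    scores = Counter()
    for u in neighbors:
        weight = jaccard(mine, listens[u])
        for song in listens[u] - mine:  # only songs the user hasn't heard
            scores[song] += weight
    return [s for s, _ in scores.most_common(top_n)]
```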

2. Natural Language Processing (NLP): “Song lyrics, playlists, blog posts, social media comments.”

At a high level, one core task in Natural Language Processing (NLP) is sentiment analysis: can we categorize words and phrases as positive, negative, or neutral? Sentiment analysis scores are often normalized to range from -1 to 1, from negative to positive. We previously ran AFINN’s sentiment analysis on the 25,000 playlists and found that most playlist names are neutral. However, we used only three categories (neutral, positive, negative); Spotify’s categorization is more nuanced, drawing on phrases and playlist descriptions.

Spotify crawls playlist names, blog posts, online posts, reviews, and any other text about music to assess artists, songs, and albums. Once the music is scored, Spotify can organize ‘terms and weights to create a vector representation of the song that can be used to determine if two pieces of music are similar’.
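To make "terms and weights" concrete, here is a small sketch that turns text written about each song into a weighted term vector and compares songs with cosine similarity. TF-IDF is our swapped-in weighting scheme, chosen because it is the standard textbook choice; the quote above does not say which weighting Spotify uses, and the song texts are made up.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    """docs maps song -> text written about it; returns song -> {term: weight}."""
    counts = {song: Counter(tokens(text)) for song, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    return {song: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
            for song, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Two songs described with overlapping terms (say, "dreamy" and "bedroom") end up with a higher cosine score than two songs whose descriptions share nothing.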

3. Audio Models: Based on raw audio.

This third model is the Discover in Discover Weekly. Raw audio models weigh all songs equally: a song with 1,000,000,000 streams and a song with 50 streams are treated the same. The audio model is essential for introducing new artists, because the other two models (collaborative filtering and NLP) favor popular songs and artists.

Spotify uses Convolutional Neural Networks (CNNs) to analyze raw audio. The network consists of four convolutional layers followed by three dense layers. Its input is a time-frequency representation of audio frames (example below). After processing the input through these layers, the CNN outputs ‘an understanding of the song, including characteristics like estimated time signature, key, mode, tempo, and loudness.’

Figure 10.1: Source: Understanding Audio Data…
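A toy version of the core operation: slide small kernels along the time axis of a time-frequency representation (frames × frequency bins), apply a ReLU, collapse time into a fixed-size vector, and feed that to a dense layer. This shows one convolutional layer and one dense layer with made-up weights rather than Spotify’s four and three; the mean/max temporal pooling is our assumption about how the time axis is collapsed, and all names are hypothetical.

```python
def conv1d_time(frames, kernels, bias):
    """Convolve along time. frames: T x F; each kernel: W x F. Returns (T-W+1) x K after ReLU."""
    width = len(kernels[0])
    out = []
    for t in range(len(frames) - width + 1):
        window = frames[t:t + width]
        row = []
        for kern, b in zip(kernels, bias):
            # Dot product between the kernel and the current W-frame window.
            s = b + sum(w * x for kr, fr in zip(kern, window) for w, x in zip(kr, fr))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def global_pool(feature_map):
    """Collapse the time axis with mean and max pooling, one value each per channel."""
    channels = list(zip(*feature_map))
    return [sum(c) / len(c) for c in channels] + [max(c) for c in channels]

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, bias)]
```

Pooling over time is what lets songs of different lengths produce vectors of the same size, which the dense layers need.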

Spotify can merge these three models to create nuanced playlists for your taste with more accuracy than solely using genre analysis and API features.

My Liked Songs is no longer an unknown. We shone light into this abyss and rediscovered our own music. I am eager to see how these songs will weave their way into the movies of our lives: for Jonathan, Hannah, me, and maybe you too :).

Link to Colab project. Link to the GitHub repo.