Machine Learning

05 January 2024

Personalized Music Recommendation System

In this post, I will be documenting my journey in creating a recommendation system from scratch using my music that I have collected from Spotify. This post will be divided into two parts:

  1. Data Collection
  2. Model Training
  3. Evaluation
  4. Conclusion

Before we go further, I think it's important to know what is NMF, how NMF works, and why I chose this algorithm to create a recommendation system.

What is Non-negative Matrix Factorization (NMF)?

Non-negative Matrix Factorization (NMF) is a matrix factorization method that factorizes a non-negative matrix into two non-negative matrices. NMF equation can be expressed as follows:

Vn×mWn×p×Hp×m\underset{n \times m}{V} \approx \underset{n \times p}{W} \times \underset{p \times m}{H}


  1. VV is a matrix containing the atrributes of the songs we collected from Spotify API.
  2. WW is a matrix containing the latent features of the songs.
  3. HH is a matrix containing the coefficients of the latent features.
  4. nn is the number of songs.
  5. mm is the number of features.
  6. pp is the number of latent features.

The goal of NMF is to find the latent features and the coefficients of the latent features that can reconstruct the original matrix. Note that the latent topics in pp are not directly observable in the data but are inferred from the relationships between songs and their attributes.

How do we determine if a song is similar to other songs? First, we are going to feed matrix Vn×mV_{n \times m} into the NMF algorithm from scikit-learn with pp latent features that produces Wn×pW_{n \times p}. Assuming that we have a song with pp features, let's call it S=[s1,...,sp]S = [s_1, ..., s_p]. Once we have matrix WW, we can multiplicate it with STS^T. Thus, W×STW \times S^T will give us a vector RR containing values representing how similar the song is to other songs, ranging from 0 to 1. 00 represents that the song is not similar to a particular song. Conversely, 11 represents that the song is very similar to a particular song.

Wn×p×STp×1=Rn×1[w11w1pwn1wnp]n×p×[s1sp]p×1=[r1rn]n×1\begin{aligned} \underset{n \times p}{W} \times \underset{p \times 1}{S^T} &= \underset{n \times 1}{R} \\ \begin{bmatrix}w_{11} & \cdots & w_{1p} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{np}\end{bmatrix}_{n \times p} \times \begin{bmatrix}s_1 \\ \vdots \\ s_p\end{bmatrix}_{p \times 1} &= \begin{bmatrix}r_1 \\ \vdots \\ r_n\end{bmatrix}_{n \times 1} \end{aligned}

Now we know what NMF is and how it works in mathematical perspective. Here are the reasons why I chose NMF as the algorithm behind my music recommendation system:

  1. NMF can be scaled to large datasets, which is essential for music services that have vast amounts of tracks and user data.
  2. NMF models can be updated incrementally with new data (new users or new songs), which is crucial for dynamic systems like music recommendation services where the database is constantly growing
  3. Just like other matrix factorization techniques, NMF reduces the dimensionality of the feature space. This is beneficial because it helps in handling the curse of dimensionality and noise in the data. In the reduced space, similar songs are closer to each other, making it easier to find and recommend similar items.

Data Collection

My Spotify homepageMy Spotify homepage

Since I am a Spotify subscriber, I use their public APIs to collect approximately 2000 songs both from my own playlists and other playlists that I have never listened to. If you're interested on how I collected the data, you check the script here on GitHub.

There are endpoints that I used to collect the metadata of songs in my own playlist and also songs in the playlists that I never listened to:

  1. to get the access token
  2. to get the refresh token
  3. to get all of my playlists
  4.{category_id}/playlists to get all of the playlists in a category
  5.{playlist['id']}/tracks to get all of the tracks in a playlist
  6.{track_ids} to get the tracks' metadata in a bulk
  7.{track_ids} to get the tracks' audio features in a bulk

While gathering the data, make sure that you send a request and get the response in a bulk. This way, you can reduce the number of requests that you send to the API. Spotify API has a rate limit of 10 requests per second, so you have to be careful when sending requests to the API.

Here is one of the songs's metadata that the API returns:

  "id": "2i2gDpKKWjvnRTOZRhaPh2",
  "title": "Moonlight",
  "artist(s)": "Kali Uchis",
  "popularity": 88,
  "danceability": 0.639,
  "energy": 0.723,
  "key": 7,
  "loudness": -6.462,
  "mode": 0,
  "speechiness": 0.0532,
  "acousticness": 0.511,
  "instrumentalness": 0.0,
  "liveness": 0.167,
  "valence": 0.878,
  "tempo": 136.872,
  "type": "audio_features",
  "uri": "spotify:track:2i2gDpKKWjvnRTOZRhaPh2",
  "track_href": "",
  "analysis_url": "",
  "duration_ms": 187558,
  "time_signature": 4

To understand the meaning of each feature, you can read the documentation here. While writing this post halfway, I stumbled upon a library called Spotipy that can be used to interact with Spotify API. I could have used this library to collect the data, but I decided to stick with my own script.

Data Exploration

My playlist heatmapMy playlist heatmap

Looking at the heatmap above, we can see danceability, energy, and loudness are highly correlated. Let's see how the distribution of each feature looks like.

Distribution of each featureDistribution of each feature

From the distribution above, we can see that most of the songs in my playlists are popular, danceable, and loud. I assume that's the reason why the heatmap above shows that those three features are highly correlated.

Scatter plots of danceability against other featuresScatter plots of danceability against other features

Things we can notice from the scatter plots above:

  1. Most danceable songs are popular.
  2. Danceability is positively correlated with energy and loudness.
  3. Danceable songs are low in speechiness.
  4. Danceability is negative correlated with acousticness and instrumentalness.
  5. Danceable songs have quite high tempo.

Scatter plots of popularity against other featuresScatter plots of popularity against other features

The observations from the scatter plots above:

  1. Popular songs are danceable
  2. Popular songs are energetic and loud.
  3. Popular songs are low in acousticness, instrumentalness, and speechiness.
  4. Popular songs are high in valence, meaning that they are happy and cheerful.
  5. Popular songs have quite high tempo.

Now let's see how the data values look like real close. Here is the first 5 rows of the dataset that I converted to a Pandas dataframe:


Let's see how much each feature contributes to the cumulative variance of the dataset. First we create two lists, one to store the reconstructed errors and the other to store the number of dimensions, starting from 2 to the number of columns in the dataset.

reconstruction_errors = []
dimensions = range(2, df.columns.size + 1)

After that, we create a pipeline that will scale the data, perform NMF, and normalize the data. Then we fit the pipeline to the dataset and append the reconstruction error to the list so that we can plot it.

# create an elbow plot to determine the optimal number of dimensions
for dimension in dimensions:
  pipeline = make_pipeline(

Reconstruction error graph using the Elbow methodReconstruction error graph using the Elbow method

The general rule of thumb is to choose the number of components where the elbow starts to flatten out. However, the reconstruction error above has a step decline and only flattens out at 14 dimensions. Now, two questions emerged:

  1. Should I use all of the features to create a better recommendation system? That's where the reconstruction error flattens out.
  2. Should I just use half the number of the features to create a so-so recommendation system?

Looking at the graph, I was tempted to use all of the features to create the "perfect" recommendation system. However, there is a catch. If I use all of the features, meaning that the recommendation system will be so good that it will recommend me songs that I have already listened to.

Therefore, if I want to create a recommendation system that will recommend me songs that I have never listened to, I have to use half the number of the features to allow some mistakes to happen. Those mistakes will expose me to new songs or new genres that I have never listened to.

Model Training

This time, let's use 7 dimensions to create a recommendation system, and get the names of those 7 features.

pipeline = make_pipeline(
transformed_songs =
components = pipeline.named_steps['nmf'].components_
categories = pd.DataFrame(components, columns=df.columns.values)

There are two variables that we need to pay attention to, which are transformed_songs and categories. First, let's look how categories looks like:


What I am gonna do is, for each row, I am gonna get the name of the feature that has the highest value. For example, for the first row, the feature that has the highest value is popularity. Thus, the first category is popularity. The same applies to the other rows.

The discovered categories are:

  1. Popularity
  2. Mode
  3. Instrumentalness
  4. Key
  5. Speechiness
  6. Valence
  7. Acousticness


If we print out the first element of transformed_songs, we will get something like this:

array([0.68781855, 0.        , 0.02267717, 0.42057155, 0.03739232,
       0.46179948, 0.36722474])

Let's present those values in a dataframe:

column_names = ["popularity", "mode", "instrumentalness", "key", "speechiness", "valence", "acousticness"]
transformed_songs_df = pd.DataFrame(

transformed_songs_df would look nicer like this:

Die For You0.7956590.4198120.0845640.3670310.0349010.0630330.208879
See You Later0.0000000.5025420.0000000.6338090.0000000.0615990.584759
Glimpse of Us0.5445960.4825100.0300810.4287210.0711490.0015090.529932

We can make it even nicer by categorizing each song based on the most dominant feature. First we gotta get the index of the dominant category.

dominant_categories = [np.argmax(processed_songs.iloc[i]) for i in range(len(processed_songs))]

Then we create a new dataframe using the dominant_categories and get the name of the category using the index.

categorized_songs = pd.DataFrame(
  [categories.iloc[i].sort_values(ascending=False).index.values[0] for i in dominant_categories],

Here is the first 20 rows of the categorized songs:

Die For Youpopularity
See You Laterkey
Glimpse of Uspopularity
La La Lost You - Acoustic Versionacousticness
Good Newspopularity
Jocelyn Florespopularity
Come Back to Earthacousticness
Habits (Stay High)popularity
Best Part (feat. H.E.R.)popularity
Here With Mepopularity
Until I Found You - Piano Versionacousticness
i love youacousticness
Idea 10acousticness
Idea 1instrumentalness
Idea 22instrumentalness

Just by looking at the first 20 rows, I can say that the model does a pretty good job at categorizing the songs. Counting the number of songs in each category, I noticed most songs I like and listen to are popular songs.

popularity          113
mode                 33
instrumentalness     23
key                  17
acousticness         14
valence               7
speechiness           1

Take a music called "Idea 1" for example. It is an piano instrumental music that I often listen to. Let's see what the model would recommend me based on that song.

wanted_song = processed_songs.loc['Idea 1']
recommended_songs =

Looking at the recommended songs below, I think the model did an impressive job since most of the songs are instrumental songs that in the same genre as "Idea 1".

Cornfield Chase0.988377
Cornfield Chase - Piano-Cello Version0.985630
Cornfield Chase (Slowed + Reverb)0.985341
Idea 22 (Slowed + Reverb)0.982635
Interstellar- Main Theme0.979361
Day One (Interstellar Theme)0.961549
dream river.0.960631
Idea 22 (Sped Up)0.956474

To test the model, let's use the music that I have never listened to before.

As It WasHarry Stylespop
See The Light (feat. Fridayy)Swedish House Mafiapop
Over YouDillistonedance/electronic
Hiraeth (feat. Kim Van Loo)Fejkádance/electronic
IndulgenceNora En Puredance/electronic
All Night LongMarshdance/electronic
Just OverYottodance/electronic
SaltaSultan + Sheparddance/electronic
Believer - Marsh's Guatape RemixAbove & Beyond, Marshdance/electronic
BreathingBen Böhmerdance/electronic
Tell Me Why - MEDUZA RemixSupermode, MEDUZAdance/electronic
Lost In The RhythmDavid Guetta, MORTENdance
ExhalePatrick Bradleyjazz
As It WasThe Piano Guysclassical
Adagio in G Minor (Arr. for Harp and Orchestra)Xavier De Maistreclassical

Just by listening to these twenty songs, I notice that mose of them have a lot of instrumental elements, pretty much similar to "Idea 1".


In this project, I have successfully created a decent recommendation system that can recommend me songs using Non-negative Matrix Factorization (NMF). Other than this technique, there are other algorithms that can be used to create a recommendation system, such as Singular Value Decomposition (SVD), Alternating Least Squares (ALS), and Collaborative Filtering (CF). However, I chose NMF because it was the algorithm that I have never heard about until I was reading my old notes from my Linear Algebra class.

Am I satisfied with the result? Yes, I am, and it really help me discovered new songs that I have no idea about, and a lot of them ended up in my own playlists.


  1. D. Lee, Daniel. Seung, H. Sebastian. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. 401. 788-791.
  2. Wikipedia. Non-negative matrix factorization.
  1. What is Non-negative Matrix Factorization (NMF)?
    • Data Collection
      • Data Exploration
        • Model Training
          • Evaluation
            • Conclusion
              • References