Recommendation System: Data Collection

Collecting metadata of my favorite songs from Spotify


  • Bijon Setyawan Raya

  • June 14, 2023

    2 mins


    Music Recommendation System (4 Parts)


    In the first part of this series, I will collect approximately 2,000 random songs so that I can later find out which of them I would like, based on my own playlists. Once I have the data, I will use it to build a recommendation system with Non-negative Matrix Factorization (NMF).

    My Spotify homepage

    Since I am a Spotify subscriber, I can use their API to get songs' metadata. Unfortunately, I had already finished writing my own functions by the time I stumbled upon Spotipy, a Python library for interacting with the Spotify API.

    Click here to see the script on GitHub.

    I used the following endpoints to collect the metadata of songs in my own playlists as well as songs in playlists that I have never listened to:

    1. https://accounts.spotify.com/authorize to get the authorization code
    2. https://accounts.spotify.com/api/token to exchange the authorization code for an access token and a refresh token
    3. https://api.spotify.com/v1/me/playlists to get all of my playlists
    4. https://api.spotify.com/v1/browse/categories/{category_id}/playlists to get all of the playlists in a category
    5. https://api.spotify.com/v1/playlists/{playlist_id}/tracks to get all of the tracks in a playlist
    6. https://api.spotify.com/v1/tracks?ids={track_ids} to get the tracks' metadata in bulk
    7. https://api.spotify.com/v1/audio-features?ids={track_ids} to get the tracks' audio features in bulk
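The two bulk endpoints accept only a limited number of IDs per request (50 for tracks and 100 for audio features, per Spotify's documentation), so roughly 2,000 songs have to be split into batches. The following is a minimal sketch of that batching, not the script from the repository; the helper names `chunked` and `bulk_urls` are hypothetical.

```python
API_BASE = "https://api.spotify.com/v1"

# Per-request ID limits for the bulk endpoints.
MAX_TRACK_IDS = 50
MAX_AUDIO_FEATURE_IDS = 100


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_urls(endpoint, track_ids, batch_size):
    """Build one request URL per batch of track IDs (hypothetical helper)."""
    return [
        f"{API_BASE}/{endpoint}?ids={','.join(batch)}"
        for batch in chunked(track_ids, batch_size)
    ]
```

Each resulting URL would then be requested with an `Authorization: Bearer <access token>` header, for example `bulk_urls("audio-features", track_ids, MAX_AUDIO_FEATURE_IDS)`.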

    Here is an example of the metadata collected for a single track, combining fields from the tracks and audio-features endpoints:

    {
      "id": "2i2gDpKKWjvnRTOZRhaPh2",
      "title": "Moonlight",
      "artist(s)": "Kali Uchis",
      "popularity": 88,
      "danceability": 0.639,
      "energy": 0.723,
      "key": 7,
      "loudness": -6.462,
      "mode": 0,
      "speechiness": 0.0532,
      "acousticness": 0.511,
      "instrumentalness": 0.0,
      "liveness": 0.167,
      "valence": 0.878,
      "tempo": 136.872,
      "type": "audio_features",
      "uri": "spotify:track:2i2gDpKKWjvnRTOZRhaPh2",
      "track_href": "https://api.spotify.com/v1/tracks/2i2gDpKKWjvnRTOZRhaPh2",
      "analysis_url": "https://api.spotify.com/v1/audio-analysis/2i2gDpKKWjvnRTOZRhaPh2",
      "duration_ms": 187558,
      "time_signature": 4
    }
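A record like the one above mixes fields from two responses: `popularity`, the track name, and the artists come from the tracks endpoint, while the numeric columns come from the audio-features endpoint. Below is a rough sketch of how such a flat record could be assembled. The function name `flatten_track` and the comma-separated artist string are assumptions, not necessarily what the original script does.

```python
def flatten_track(track, features):
    """Combine a track object and its audio-features object into one flat record."""
    record = {
        "id": track["id"],
        "title": track["name"],
        # Assumption: multiple artists are joined into one comma-separated string.
        "artist(s)": ", ".join(artist["name"] for artist in track["artists"]),
        "popularity": track["popularity"],
    }
    # Audio features add the numeric columns; "id" is skipped because it is already set.
    record.update({key: value for key, value in features.items() if key != "id"})
    return record
```

A flat dictionary per song makes it straightforward to load all records into a table with one row per track.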