Recommendation System: Data Collection
Collecting metadata of my favorite songs from Spotify
Bijon Setyawan Raya
June 14, 2023
Music Recommendation System (4 Parts)
Data Collection
Data Preprocessing
System Training
System Evaluation
In the first part of this series, I will collect the metadata of approximately 2000 random songs, with the goal of finding out which of them I would like based on my own playlists. Once I have the data, I will use it to build a recommendation system with Non-negative Matrix Factorization (NMF).

Since I am a Spotify subscriber, I can use their API to get the songs' metadata. Unfortunately, I had already finished writing my own functions before I stumbled upon a library called Spotipy, which can be used to interact with the Spotify API.
The script is available on GitHub.
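Without Spotipy, talking to the Spotify API comes down to sending HTTP requests that carry an access token in the `Authorization` header. The sketch below shows one way this could look using only the standard library. The helper names `build_api_request` and `fetch_json` are hypothetical and are not taken from my script.

```python
import json
import urllib.request


def build_api_request(url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_json(url: str, access_token: str) -> dict:
    """Send the request and decode the JSON body.

    This needs network access and a valid access token, so it is not
    exercised in the usage example below.
    """
    with urllib.request.urlopen(build_api_request(url, access_token)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage: inspect the request without sending it ("TOKEN" is a placeholder).
request = build_api_request("https://api.spotify.com/v1/me/playlists", "TOKEN")
print(request.get_method(), request.full_url)
```

Separating the request construction from the network call keeps the header logic easy to check offline.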
I used the following endpoints to collect the metadata of songs in my own playlists and of songs in playlists that I had never listened to:
https://accounts.spotify.com/authorize to get the access token
https://accounts.spotify.com/api/token to get the refresh token
https://api.spotify.com/v1/me/playlists to get all of my playlists
https://api.spotify.com/v1/browse/categories/{category_id}/playlists to get all of the playlists in a category
https://api.spotify.com/v1/playlists/{playlist['id']}/tracks to get all of the tracks in a playlist
https://api.spotify.com/v1/tracks?ids={track_ids} to get the tracks' metadata in bulk
https://api.spotify.com/v1/audio-features?ids={track_ids} to get the tracks' audio features in bulk
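The two bulk endpoints take a comma-separated list of track IDs, so roughly 2000 songs have to be split into batches. Here is a minimal sketch of that batching. The batch sizes (50 IDs for `tracks`, 100 for `audio-features`) are my assumption about the per-request limits and should be checked against the current API reference. The names `chunked` and `bulk_urls` are hypothetical.

```python
from typing import Iterator, List

API_BASE = "https://api.spotify.com/v1"

# Assumed per-request ID limits; verify against the API reference.
MAX_IDS_TRACKS = 50
MAX_IDS_AUDIO_FEATURES = 100


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_urls(endpoint: str, track_ids: List[str], batch_size: int) -> List[str]:
    """Build one URL per batch of track IDs for a bulk endpoint."""
    return [
        f"{API_BASE}/{endpoint}?ids={','.join(batch)}"
        for batch in chunked(track_ids, batch_size)
    ]


# Usage with made-up IDs: 120 tracks need 3 "tracks" requests
# and 2 "audio-features" requests under the assumed limits.
example_ids = [f"id{i}" for i in range(120)]
track_urls = bulk_urls("tracks", example_ids, MAX_IDS_TRACKS)
feature_urls = bulk_urls("audio-features", example_ids, MAX_IDS_AUDIO_FEATURES)
print(len(track_urls), len(feature_urls))
```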
Here is an example of the metadata collected for one track, combining fields from the tracks and audio-features responses:
{
    "id": "2i2gDpKKWjvnRTOZRhaPh2",
    "title": "Moonlight",
    "artist(s)": "Kali Uchis",
    "popularity": 88,
    "danceability": 0.639,
    "energy": 0.723,
    "key": 7,
    "loudness": -6.462,
    "mode": 0,
    "speechiness": 0.0532,
    "acousticness": 0.511,
    "instrumentalness": 0.0,
    "liveness": 0.167,
    "valence": 0.878,
    "tempo": 136.872,
    "type": "audio_features",
    "uri": "spotify:track:2i2gDpKKWjvnRTOZRhaPh2",
    "track_href": "https://api.spotify.com/v1/tracks/2i2gDpKKWjvnRTOZRhaPh2",
    "analysis_url": "https://api.spotify.com/v1/audio-analysis/2i2gDpKKWjvnRTOZRhaPh2",
    "duration_ms": 187558,
    "time_signature": 4
}
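The keys `title` and `artist(s)` are not Spotify field names. In a track object they appear as `name` and `artists`, so a record like this one has to be assembled from a track object and its audio-features object. The sketch below shows one possible way to do that. The function name `flatten_track` and the choice to join multiple artists with a comma are assumptions for illustration, not necessarily what my script does.

```python
def flatten_track(track: dict, features: dict) -> dict:
    """Merge a track object and its audio-features object into one flat record."""
    record = {
        "id": track["id"],
        "title": track["name"],
        # Assumption: several artists are joined into one comma-separated string.
        "artist(s)": ", ".join(artist["name"] for artist in track["artists"]),
        "popularity": track["popularity"],
    }
    # Copy the audio features without overwriting keys already set (such as "id").
    for key, value in features.items():
        record.setdefault(key, value)
    return record


# Usage with a trimmed-down track and audio-features pair.
track = {
    "id": "2i2gDpKKWjvnRTOZRhaPh2",
    "name": "Moonlight",
    "artists": [{"name": "Kali Uchis"}],
    "popularity": 88,
}
features = {"id": "2i2gDpKKWjvnRTOZRhaPh2", "danceability": 0.639, "tempo": 136.872}
print(flatten_track(track, features))
```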