I used the data to make this visualization in D3.js of the artists and genres in the featured playlists.
The artist names are arranged around the circle, and there is a line between a genre category and each artist in that genre.
There might be a future post for how to make this dendrogram, but let's not get ahead of ourselves quite yet!
First we need the data from the Spotify API.
I have a post on a similar concept of visualization, the arc diagram, which you can check out in the meantime as well.
API access credentials
You've got a Spotify account, right?
If not, go ahead and create one.
Then you need API access credentials.
- Login to the Spotify developer dashboard where you will see a button that says create an app.
- Click on the button to create an app, and go through the steps.
Once you've done that, you should have the following credentials:
These will both be alphanumeric strings.
Copy and paste them into a file for now.
import requests
client_id = 'xxxxxx'
client_secret = 'xxxxx'
Note: in this post I won't be going over best practices for storing and handling API credentials, but don't share these with anyone, don't commit them to version control, etc.
- We'll be using the requests library to connect to the API, so I've imported that here as well.
Get an access token
Now we'll use the client_id
and client_secret
access credentials to get authorization from the API.
Successful authorization will give us an access token which we will use to make API requests.
To get authorized, we have to send a POST request to the following URL with our access credentials.
auth_url = 'https://accounts.spotify.com/api/token'
That's the authorization URL, and then we can just put our credentials in a dictionary to send with the request.
data = {
'grant_type': 'client_credentials',
'client_id': client_id,
'client_secret': client_secret,
}
And send the request.
auth_response = requests.post(auth_url, data=data)
If all went well, the access token should be in the response.
access_token = auth_response.json().get('access_token')
The access token is another alphanumeric string that also has some other symbols in it.
Now we're ready to access the API and get the featured playlist data.
Spotify data
Ultimately to create this visualization, we need a couple of things:
- The artists in the featured playlists.
- The genres for each artist.
If you're familiar with Spotify at all, you know that they have a seemingly infinite number of genres.
When I created this visualization, there were over 400 genres for the 800+ artists in the featured playlists.
Just FYI, I grouped the genres into a manageable number for visualization purposes.
Data collection strategy
To get the artist and genre data, we need to hit 3 API endpoints.
- Get featured playlists.
- Playlists are made up of tracks, so next we get the tracks for each playlist.
- Get the artists for each track, which also gives us their genres.
Each of these objects has an ID, so we are using playlist IDs to get the list of tracks, and then the tracks contain artist IDs, which we then use to get details about the artists.
API base URL
Here is the base URL for the Spotify API.
base_url = 'https://api.spotify.com/v1/'
We will use it make all of the API requests by just concatenating the base URL with the endpoints to get each type of data.
The access token we generated earlier comes into play here, and we include it in the request as a header.
headers = {
'Authorization': 'Bearer {}'.format(access_token)
}
Now we're ready to fetch the data.
If you're new to making API requests, notice that we made a POST request earlier for the API authorization, but the rest of these requests will be GET requests.
Spotify Featured Playlists
The first request is to get the featured playlists.
featured_playlists_endpoint = 'browse/featured-playlists/?limit=50'
featured_playlists_url = ''.join([base_url,featured_playlists_endpoint])
And send the API request.
response = requests.get(featured_playlists_url,headers=headers)
The goal is to get all of the featured playlists, and you can get a maximum of 50 in one request.
We get a JSON response back, and there is a list of playlist objects in there.
playlists = response.json().get('playlists').get('items')
There were 12 featured playlists, and the JSON for one of them looks like this:
{'collaborative': False,
'description': 'The very best in new music from around the world. Cover: Taylor Swift',
'external_urls': {'spotify': 'https://open.spotify.com/playlist/37i9dQZF1DWXJfnUiYjUKT'},
'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWXJfnUiYjUKT',
'id': '37i9dQZF1DWXJfnUiYjUKT',
'images': [{'height': None,
'url': 'https://i.scdn.co/image/ab67706f000000032bbb3956cb803830d9b614e5',
'width': None}],
'name': 'New Music Friday',
'owner': {'display_name': 'Spotify',
'external_urls': {'spotify': 'https://open.spotify.com/user/spotify'},
'href': 'https://api.spotify.com/v1/users/spotify',
'id': 'spotify',
'type': 'user',
'uri': 'spotify:user:spotify'},
'primary_color': None,
'public': None,
'snapshot_id': 'MTYwNzY2Mjg1MSwwMDAwMDJmNTAwMDAwMTc2NTAyYzU0NTAwMDAwMDE3NjRmYzJhMWE0',
'tracks': {'href': 'https://api.spotify.com/v1/playlists/37i9dQZF1DWXJfnUiYjUKT/tracks',
'total': 100},
'type': 'playlist',
'uri': 'spotify:playlist:37i9dQZF1DWXJfnUiYjUKT'}
I'm only concerned with the playlist IDs, so I just looped through to get them.
playlist_ids = set()
for pl in playlists:
playlist_id = pl.get('id')
playlist_ids.add(playlist_id)
I used a set instead of a list to make sure there are no duplicates.
And we will use these IDs to get the tracks for each playlist.
Playlist tracks
We've got the list of playlist IDs from the previous request, and now we will loop through them and request the list of tracks for each playlist.
artist_ids = set()
for p_id in playlist_ids:
pr = requests.get(base_url + 'playlists/{}/tracks'.format(p_id), headers=headers)
pr_data = pr.json()
if pr_data:
playlist_data = pr_data.get('items')
for tr in playlist_data:
track = tr.get('track')
if track:
artists = track.get('artists')
for artist in artists:
artist_id = artist.get('id')
artist_ids.add(artist_id)
Here, artist_ids
is a set of artist IDs, which we will use to request the details about each artist.
Artist details
This is the last endpoint, which will get us the details for each artist ID, which is where we get their genres.
Depending on when you run this script, the featured playlists and therefore artists will likely be different, as well as the number of artists will also likely change.
I was looking at more than 800 artists, which could mean 800+ requests to the API, so I was a little worried about rate limiting.
In the docs, Spotify does not mention specific rate limits, but you will know if you've been rate limited if you start getting 429 response codes from your requests.
If you get rate limited, you can check the Retry-After response header to see how long you need to wait to retry your request.
Spotify has two endpoints to get the artist details.
- One endpoint gets details for a single artist.
- The other endpoint gets several artists, which you can use to request up to 50 artists at a time.
So we are going to use the second endpoint to get the artist details in chunks of 50.
I've got 845 artist IDs, so instead of making 845 requests, only need to make 17 requests.
Much better!
We're going to request the artist details and create a dictionary with the genres as keys, and then the values are sets of artists that are in that genre.
artist_genres = {}
artist_ids_list = list(artist_ids)
So first initialize the empty dictionary.
I've also converted the set of artist IDs from the previous request into a list so that I can slice it up into chunks of 50.
while artist_ids_list:
if len(artist_ids_list) > 50:
current_request = artist_ids_list[:50]
chunk = ','.join(artist_ids_list[:50])
get_artists_genres(chunk)
artist_ids_list = artist_ids_list[50:]
else:
current_request = artist_ids_list
artist_ids_list = []
This first block of code is breaking up the artist IDs into chunks of 50, and then passing the chunk to get_artists_genres
which will request the data from the API and parse the response.
chunk = ','.join(artist_ids_list[:50])
The API requires the IDs to be formatted as a comma-separated list of the artist IDs, so here I've just sliced the list and taken a chunk of 50 IDs, and then joined them together in string with each ID separated by a comma.
def get_artists_genres(chunk):
ar = requests.get(base_url + 'artists?ids={}'.format(chunk), headers=headers)
artists_data = ar.json().get('artists')
for artist in artists_data:
genre_data = artist.get('genres')
artist_name = artist.get('name')
for genre in genre_data:
if genre and genre not in artist_genres:
artist_genres[genre] = set([artist_name])
else:
artist_genres[genre].add(artist_name)
The API endpoint to request the artist details is
ar = requests.get(base_url + 'artists?ids={}'.format(chunk), headers=headers)
I used string formatting insert the string of artist IDs into the URL.
The rest of this function gets the artist name and the list of genres for the artist, and adds them to the dictionary.
As mentioned, I used the artist_genres
dictionary to create the visualization from earlier in D3.js.
Thanks for reading!
So that is an overview of working with the Spotify API in Python.
There is also a library called Spotipy that you can use to connect to the API and do the same things we did in this post.
But it's nice to also know what is going on under the hood of libraries like Spotipy!
Let me know if you have any questions or comments. Write them below, or reach out:
And let me know if you do anything cool with the Spotify API!