Spotify playlist maker from liked songs
Separate Your Saved Songs on Spotify into Playlists of Similar Songs
Exploring Audio Features, classifying into moods and building a Machine Learning Approach
Chingis Oinar
Oct 2, 2020 · 9 min read

A few weeks ago I came across an interesting article titled A Music Taste Analysis Using Spotify API and Python. In it, the author analyzes not only his own but also his fiancée's preferences to see what the data has to say about them. He compares the two profiles side by side in terms of the music features provided by Spotify's API. While reading it, I became curious not only to analyze my own preferences but also to play with my Spotify data. So I decided to cluster my saved songs on Spotify into separate playlists, each representing a specific mood I am in while listening to them.
First, it is worth mentioning that, like Twitter, Slack, and Facebook, Spotify offers an API that lets developers explore its music database and get insights into listening habits. It provides a large variety of features; however, I used 8 features describing a song.
For more information on the different features provided: Get Audio Features for a Track | Spotify for Developers

Experimental setup
Now that I have mentioned what each feature represents, I want to highlight the dataset I used for my clustering task. The clusters are derived using the K-Means clustering algorithm, which was trained on the Spotify Dataset 1921-2020 found on Kaggle. The dataset contains more than 160,000 songs collected from the Spotify Web API, and you can also find data grouped by artist, year, or genre in the data section.
Considering that all the other features mentioned have values in the range [0, 1], it was important to scale the loudness value, which ranges from 0 down to -60 dB. Thus, I normalised the loudness feature using the sklearn MinMaxScaler to bring its values between 0 and 1.

    from sklearn import preprocessing

    scaler = preprocessing.MinMaxScaler()  # instantiate a scaler
    # all the feature values are in the range [0, 1], except loudness,
    # so let's scale it to fit the same range
    loudness = X["loudness"].values
    loudness_scaled = scaler.fit_transform(loudness.reshape(-1, 1))

Tuning K-Means Clustering Algorithm
I ended up using the K-Means clustering algorithm, which is used to find groupings in data. It is an unsupervised learning algorithm that partitions the data into k groups by assigning each point to its nearest centroid. The number of clusters k has to be fixed in advance. To come up with an appropriate k, I used a well-known method called the Elbow Method. By running K-Means for a range of k (e.g. 1-20), we can obtain the following figure:
If we look at the scree plot as a mountain, we need to find the point where the mountain ends and the rubble begins: up to that point each cluster I add gives a substantial decrease in the variance, and after it the decrease is only marginal. It looks like a value of 5 is optimal for my case. However, the elbow is not perfectly clear-cut, so sometimes it is up to the person to decide.

Describing and Visualising the Clusters using PCA
Considering that there are 8 different features used for our clustering task, it is extremely hard to visualize our data. There are some common techniques for reducing the dimensionality of data. I used Principal Component Analysis (PCA), which reduces dimensionality while still keeping the important information. PCA is also used to speed up learning; however, I am not going to dive into the details in this article.
The figure above shows our 5 clusters represented in 2-dimensional space. It is clear that cluster 0 is quite difficult to distinguish from clusters 1 and 2. Meanwhile, the other clusters differ clearly from one another.
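The elbow search and the 2-D PCA projection described above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the data is a synthetic stand-in generated with `make_blobs`, and the helper name `elbow_inertias`, the 8-column shape, `n_init=10` and `random_state=0` are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled audio features (assumption: 8 columns).
X, _ = make_blobs(n_samples=500, n_features=8, centers=5, random_state=0)


def elbow_inertias(X, k_values):
    """Fit K-Means once per k and return the within-cluster sum of squares."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in k_values
    ]


# Inertia for k = 1..20; plotting these values against k gives the elbow curve.
k_values = list(range(1, 21))
inertias = elbow_inertias(X, k_values)

# Fit the final model with the k read off the curve (5 in the article).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Project the 8-dimensional points onto 2 principal components for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
```

Plotting `inertias` against `k_values` gives the elbow curve, and a scatter plot of `X_2d` coloured by `labels` gives a 2-D view of the clusters like the one discussed above.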
Identifying an optimal classifier
Considering that we obtained our clusters as labels, we can easily train a classification algorithm that will help classify our saved songs on Spotify. Furthermore, it will allow us to classify recommended songs and separate them into different playlists.

    # column 5 holds loudness: fit the scaler on the training split only,
    # then apply the same scaling to the test split
    X_train[:, 5] = scaler.fit_transform(X_train[:, 5].reshape(-1, 1)).reshape(-1,)
    X_test[:, 5] = scaler.transform(X_test[:, 5].reshape(-1, 1)).reshape(-1,)

Four models are compared in terms of accuracy score: K-Neighbors Classifier, Random Forest Classifier, Support Vector Classifier and Naive Bayes. The Support Vector Classifier turned out to be the best model, with an accuracy score of roughly 0.998, so we will use it for future classification.

Predicting my saved songs on Spotify
Spotify's API provides a set of useful functions, so we are able to obtain a dataset of all the songs we saved. I will classify my liked songs using the classifier mentioned above. Finally, we will separate my songs into 5 different playlists and try to analyze them.

    import pandas as pd

    # sp is assumed to be an authenticated Spotify client
    offset = 0
    songs = []
    names = []
    ids = []

    # page through the saved tracks, 50 at a time
    while True:
        content = sp.current_user_saved_tracks(limit=50, offset=offset)
        songs += content['items']
        if content['next'] is not None:
            offset += 50  # advance by the page size so that no songs are skipped
        else:
            break

    for i in songs:
        names.append(i['track']['name'])
        ids.append(i['track']['id'])

    # request the audio features in batches of 50 track ids
    index = 0
    audio_features = []
    while index < len(ids):
        audio_features += sp.audio_features(ids[index:index + 50])
        index += 50

    features_list = []
    for features in audio_features:
        features_list.append([features['acousticness'], features['danceability'],
                              features['liveness'], features['energy'],
                              features['instrumentalness'], features['loudness'],
                              features['speechiness']])

    mydf = pd.DataFrame(features_list,
                        columns=["acousticness", "danceability", "liveness", "energy",
                                 "instrumentalness", "loudness", "speechiness"],
                        index=ids)

The figure above shows my liked songs classified using SVC.
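The comparison of the four classifiers mentioned in this section could look roughly like the sketch below. It is an illustration under stated assumptions rather than the original experiment: the features are synthetic (`make_blobs` with 7 columns), the labels come from K-Means as in the article, `GaussianNB` is assumed as the Naive Bayes variant, and all hyperparameters are scikit-learn defaults.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the audio features (assumption: 7 columns).
X, _ = make_blobs(n_samples=600, n_features=7, centers=5, random_state=0)

# As in the article, the class labels are the K-Means cluster assignments.
y = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The four model families named in the text, with default settings.
models = {
    "K-Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector": SVC(),
    "Naive Bayes": GaussianNB(),  # assumption: Gaussian variant
}

# Train each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores, best_name)
```

On real data the ranking may well differ from this toy run; the point is only the pattern of fitting each candidate on the same split and keeping the one with the highest held-out accuracy.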
Finally, I listened to a list of songs from each cluster and came up with the following interpretation:
Examples:
Examples: Interestingly, all the Shawn Mendes songs I have saved fell into this category.
Examples: Almost all the Joji songs I saved fell into this category, which is fair enough in my opinion.
Examples:
Examples:
The figure above shows the mean values of all the features used, computed over the dataset of my liked songs. From it I infer that I usually listen to songs that are high in danceability and energy. I had never thought of that, but now I realize it is indeed true. However, I also listen to romantic, perhaps sad, songs from time to time. The pie chart above confirms this by showing that exactly 80% of my liked songs fall into the category of energetic/dance songs. Finally, I sorted my songs into 5 different playlists representing these categories.

    # the last column of mydf is assumed to hold the predicted cluster of each song
    clustered_songs = list(zip(mydf.index, mydf.iloc[:, -1]))
    sorted_songs = [[], [], [], [], []]
    for track_id, cluster in clustered_songs:
        sorted_songs[cluster].append(track_id)

    playlists = []
    for i in range(5):
        # create a playlist for the corresponding cluster
        playlist_created = sp.user_playlist_create(username, name="Cluster " + str(i),
                                                   public=False, description='')
        # add the songs (only the first 100 of each cluster)
        sp.user_playlist_add_tracks(username, playlist_created['id'], sorted_songs[i][:100])
        playlists.append(playlist_created)

Classifying recommended songs
As a final step of our experiment, I will request Spotify recommendations that are generated based on the songs I saved.

    rec_tracks = []
    for i in mydf.index.tolist():
        # get recommendations from Spotify, seeded with one saved track at a time
        rec_tracks += sp.recommendations(seed_tracks=[i], limit=5)['tracks']

    rec_track_ids = []
    rec_track_names = []
    for i in rec_tracks:
        # extract the id and name of each song
        rec_track_ids.append(i['id'])
        rec_track_names.append(i['name'])

    rec_features = []
    for i in range(0, len(rec_track_ids)):
        rec_audio_features = sp.audio_features(rec_track_ids[i])  # extract features
        for track in rec_audio_features:
            rec_features.append(track)

    rec_playlist_df = pd.DataFrame(rec_features, index=rec_track_ids)  # make a dataframe

Finally, I classify all the songs I obtained and update my playlists by adding the new recommended songs. Let's see if the songs obtained fit our playlists.
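A side note on the playlist-filling step: the code above adds only the first 100 tracks of each cluster (`sorted_songs[i][:100]`), which matches the limit of 100 tracks per add request. The sketch below shows one possible way to group track IDs by cluster and split each group into batches, so that larger clusters could be added in several calls. The helpers `group_by_cluster` and `chunked` and the sample IDs are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict


def group_by_cluster(track_ids, labels):
    """Map each cluster label to the list of track ids assigned to it."""
    groups = defaultdict(list)
    for track_id, label in zip(track_ids, labels):
        groups[int(label)].append(track_id)
    return dict(groups)


def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up ids and labels standing in for mydf.index and the predicted clusters.
track_ids = [f"track_{n}" for n in range(250)]
labels = [n % 5 for n in range(250)]

groups = group_by_cluster(track_ids, labels)

# A batch size of 20 is used here only to keep the example small.
batches = list(chunked(groups[0], size=20))
print([len(batch) for batch in batches])
```

Each batch could then be passed to the same `sp.user_playlist_add_tracks` call used above, one request per batch.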
Cluster 0:
Cluster 1: Interestingly, a couple more Christian French songs were added. I think that is his style.
Cluster 2: Even though it is a rap song, its style and music fit the playlist.
Cluster 3:
Cluster 4: Note: the feature that makes this cluster different from the other two listed above is its low speechiness.

Conclusion
To wrap up, the recommendations were generated based on the dataset of all my saved tracks (not on the individual playlists). Even so, I believe the classifier separated them into 5 different playlists pretty well. I think this work is pretty interesting and good practice for people who have just started their journey, since it combines several areas of ML at once. In this work I trained a K-Means clustering algorithm on a public Spotify dataset. Additionally, I came up with a classifier that helped me sort my liked songs into playlists, each representing a distinct style/mood. The source code is available on my Github.
Links to my playlists created
References:
Redirecting you - Medium (medium.com)
Is my Spotify music boring? An analysis involving music, data, and machine learning (towardsdatascience.com)
Extracting Spotify data on your favourite artist via Python (medium.com)
Get Audio Features for a Track | Spotify for Developers (developer.spotify.com)