
SONG GENRE CLASSIFIER

Built a k-nearest-neighbor (k-NN) classifier that predicts whether a song is hip-hop or country, using only the number of times each word appears in the song’s lyrics. Our dataset was a table of songs, each with a name, an artist, and a genre. The attributes available for prediction came from the lyrics: we had a list of approximately 5,000 words that might occur in a song, and for each song the dataset told us how frequently each of these words occurs in it.

We classified a song by finding the k songs in the training set that are most similar to it according to the features we selected. We called those songs the “neighbors”. The k-NN algorithm then assigned the new song to the most common genre among its k neighbors.

To implement this algorithm, we measured the similarity between two songs by the Euclidean distance between them when their features are plotted as points, so a smaller distance means more similar songs. For n features, we computed the difference between corresponding feature values for the two songs, squared each of the n differences, summed the squares, and took the square root of the sum. With limited feature engineering, our classifier reached a test-set accuracy of 74.16%.
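The steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function names (euclidean_distance, classify, accuracy), the default k = 3, and the two-feature toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(song, train_features, train_genres, k=3):
    """Return the most common genre among the k training songs nearest to `song`."""
    distances = [
        (euclidean_distance(song, features), genre)
        for features, genre in zip(train_features, train_genres)
    ]
    distances.sort(key=lambda pair: pair[0])  # nearest songs first
    nearest_genres = [genre for _, genre in distances[:k]]
    return Counter(nearest_genres).most_common(1)[0][0]


def accuracy(test_features, test_genres, train_features, train_genres, k=3):
    """Fraction of test songs whose predicted genre matches the true genre."""
    correct = sum(
        classify(song, train_features, train_genres, k) == genre
        for song, genre in zip(test_features, test_genres)
    )
    return correct / len(test_genres)


# Hypothetical toy data: each row holds the frequencies of two made-up words.
train_features = [
    [0.10, 0.00], [0.08, 0.01], [0.09, 0.02],  # hip-hop-like songs
    [0.00, 0.12], [0.01, 0.10], [0.02, 0.09],  # country-like songs
]
train_genres = ["Hip-hop"] * 3 + ["Country"] * 3

prediction = classify([0.07, 0.01], train_features, train_genres, k=3)
```

An odd k is used here so that a vote between two genres cannot end in a tie.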

Note about the dataset used

This dataset was extracted from the Million Song Dataset and Last.fm. The counts of common words in the lyrics of these songs come from the musiXmatch dataset, which stores lyrics in a bag-of-words format. Only the 5,000 most common words are represented. For each song, we divided the number of occurrences of each word by the total number of word occurrences in the lyrics of that song, which gives the word’s frequency in that song. The Last.fm dataset contains multiple tags for each song in the Million Song Dataset. Some of the tags are genre-related, such as “pop”, “rock”, “classic”, etc. To obtain our dataset, we first extracted songs whose Last.fm tags included the word “country”, or the words “hip” and “hop”. These songs were then cross-referenced with the musiXmatch dataset, and only songs with musiXmatch lyrics were placed into our dataset. Finally, inappropriate words and titles were removed, leaving us with 4,976 words in the vocabulary and 1,726 songs.
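The normalization step, dividing each word count by the song's total word count, can be illustrated as follows. This is a sketch under assumptions: the helper name word_frequencies and the example counts are hypothetical, and the real dataset stores counts for the full vocabulary, not for three words.

```python
def word_frequencies(counts):
    """Convert raw word counts for one song into per-song frequencies."""
    total = sum(counts.values())
    if total == 0:
        # A song with no counted words gets a frequency of zero everywhere.
        return {word: 0.0 for word in counts}
    return {word: n / total for word, n in counts.items()}


# Hypothetical bag-of-words counts for a single song.
song_counts = {"love": 2, "truck": 1, "road": 1}
song_frequencies = word_frequencies(song_counts)
```

Working with frequencies instead of raw counts keeps long and short songs comparable, since every song's frequencies sum to 1.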

CONTACT ME

Dhruv Relwani

Software Engineer | Student | Leader

​

Phone:

+1 (510) 365-0041

 

Email:

dhruvrelwani@berkeley.edu

​
 

"Don't be a know-it-all, be a learn-it-all"

- Satya Nadella, CEO Microsoft

​


© 2025 By Dhruv Relwani.
