Can you predict the composer just by ‘listening’ to a song?

A bit of a background.

Being part of the millennial generation, yours truly grew up listening to songs from the 1990s and early 2000s. And his ‘back then’ peter skills were anything but peter. So no English songs and all. But his Tamil song knowledge was earth-shattering. An example of the prowess was on display during Christmas celebrations in Germany. The trip, intended to help Sriram and Chitra move homes from Munster to Frankfurt, turned into a ‘friends meeting after a long time and reminiscing about old times’ event. As part of it, Sriram came up with a game where one utters a line of a random song and the other names the movie and its stars. His opening challenge was jumbo ithu kadhal seiyum neram. And yours truly didn’t bat an eyelid in identifying the song from rishi, starring a topless sharath kumar and a white cessna being dragged along unwillingly.

Another friend, Girish, has a theory about the music from the said period: Deva copies from ARR, SA Rajkumar from Deva, and Sirpi from SA Rajkumar! But back then there were no cool algorithms/tools at our disposal, other than good ol’ ears, to validate this claim. So here I am, almost 15 years after the claim, trying to evaluate its merit.

Groundwork for this analysis started way back during the floods. But it took the master procrastinator more than a year to finally get it done. The underlying idea was to decompose an audio file into a flat file and hope to learn something useful from it. Perhaps it could be used to decipher the idiosyncrasies of composers, if any. Should the models spot such idiosyncrasies, they could be used to trace a song back to its composer.

The data comprises Kollywood songs from a bunch of randomly selected movies, ranging from the early 90’s to the present day. The distribution of composers in this list may not reflect the film industry so much as personal preference.

Side questions:

  • Is there a way to determine or predict if a user (in this case, me. But potentially, it could be anyone) would like a song even before he/she listens to it?
  • Has AR Rahman’s music changed over the years?
  • Are composers similar to each other?
  • Could we, in any way, spot composers who are known to plagiarize?
  • Who the heck is the composer of the hugely enjoyable ‘Maruvaarthai’ from Enai Noki Paayum Thota, 2017?

The data

The data comprises around 110 songs, spanning different composers, genres and (sort of) eras. Some of the songs in the list are from:

  1. Kaaviya Thalaivan (2014) ARR
  2. 36 vayadhinile (2015) Santosh Narayanan
  3. AYM (2016) ARR
  4. Agni Natchatiram (1988) Ilayaraja

Overall, the number of songs featuring each composer is shown below

composer_distribution

Each song is assigned a label, “y”, “o” or “n” (yes, ok or no), denoting the author’s liking of that song. The distribution looks as shown: 47 songs, or 45.19% of the list, are to the author’s liking.

liking_distribution

The “liking” for each composer is shown below

composer_liking_distribution

Data preprocessing

Audio files show the change in signal over time, so it’s a time-domain representation. While time-domain analysis shows how a signal changes over time, frequency-domain analysis shows how the signal’s energy is distributed over a range of frequencies. A frequency-domain representation also includes information on the phase shift that must be applied to each frequency component in order to recover the original time signal from a combination of all the individual frequency components. We do this time domain <=> frequency domain swap by using the Fourier transform.
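As a rough illustration of the swap described above, here is a minimal Python sketch using NumPy’s FFT routines. The post doesn’t show its own code, so the function name `magnitude_spectrum` and the synthetic two-tone signal are assumptions made purely for illustration; a real song would first have to be decoded into an array of samples.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, magnitudes) for a real-valued mono signal."""
    # rfft keeps only the non-negative frequencies of a real signal.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # The absolute value keeps the magnitude and throws away the phase.
    return freqs, np.abs(spectrum)

# Hypothetical stand-in for a song: one second of a 440 Hz tone
# plus a quieter 1000 Hz tone, sampled at 44.1 kHz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

freqs, mags = magnitude_spectrum(signal, sample_rate)
peak_hz = freqs[np.argmax(mags)]
print(f"strongest component: {peak_hz:.1f} Hz")
```

The `mags` array is the kind of flat row that could serve as one song’s features, one column per frequency bin.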

By now, one would expect the reader to have a full grasp of the author’s exceptional laziness, and the smart ones have probably figured out that this analysis won’t include the phase, though including it would probably improve the predictive accuracy. This is what a ‘processed’ audio file looks like.

thalli_pogathey

Hardcore Tamil song enthusiasts would be interested to see how various composers’ works stack up against each other. This is depicted in the following plot. Note that each curve is the “average” of all of a composer’s songs and, as such, may not have much physical meaning. But it helps compare the composers with one another.

avg_of_all

(X-axis is the audible frequency range of the average human, 20 – 20000 Hz)
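A per-composer “average” like the one plotted can be computed by grouping the song spectra by composer and taking the mean of each group. The sketch below assumes each song is already a row of magnitudes; the function name and the tiny made-up numbers are hypothetical, not the post’s data.

```python
import numpy as np

def average_spectrum_by_label(spectra, labels):
    """Average the rows of `spectra` that share a label; returns {label: mean row}."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    return {
        str(label): spectra[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }

# Made-up magnitudes over 4 frequency bins, purely for illustration.
spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
]
composers = ["ARR", "ARR", "Deva"]

averages = average_spectrum_by_label(spectra, composers)
for composer, mean_row in averages.items():
    print(composer, mean_row)
```

The same grouping works with the year of release as the label instead of the composer.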

Adding some relevant data/metadata like the movie, year of release, composer and the degree of likability of the song (y,o,n) completes the training data.

Learning and Predictions

Some technical mumbo jumbo: ensemble models are built on the training data. The data is wide (40005 features) and short (104 rows) and hence susceptible to overfitting. Cross-validation (CV) is used to help minimize its effects.
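For readers unfamiliar with CV, here is a minimal sketch of k-fold cross-validation on wide, short data. The post doesn’t say which ensemble or CV scheme was used, so this is only the general idea: a deliberately simple nearest-centroid classifier stands in for the ensemble, and the data is synthetic. All names here are assumptions.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from every test row to every class centroid.
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(preds == y[test_idx]))
    return float(np.mean(scores))

# Synthetic wide-and-short data: 40 rows, 500 features, two classes.
rng = np.random.default_rng(42)
y = np.array(["A"] * 20 + ["B"] * 20)
X = rng.normal(size=(40, 500))
X[y == "B"] += 1.0  # shift class B so the two classes are separable

cv_accuracy = k_fold_accuracy(X, y, k=5)
print(f"cross-validated accuracy: {cv_accuracy:.2f}")
```

Because every score comes from rows the classifier never saw during fitting, the CV number is a more honest estimate than the training-set accuracy.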

What are the primary indicators of a composer?

director_predictor_with_year

Year of the song/movie might seem like an obscure indicator of the composer, but it starts making sense when we look back at old-timey legends like Deva and SA Rajkumar clearing the way for new ones like ARR during the late 90’s and early 2000’s.

The performance on the training set looks too good to be true at 100%. This, in machine learning parlance, is a symptom of overfitting.

director_prediction_with_year_train

Risking low performance on unseen data (due to overfitting), we expose the model to the test data, which has 11 songs: 5 each from AYM (2016, ARR) and Iruvar (1998, ARR), and one from Ennai Nokki Payum Thotta (2017, unknown).

director_prediction_with_year_test

Despite overfitting, the model’s test-set accuracy is a respectable 72.5%. As discussed earlier, adding the phase component to the data would help improve this, as would feature engineering and dimensionality reduction.

Shifting focus to the next set of labels, the likability of the songs, it’d be interesting to see if there are any “sweet spots” in the spectrum that resonate with yours truly. There seem to be, and these are they: (I had to remove the composer and the year, as they were weighing heavily on the outcomes!)

hit_predictor

So it looks like the author’s sweet spot lies at the frequencies of 2930 Hz and 2676.5 Hz.

Since the training set showed the suspected overfit, the test-set performance could be expected to reflect the same.

like_prediction

True it is. The predictive accuracy has gone down significantly from the earlier result. As alluded to before, a set of standard techniques could be applied to improve this score.

The side questions

  • Has the music of ARR changed over the years?

The answer to this is both visual and statistical.

Let’s do the visual first, by comparing 2 of ARR’s songs from different eras.

avalum_naanum

hallo_mr_ethirkkatchi

This contrasts ARR’s style in 1998 and 2016. It could be argued that these 2 songs might not be representative of all of his works during those eras. Juxtaposing ARR’s yearly ‘averages’ would help settle this debate.

arr_years_trends

This shows the evolution of ARR’s work: a clear and visibly (if not aurally) noticeable change over the years.

  • Are composers similar to each other and could we, in any way, spot composers who are known to plagiarize?

This is gonna need an indirect way of answering. Earlier, it was observed that the year of release was an important feature in predicting the composer of a song. At their core, all learning models look for patterns and group similar “patterns” under the same label. So if a model gets “confused” and wrongly classifies one composer as another based on the spectrum, it indirectly tells us that those compositions (and composers) are similar.

With that in mind, this is how the results look without the year of release.

confusion1

Remembering that the test set has 11 ARR songs, this shows the model wrongly predicting (confusing!) actual ARR songs as those of Harris and Deva. As important as this is to the plot, it needs a revisit later in the section.

So what happens when we include the year of release?

confusion2

We now notice that the model confuses ARR with (only) Yuvan.
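Confusion tables like the two above boil down to counting (actual, predicted) pairs. Here is a minimal sketch using only the Python standard library; the labels are hypothetical and are not the post’s real predictions.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; off-diagonal pairs are the 'confusions'."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for illustration only, not the post's real results.
actual = ["ARR", "ARR", "ARR", "Deva", "Harris"]
predicted = ["ARR", "Deva", "Harris", "Deva", "Harris"]

counts = confusion_counts(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# Rows are the actual composer, columns the predicted one.
print("actual \\ predicted", *labels)
for a in labels:
    print(a, *[counts[(a, p)] for p in labels])

# Keep only the pairs where the model got the composer wrong.
confused = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print(confused)
```

Reading the off-diagonal entries is what lets one say “the model mistakes composer X for composer Y”.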

What do we infer from this?

  • There seem to be strong underlying similarities between the compositions of ARR, Deva and Harris. This could be coincidence or otherwise. I’ll leave it to the reader to infer.
  • Yuvan seems to produce some of the ARR magic 🙂

  • Who the heck is the composer of ennai nokki payum thotta?

Probably for the first time in Kollywood history, the name of the composer is being held in suspense. Our reasonably performing model may have some insights. But a shortcoming of learning models is their inability to ‘learn’ beyond the training data. In this context, if the training data doesn’t have even a single work of a composer, the model won’t ‘know’ when that unseen composer’s music happens to be in the test set. The model will pick the composer that it associates the most with the frequency patterns of the piece. So an implicit assumption is that the composer of the song is in the training data already. This may or may not be valid, depending on who the actual composer is. Only time will tell.
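The closed-set limitation is easy to demonstrate with a toy classifier. The sketch below is not the model used in this post; it is a hypothetical nearest-average matcher with made-up 3-bin spectra, showing that a song by an unseen composer still gets assigned to one of the known names.

```python
import math

def closest_known_composer(spectrum, centroids):
    """Return the known composer whose average spectrum is nearest to `spectrum`."""
    return min(centroids, key=lambda name: math.dist(spectrum, centroids[name]))

# Hypothetical 3-bin average spectra for two composers in the training data.
centroids = {
    "ARR": [1.0, 0.0, 0.0],
    "Deva": [0.0, 1.0, 0.0],
}

# A spectrum from a composer the matcher has never seen, unlike either average.
unseen = [0.1, 0.0, 5.0]
guess = closest_known_composer(unseen, centroids)
print(guess)
```

The matcher has no way of answering “none of the above”; it can only name whichever known composer is the least dissimilar.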

With the assumption out of the way, visual inspection leads to the first clue: the pattern of this piece looks quite similar to other ARR compositions of recent times. There are ‘some’ similarities to Yuvan too, but not as many as to ARR.

top left = Thalli Pogathey (AYM 2016)

top right = Avalum Naanum (AYM 2016)

bottom left = Netru Aval (Maryaan 2013)

bottom right = Maruvaarthai (Ennai nokki payum thotta 2017)

Of the composers, ARR seemed the closest ‘visual’ match to this new song.

On letting the model developed earlier loose on this song, the prediction was… er, Deva! And it’s now known that, when not considering the year of release, the model has a track record of confusing ARR with Deva. So the prediction can be ‘provisionally’ ARR. When considering the year of release, the model’s prediction was ARR. But it should be kept in mind that this model ‘confuses’ ARR and Yuvan.

So in the limiting case that the actual composer is in the list of composers above, and continues to compose in their ‘usual’ style without much change in idiosyncrasies, the music for ennai nokki payum thotta is likely to involve ARR, with a possibility of Yuvan being the composer too.

What did we learn from this?

Girish was right about Deva taking stuff off ARR. Too bad there was no data for Sirpi and co.

Yours truly seems to have a disposition towards 2930 and 2676.5 Hz.

It is possible to transform audio into flat data, a bunch of which could then be used as training data to find patterns that reveal the idiosyncrasies of modern Tamil music directors/composers, with good accuracy.

We can also use the data to identify personal preferences, what makes someone like a song, and can use that information to predict if the user would like a song (before he/she actually listens to it).

Along the way we explored some arguably “similar” compositions, raising questions about the said similarities.

Instead of actually using them, the lazy author prescribes methods (like including the phase components of the audio file, dimensionality reduction, feature engineering, etc.) to boost the overall accuracy.

It’s highly probable that the composer of Ennai Nokki Payum Thotta is ARR (though we cannot completely rule out Yuvan yet).
