DSpace Repository

Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM)

dc.contributor.author Narang, Pratik
dc.date.accessioned 2025-05-08T08:57:08Z
dc.date.available 2025-05-08T08:57:08Z
dc.date.issued 2024-04
dc.identifier.uri https://link.springer.com/article/10.1007/s11042-024-19160-5
dc.identifier.uri http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/18885
dc.description.abstract The essence of music is inherently multi-modal, with audio and lyrics going hand in hand. However, very little research has studied the intricacies of the multi-modal nature of music and its relation to genres. Our work uses this multi-modality to present spectro-lyrical embeddings for music representation (SLEM), leveraging open-source, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings of a self-curated, hand-labeled, multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study how varying the weight of lyrics and spectrograms in the embeddings affects multi-class genre classification. The purpose of this study is to show that a simple linear combination of both modalities is better than either modality alone. Our methods achieve an accuracy ranging from 81.08% to 98.60% for different genres, using the K-nearest neighbors algorithm on the multimodal embeddings. We study the intricacies of genres in this representational space, including their misclassification, visual clustering with EM-GMM, and the domain-specific meaning of the multi-modal weight for each genre with respect to 'instrumentalness' and 'energy' metadata. SLEM presents one of the first works on an end-to-end method that uses spectro-lyrical embeddings without hand-engineered features. en_US
dc.language.iso en en_US
dc.publisher Springer en_US
dc.subject Computer Science en_US
dc.subject Multi-modal music representation en_US
dc.subject Spectro-lyrical embeddings for music (SLEM) en_US
dc.subject Spectrogram analysis en_US
dc.title Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM) en_US
dc.type Article en_US
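
The abstract describes weighting lyric and spectrogram embeddings in a linear combination and classifying the result with K-nearest neighbors. The record does not give the exact formula, so the following is only a minimal sketch under stated assumptions: both embeddings have the same dimensionality, each is L2-normalised, and they are fused as w * lyrics + (1 - w) * spectrogram. The function names, the normalisation step, and the plain NumPy K-nearest-neighbors routine are illustrative choices, not the authors' implementation, and the demo data is random.

```python
import numpy as np


def combine_embeddings(spec, lyr, w):
    """Fuse two embedding matrices (one row per song).

    Assumed form, not taken from the paper:
    w * lyrics + (1 - w) * spectrogram, after L2-normalising each row.
    """
    spec = spec / np.linalg.norm(spec, axis=1, keepdims=True)
    lyr = lyr / np.linalg.norm(lyr, axis=1, keepdims=True)
    return w * lyr + (1.0 - w) * spec


def knn_predict(train_x, train_y, test_x, k=3):
    """Majority-vote K-nearest neighbors with Euclidean distance."""
    # Pairwise distances, shape (n_test, n_train)
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    # Indices of the k closest training songs for each test song
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    # Most frequent label among the k neighbours (labels are non-negative ints)
    return np.array([np.bincount(row).argmax() for row in votes])


if __name__ == "__main__":
    # Synthetic stand-in data: 40 songs, 16-dimensional embeddings, 5 genres
    rng = np.random.default_rng(0)
    spec = rng.normal(size=(40, 16))
    lyr = rng.normal(size=(40, 16))
    labels = rng.integers(0, 5, size=40)

    # Sweep the (hypothetical) lyric weight and report hold-out accuracy
    for w in (0.0, 0.5, 1.0):
        fused = combine_embeddings(spec, lyr, w)
        pred = knn_predict(fused[:30], labels[:30], fused[30:], k=3)
        print(f"w={w:.1f} accuracy={np.mean(pred == labels[30:]):.2f}")
```

With w = 0 the sketch reduces to spectrogram-only embeddings and with w = 1 to lyrics-only, so sweeping w is one way to compare either modality alone against a mixture.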