
Please use this identifier to cite or link to this item:
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/18885
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Narang, Pratik | - |
dc.date.accessioned | 2025-05-08T08:57:08Z | - |
dc.date.available | 2025-05-08T08:57:08Z | - |
dc.date.issued | 2024-04 | - |
dc.identifier.uri | https://link.springer.com/article/10.1007/s11042-024-19160-5 | - |
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/18885 | - |
dc.description.abstract | The essence of music is inherently multi-modal, with audio and lyrics going hand in hand. However, very little research has studied the intricacies of the multi-modal nature of music and its relation to genres. Our work uses this multi-modality to present spectro-lyrical embeddings for music representation (SLEM), leveraging open-source, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings of a self-curated and hand-labeled multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study how varying the weight of lyrics and spectrograms in the embeddings affects multi-class genre classification. The purpose of this study is to show that a simple linear combination of both modalities is better than either modality alone. Our methods achieve accuracies ranging from 81.08% to 98.60% for different genres, using the K-nearest neighbors algorithm on the multimodal embeddings. We study the intricacies of genres in this representational space, including their misclassification, visual clustering with EM-GMM, and the domain-specific meaning of the multi-modal weight for each genre with respect to 'instrumentalness' and 'energy' metadata. SLEM is one of the first end-to-end methods to use spectro-lyrical embeddings without hand-engineered features. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Multi-modal music representation | en_US |
dc.subject | Spectro-lyrical embeddings for music (SLEM) | en_US |
dc.subject | Spectrogram analysis | en_US |
dc.title | Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM) | en_US |
dc.type | Article | en_US |
Appears in Collections: | Department of Computer Science and Information Systems |
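
The abstract describes a linear combination of lyric and spectrogram embeddings that is classified with K-nearest neighbors. The following is a minimal sketch of that idea only, not the authors' implementation. This record does not specify the encoders, the normalisation, the distance metric, the value of k, or which modality the weight applies to, so the names `fuse`, `knn_predict`, and `w`, the L2 normalisation, the Euclidean distance, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(spec_emb, lyric_emb, w):
    """Assumed fusion rule: w * lyrics + (1 - w) * spectrogram.

    Both embeddings are assumed to share one dimension already.
    """
    if len(spec_emb) != len(lyric_emb):
        raise ValueError("embeddings must share a dimension")
    s, l = l2_normalize(spec_emb), l2_normalize(lyric_emb)
    return [w * b + (1.0 - w) * a for a, b in zip(s, l)]


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest fused training embeddings."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, made-up embeddings and labels (not data from the paper).
songs = [
    ([1.0, 0.0], [0.9, 0.1], "rock"),
    ([0.9, 0.2], [1.0, 0.0], "rock"),
    ([0.0, 1.0], [0.1, 0.9], "folk"),
    ([0.1, 0.9], [0.0, 1.0], "folk"),
]
w = 0.5  # hypothetical lyric weight; the study varies this per experiment
train = [(fuse(s, l, w), genre) for s, l, genre in songs]
prediction = knn_predict(train, fuse([0.8, 0.1], [0.9, 0.2], w), k=3)
print(prediction)
```

Setting `w` to 0 or 1 reduces the fused vector to a single modality, which is how a sweep over `w` could compare the combined representation against either modality alone.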