dc.description.abstract |
Auto-tagging of images is important for image understanding and for tag-based applications such as image retrieval, visual question answering, and image captioning. Although existing tagging methods incorporate both visual and textual information to assign or refine tags, they fall short in tag-image relevance, completeness, and precision, resulting in unsatisfactory performance of tag-based applications. To bridge this gap, we propose a novel framework for tag assignment using knowledge embedding (TAKE) from a proposed external knowledge base, considering properties such as Rarity, Newness, Generality, and Naturalness (RNGN properties). These properties help provide a rich semantic representation of images. Existing knowledge bases provide multiple types of relations extracted through only one modality, either textual or visual, which is not effective for image-related applications. We construct a simple yet effective Visio-Textual Knowledge Base (VTKB) with only four relations, using reliable resources such as Wikipedia, thesauruses, and dictionaries. Our large-scale experiments demonstrate that TAKE combined with VTKB assigns a larger number of high-quality tags than TAKE combined with the ConceptNet or ImageNet knowledge bases. We also evaluate the effectiveness of knowledge embedding through VTKB for image tagging and tag-based image retrieval (TBIR). |
en_US |