Please use this identifier to cite or link to this item: http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/8136
Full metadata record
DC Field | Value | Language
dc.contributor.author | Goyal, Poonam | -
dc.contributor.author | Goyal, Navneet | -
dc.date.accessioned | 2022-12-26T10:19:52Z | -
dc.date.available | 2022-12-26T10:19:52Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | https://ideas.repec.org/a/ids/ijdmmm/v10y2018i2p127-170.html | -
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/xmlui/handle/123456789/8136 | -
dc.description.abstract | Clustering documents is an essential step in improving the efficiency and effectiveness of information retrieval systems. We propose a two-phase split-merge (SM) algorithm, which can be applied to topical clusters obtained from existing query-context-aware document clustering algorithms, to produce soft topical document clusters. The SM is a post-processing technique which combines the advantages of the document-pivot and feature-pivot topical document clustering approaches. The split phase splits the topical clusters by relating them to the topics obtained by disambiguating web search results, and converts them into homogeneous soft clusters. In the merge phase, similar clusters are merged by a feature-pivot approach. The SM is tested on the outcome of two hierarchical query-context-aware document clustering algorithms on different datasets, including the TREC Session Track 2011 dataset. The obtained topical clusters are also updated by an incremental approach as the data stream progresses. The proposed algorithm improves the quality of clustering appreciably in all the experiments conducted. | en_US
dc.language.iso | en | en_US
dc.publisher | Inderscience | en_US
dc.subject | Computer Science | en_US
dc.subject | Clustering | en_US
dc.title | Topical document clustering: two-stage post processing technique | en_US
dc.type | Article | en_US
Appears in Collections: Department of Computer Science and Information Systems

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.