Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data mining

Goyal, Poonam; Goyal, Navneet; Challa, Jagat Sesh

dc.contributor.author	Goyal, Poonam
dc.contributor.author	Goyal, Navneet
dc.contributor.author	Challa, Jagat Sesh
dc.date.accessioned	2022-12-27T06:40:38Z
dc.date.available	2022-12-27T06:40:38Z
dc.date.issued	2020-04
dc.identifier.uri	https://link.springer.com/article/10.1007/s41060-020-00208-2
dc.identifier.uri	http://dspace.bits-pilani.ac.in:8080/xmlui/handle/123456789/8149
dc.description.abstract	The use of multi-dimensional indexing structures has gained a lot of attention in data mining. The most commonly used data structures for indexing data are the R-tree and its variants, the quad-tree, the k-d-tree, etc. These data structures support region queries (point, window and neighborhood queries) and nearest neighbor queries, which are used extensively in data mining algorithms. Although these data structures allow the above queries to execute in logarithmic time, the constraints associated with them become a bottleneck in query execution when they are used for large and high-dimensional datasets. Moreover, these indexing structures do not cater to the specific data access patterns of data mining algorithms. In this paper, we propose a new data structure, Grid-R-tree, a grid-based R-tree that is specifically designed to address the querying requirements of multiple data mining algorithms. Grid-R-tree is a simple yet effective adaptation of the R-tree using the concept of a grid. We also introduce a new query over Grid-R-tree, called the cell-wise epsilon neighborhood query (CellWiseNBH), which captures the locality in the query execution pattern of density-based clustering algorithms and enables us to redesign them to improve their efficiency. Our theoretical and experimental analyses show that the proposed data structure outperforms the conventional R-tree on neighborhood and nearest neighbor queries. The experiments were conducted on datasets of size up to 100 million and dimensionality up to 74. The results also suggest that Grid-R-tree improves the efficiency of data mining algorithms such as the k-nearest neighbor classifier and DBSCAN clustering (including the redesigned version that uses CellWiseNBH). 
Additionally, an adaptive grid optimization has been applied to dense cells whose number of indexed data points exceeds a threshold τ, in order to keep the load evenly distributed across cells; this resulted in more efficient query performance for datasets with a skewed distribution of data points.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.subject	Computer Science	en_US
dc.subject	Data Mining	en_US
dc.subject	Grid-R-tree	en_US
dc.title	Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data mining	en_US
dc.type	Article	en_US
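
The abstract describes using a grid to localize epsilon-neighborhood queries. As a rough illustration of that general idea only, the following Python sketch indexes points in a plain uniform grid and answers an epsilon-neighborhood query by scanning just the cells that overlap the query ball's bounding box. It is not the Grid-R-tree of the paper: it has no R-tree component, no CellWiseNBH query, and no adaptive splitting of dense cells. The class name `UniformGrid`, the method names, and the `cell_size` parameter are hypothetical and chosen for this example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


class UniformGrid:
    """Hypothetical uniform grid index (illustration only, not the paper's Grid-R-tree)."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell coordinates -> points stored in that cell
        for p in points:
            self.cells[self._cell_of(p)].append(p)

    def _cell_of(self, point):
        # Integer coordinates of the grid cell that contains the point.
        return tuple(floor(x / self.cell_size) for x in point)

    def eps_neighborhood(self, query, eps):
        """Return all indexed points within Euclidean distance eps of query."""
        # Only cells overlapping the bounding box of the eps-ball can hold neighbors.
        lo = self._cell_of(tuple(x - eps for x in query))
        hi = self._cell_of(tuple(x + eps for x in query))
        ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
        result = []
        for cell in product(*ranges):
            # .get avoids creating empty entries in the defaultdict.
            for p in self.cells.get(cell, ()):
                if dist(p, query) <= eps:
                    result.append(p)
        return result


points = [(0.1, 0.1), (0.9, 0.9), (1.1, 1.0), (3.0, 3.0)]
grid = UniformGrid(points, cell_size=1.0)
neighbors = grid.eps_neighborhood((1.0, 1.0), eps=0.5)
```

This sketch enumerates every candidate cell in the bounding box, so the number of cells visited grows exponentially with the number of dimensions; it is practical only for low-dimensional data.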

This item appears in the following Collection(s)

Department of Computer Science and Information Systems [1099]
