Maio 2020 vol. 3 num. 1 - XIX Simpósio de Pesquisa Operacional & Logística da Marinha
Artigo Completo - Open Access.
OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH
OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH
Bedo, Marcos V. N. ; Oliveira, Rodolfo A. de ; Kaster, Daniel S. ; Oliveira, Daniel C. M. de ; Traina Jr., Caetano ;
Artigo Completo:
Similarity searching supports several computational tasks, such as classi- cation and content-based retrieval. A plethora of indexes has been proposed aiming at enhancing similarity queries, being the Omni-family one of the most versatile. The main strength of Omni methods is they handle the data elements regarding a small set of carefully selected pivots. In this study, we improve the Omni-family and create a new class of indexes called Omni-histograms. Our approach summarizes distance distributions to Omni-pivots in such a way histograms' buckets are also employed for the partitioning of the search space into disjoint regions. The resulting structures boost query executions by using the frequency within each region for limiting both disk accesses and distance calculations. Experiments on real-world datasets showed our approach outperforms existing methods in up to 113%.
Artigo Completo:
Similarity searching supports several computational tasks, such as classi- cation and content-based retrieval. A plethora of indexes has been proposed aiming at enhancing similarity queries, being the Omni-family one of the most versatile. The main strength of Omni methods is they handle the data elements regarding a small set of carefully selected pivots. In this study, we improve the Omni-family and create a new class of indexes called Omni-histograms. Our approach summarizes distance distributions to Omni-pivots in such a way histograms' buckets are also employed for the partitioning of the search space into disjoint regions. The resulting structures boost query executions by using the frequency within each region for limiting both disk accesses and distance calculations. Experiments on real-world datasets showed our approach outperforms existing methods in up to 113%.
Palavras-chave: Similarity search; access methods; histograms; k-NN queries; pivot selection.,
Palavras-chave: Similarity search; access methods; histograms; k-NN queries; pivot selection.,
DOI: 10.5151/spolm2019-136
Referências bibliográficas
- [1] SILVA, Y. N. et al. Similarity queries: their conceptual evaluation, transformations, and processing. VLDB, Springer, v. 22, n. 3, p. 395{420, 2013. 2, 3 [2] YAO, B.; LI, F.; KUMAR, P. k-Nearest Neighbor queries and kNN-Joins in large relational databases (almost) for free. In: International Conference on Data Engineering. [S.l.]: IEEE, 2010. p. 4{15. 2, 3 [3] ZEZULA, P. Future trends in similarity searching. In: International Conference on Similarity Search and Applications. [S.l.]: Springer, 2012. p. 8{24. 2 [4] ALY, A. M.; AREF, W. G.; OUZZANI, M. Spatial Queries with kNN and Relational Predicates. In: International Conference on Advances in Geographic In- formation Systems. [S.l.]: ACM, 2015. p. 28:1{28:10. 2 [5] WU, D.; CONG, G.; JENSEN, C. S. A framework for ecient spatial web object retrieval. The VLDB Journal, Springer, v. 21, n. 6, p. 797{822, 2012. 2 [6] CIACCIA, P.; PATELLA, M.; ZEZULA, P. M-tree: An ecient access method for similarity search in metric spaces. In: International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 1997. p. 426{435. 2, 4 [7] SKOPAL, T.; POKORNY, J.; SNA SEL, V. Nearest neighbours search using the PM-tree. In: International Conference on Database Systems for Advanced Applications. [S.l.]: Springer, 2005. p. 803{815. 2 [8] TRAINA JR., C. et al. The Omni-family of all-purpose access methods: A simple and eective way to make similarity search more ecient. The VLDB Journal, Springer, v. 16, n. 4, p. 483{505, 2007. 2, 3, 4 [9] CHAVEZ, E. et al. Searching in metric spaces. ACM Computing Surveys, ACM, v. 33, n. 3, p. 273{321, 200 2, 3 [10] MOVSKO, J.; JAKUB, L.; SKOPAL, T. Clustered Pivot Tables for I/Ooptimized Similarity Search. In: International Conference on Similarity Search and Applications. [S.l.]: ACM, 201 p. 17{24. 2, 3 [11] GRAEFE, G. Modern B-Tree Techniques. Foundations and Trends in Databases, Now Publishers, Inc., v. 3, n. 4, p. 203{402, 201 2 [12] GUTTMAN, A. R-trees: A Dynamic Index Structure for Spatial Searching. ACM Special Interest Group on Management of Data. 2 [13] BEDO, M. V. N. et al. Beyond Hit-or-Miss: A Comparative Study of Synopses for Similarity Searching. Journal of Information and Data Management, SBC, v. 9, n. 1, p. 36{51, 2018. 2, 5 [14] TASAN, M.; OZSOYOGLU, Z. M. Improvements in distance-based indexing. In: Scientic and Statistical Database Management Conference. [S.l.]: IEEE, 2004. p. 161{170. 2, 3, 4, 5, 6 [15] VIEIRA, M. R. et al. Boosting knn queries by estimating suitable query radii. In: Scientic and Statistical Database Management Conference. [S.l.]: IEEE, 2007. p. 10. 2, 4, 6 [16] BUSTOS, C. et al. An empirical evaluation of intrinsic dimension estimators. In: International Conference on Similarity Search and Applications. [S.l.]: Springer, 2015. p. 125{137. 2 [17] PESTOV, V. Indexability, Concentration, and VC Theory. In: International Conference on Similarity Search and Applications. [S.l.]: ACM, 2010. p. 3{12. 2 [18] HETLAND, M. L. The basic principles of metric indexing. In: Swarm Intel- ligence for Multi-objective Problems in Data Mining. [S.l.]: Springer, 2009. p. 199{232. 3 [19] ZEZULA, P. et al. Similarity Search: The Metric Space Approach. [S.l.]: Springer, 2010. v. 3, 5 [20] BoHM, C.; KREBS, F. The k-Nearest Neighbour Join: Turbo Charging the KDD Process. KIS, Springer, v. 6, n. 6, p. 728{749, 2004. 3 [21] HJALTASON, G. R.; SAMET, H. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, ACM, v. 28, n. 4, p. 517{580, 2003. 4, 5 [22] IOANNIDIS, Y. The history of histograms. In: International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 2003. p. 19{30. 5 [23] KONIG, A. C.; WEIKUM, G. A framework for the physical design problem for data synopses. In: International Conference on Extending Database Technology. [S.l.: s.n.], 2002. p. 627{645. 5
Como citar:
Bedo, Marcos V. N.; Oliveira, Rodolfo A. de; Kaster, Daniel S.; Oliveira, Daniel C. M. de; Traina Jr., Caetano; "OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH", p. 1890-1905 . In: Anais do XIX Simpósio de Pesquisa Operacional & Logística da Marinha.
São Paulo: Blucher,
2020.
ISSN 2175-6295,
DOI 10.5151/spolm2019-136
últimos 30 dias | último ano | desde a publicação
downloads
visualizações
indexações