Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online
- Todas as edições
- Última edição
- Equipe de Produção
- ISSN 2175-6295
OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH
OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH
Bedo, Marcos V. N.; Oliveira, Rodolfo A. de; Kaster, Daniel S.; Oliveira, Daniel C. M. de; Traina Jr., Caetano
Artigo Completo:
Similarity searching supports several computational tasks, such as classi- cation and content-based retrieval. A plethora of indexes has been proposed aiming at enhancing similarity queries, being the Omni-family one of the most versatile. The main strength of Omni methods is they handle the data elements regarding a small set of carefully selected pivots. In this study, we improve the Omni-family and create a new class of indexes called Omni-histograms. Our approach summarizes distance distributions to Omni-pivots in such a way histograms' buckets are also employed for the partitioning of the search space into disjoint regions. The resulting structures boost query executions by using the frequency within each region for limiting both disk accesses and distance calculations. Experiments on real-world datasets showed our approach outperforms existing methods in up to 113%.
Similarity searching supports several computational tasks, such as classi- cation and content-based retrieval. A plethora of indexes has been proposed aiming at enhancing similarity queries, being the Omni-family one of the most versatile. The main strength of Omni methods is they handle the data elements regarding a small set of carefully selected pivots. In this study, we improve the Omni-family and create a new class of indexes called Omni-histograms. Our approach summarizes distance distributions to Omni-pivots in such a way histograms' buckets are also employed for the partitioning of the search space into disjoint regions. The resulting structures boost query executions by using the frequency within each region for limiting both disk accesses and distance calculations. Experiments on real-world datasets showed our approach outperforms existing methods in up to 113%.
Palavras-chave:
DOI: 10.5151/spolm2019-136
Referências bibliográficas
- [1] SILVA, Y. N. et al. Similarity queries: their conceptual evaluation, transformations, and processing. VLDB, Springer, v. 22, n. 3, p. 395{420, 2013. 2, 3 [2] YAO, B.; LI, F.; KUMAR, P. k-Nearest Neighbor queries and kNN-Joins in large relational databases (almost) for free. In: International Conference on Data Engineering. [S.l.]: IEEE, 2010. p. 4{15. 2, 3 [3] ZEZULA, P. Future trends in similarity searching. In: International Conference on Similarity Search and Applications. [S.l.]: Springer, 2012. p. 8{24. 2 [4] ALY, A. M.; AREF, W. G.; OUZZANI, M. Spatial Queries with kNN and Relational Predicates. In: International Conference on Advances in Geographic In- formation Systems. [S.l.]: ACM, 2015. p. 28:1{28:10. 2 [5] WU, D.; CONG, G.; JENSEN, C. S. A framework for ecient spatial web object retrieval. The VLDB Journal, Springer, v. 21, n. 6, p. 797{822, 2012. 2 [6] CIACCIA, P.; PATELLA, M.; ZEZULA, P. M-tree: An ecient access method for similarity search in metric spaces. In: International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 1997. p. 426{435. 2, 4 [7] SKOPAL, T.; POKORNY, J.; SNA SEL, V. Nearest neighbours search using the PM-tree. In: International Conference on Database Systems for Advanced Applications. [S.l.]: Springer, 2005. p. 803{815. 2 [8] TRAINA JR., C. et al. The Omni-family of all-purpose access methods: A simple and eective way to make similarity search more ecient. The VLDB Journal, Springer, v. 16, n. 4, p. 483{505, 2007. 2, 3, 4 [9] CHAVEZ, E. et al. Searching in metric spaces. ACM Computing Surveys, ACM, v. 33, n. 3, p. 273{321, 2001. 2, 3 [10] MOVSKO, J.; JAKUB, L.; SKOPAL, T. Clustered Pivot Tables for I/Ooptimized Similarity Search. In: International Conference on Similarity Search and Applications. [S.l.]: ACM, 2011. p. 17{24. 2, 3 [11] GRAEFE, G. Modern B-Tree Techniques. Foundations and Trends in Databases, Now Publishers, Inc., v. 3, n. 4, p. 203{402, 2011. 2 [12] GUTTMAN, A. R-trees: A Dynamic Index Structure for Spatial Searching. ACM Special Interest Group on Management of Data. 2 [13] BEDO, M. V. N. et al. Beyond Hit-or-Miss: A Comparative Study of Synopses for Similarity Searching. Journal of Information and Data Management, SBC, v. 9, n. 1, p. 36{51, 2018. 2, 5 [14] TASAN, M.; OZSOYOGLU, Z. M. Improvements in distance-based indexing. In: Scientic and Statistical Database Management Conference. [S.l.]: IEEE, 2004. p. 161{170. 2, 3, 4, 5, 6 [15] VIEIRA, M. R. et al. Boosting knn queries by estimating suitable query radii. In: Scientic and Statistical Database Management Conference. [S.l.]: IEEE, 2007. p. 10. 2, 4, 6 [16] BUSTOS, C. et al. An empirical evaluation of intrinsic dimension estimators. In: International Conference on Similarity Search and Applications. [S.l.]: Springer, 2015. p. 125{137. 2 [17] PESTOV, V. Indexability, Concentration, and VC Theory. In: International Conference on Similarity Search and Applications. [S.l.]: ACM, 2010. p. 3{12. 2 [18] HETLAND, M. L. The basic principles of metric indexing. In: Swarm Intel- ligence for Multi-objective Problems in Data Mining. [S.l.]: Springer, 2009. p. 199{232. 3 [19] ZEZULA, P. et al. Similarity Search: The Metric Space Approach. [S.l.]: Springer, 2010. v. 1. 3, 5 [20] BoHM, C.; KREBS, F. The k-Nearest Neighbour Join: Turbo Charging the KDD Process. KIS, Springer, v. 6, n. 6, p. 728{749, 2004. 3 [21] HJALTASON, G. R.; SAMET, H. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, ACM, v. 28, n. 4, p. 517{580, 2003. 4, 5 [22] IOANNIDIS, Y. The history of histograms. In: International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 2003. p. 19{30. 5 [23] KONIG, A. C.; WEIKUM, G. A framework for the physical design problem for data synopses. In: International Conference on Extending Database Technology. [S.l.: s.n.], 2002. p. 627{645. 5
Como citar:
Bedo, Marcos V. N.; Oliveira, Rodolfo A. de; Kaster, Daniel S.; Oliveira, Daniel C. M. de; Traina Jr., Caetano; "OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH", p-1890-1905.
In: Anais do XIX Simpósio de Pesquisa Operacional & Logística da Marinha.
São Paulo: Blucher,
2020.
ISSN 21756295,
DOI 10.5151/spolm2019-136
últimos 30 dias
77
downloads
195
visualizações
620
indexações
Sou autor desse trabalho
Você é citado neste trabalho?
Exportar citação - RefWork (RIS)
Copie a citação abaixo ou clique no botão Download para obter um arquivo com os dados
TY - CONF T1 - OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH JO - Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online VL - 3 IS - 1 SP - 1890 EP - 1905 PY - 2020 T2 - XIX Simpósio de Pesquisa Operacional & Logística da Marinha AU - , , , , SN - 21756295 DO - http://dx.doi.org/10.5151/spolm2019-136 UR - www.proceedings.blucher.com.br/article-details/omni-histograms-a-novel-partitioning-approach-for-similarity-search-34551 KW - ER -
Exportar citação - BibTeX(BIB)
Copie a citação abaixo ou clique no botão Download para obter um arquivo com os dados
@article{Bedo20144,
title="OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH",
journal="Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online",
volume="3",
number="1",
pages="1890 - 1905",
year="2020",
note="",
issn="21756295",
doi="http://dx.doi.org/10.5151/spolm2019-136",
url="www.proceedings.blucher.com.br/article-details/omni-histograms-a-novel-partitioning-approach-for-similarity-search-34551",
author="Marcos V. N. Bedo", "Rodolfo A. de Oliveira", "Daniel S. Kaster", "Daniel C. M. de Oliveira", "Caetano Traina Jr.",
keywords="",
}
Exportar citação - Text(TXT)
Copie a citação abaixo ou clique no botão Download para obter um arquivo com os dados
Marcos V. N. Bedo, Rodolfo A. de Oliveira, Daniel S. Kaster, Daniel C. M. de Oliveira, Caetano Traina Jr., OMNI-HISTOGRAMS: A NOVEL PARTITIONING APPROACH FOR SIMILARITY SEARCH, Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online, Volume 3, 2020, Pages 1890-1905, ISSN 21756295, http://dx.doi.org/10.5151/spolm2019-136 (www.proceedings.blucher.com.br/article-details/omni-histograms-a-novel-partitioning-approach-for-similarity-search-34551) Palavras-chave:: ;