Abstract:
Searching for and recognizing objects in images has become an important research topic in image processing and computer vision. Finding images similar to an input query in a large dataset, and responding as fast as possible, is a very challenging task. In this work the Bag of Features approach is studied, and an implementation of the visual vocabulary tree method of Nistér and Stewénius is presented. Images are described using local invariant descriptors and then indexed in a database using an inverted index for later queries. The descriptors are quantized according to a visual vocabulary, producing sparse vectors, which makes it possible to compute very efficiently, for each query, a similarity ranking of the indexed images. The performance of the method is analyzed by varying several factors: the parameters of the vocabulary tree construction, the local descriptor extraction technique, and dimensionality reduction with PCA. Retrieval performance is observed to increase with a richer vocabulary and to decay very slowly as the size of the dataset grows. © 2018 IPOL and the authors CC-BY-NC-SA.
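The indexing and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes descriptors have already been quantized into integer visual word ids (the step a vocabulary tree would perform), and it assumes TF-IDF weighting with L1 normalization and an L1 distance accumulated only over the words shared by query and database image. The names `build_index`, `weigh`, and `query`, and the toy database, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def weigh(words, idf):
    """Sparse, L1-normalized tf-idf vector for one bag of visual words."""
    raw = {w: tf * idf.get(w, 0.0) for w, tf in Counter(words).items()}
    raw = {w: v for w, v in raw.items() if v > 0.0}
    norm = sum(raw.values())
    return {w: v / norm for w, v in raw.items()} if norm else {}


def build_index(db_words):
    """db_words maps image id -> list of visual word ids (already quantized)."""
    n_images = len(db_words)
    # Document frequency: number of images containing each visual word.
    doc_freq = Counter(w for words in db_words.values() for w in set(words))
    idf = {w: math.log(n_images / df) for w, df in doc_freq.items()}
    # Inverted index: visual word -> {image id: normalized weight}.
    inverted = defaultdict(dict)
    for image_id, words in db_words.items():
        for w, value in weigh(words, idf).items():
            inverted[w][image_id] = value
    return inverted, idf


def query(inverted, idf, query_words, top_k=3):
    """Rank indexed images by L1 distance to the query (smaller is closer)."""
    q = weigh(query_words, idf)
    # For L1-normalized vectors, ||q - d||_1 equals
    # 2 + sum over shared words of (|q_i - d_i| - q_i - d_i),
    # so only the inverted lists of the query's words are visited.
    dist = defaultdict(lambda: 2.0)
    for w, q_w in q.items():
        for image_id, d_w in inverted.get(w, {}).items():
            dist[image_id] += abs(q_w - d_w) - q_w - d_w
    return sorted(dist.items(), key=lambda item: item[1])[:top_k]


# Toy database: three images, each a bag of visual word ids.
db = {"a": [1, 1, 2, 3], "b": [3, 4, 4, 5], "c": [6, 7, 7, 8]}
inverted, idf = build_index(db)
ranking = query(inverted, idf, [1, 2, 3])
print(ranking)  # image "a" ranks first; "c" shares no word and is not returned
```

Because each vector is sparse, a query touches only the inverted lists of the words it contains, which is why the cost depends on the number of shared words rather than on the vocabulary size.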
References:
- Alcantarilla, P.F., Bartoli, A., Davison, A.J., (2012) KAZE Features. In European Conference on Computer Vision (ECCV), pp. 214-227. , https://doi.org/10.1007/978-3-642-33783-3_16
- Alcantarilla, P.F., Nuevo, J., Bartoli, A., Fast explicit diffusion for accelerated features in nonlinear scale spaces (2011) IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (7), pp. 1281-1298. , https://doi.org/10.5244/c.27.13
- Bay, H., Tuytelaars, T., Van Gool, L., SURF: Speeded up robust features (2006) In European Conference on Computer Vision (ECCV), pp. 404-417. , https://doi.org/10.1007/11744023_32
- Calonder, M., Lepetit, V., Strecha, C., Fua, P., Brief: Binary robust independent elementary features (2010) In European Conference on Computer Vision (ECCV), pp. 778-792. , https://doi.org/10.1007/978-3-642-15561-1_56
- Fischler, M.A., Bolles, R.C., Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography (1981) Communications of the ACM, 24 (6), pp. 381-395. , https://doi.org/10.1016/b978-0-08-051581-6.50070-2
- Funayama, R., Yanagihara, H., Van Gool, L., Tuytelaars, T., Bay, H., Robust interest point detector and descriptor (2009) US Patent App, , https://www.google.com/patents/
- Grana, C., Borghesani, D., Manfredi, M., Cucchiara, R., (2013) A fast approach for integrating orb descriptors in the bag of words model, , https://doi.org/10.1117/12.2008460, IS&T/SPIE Electronic Imaging, pages 866709-866709. International Society for Optics and Photonics
- Harris, C., Stephens, M., A combined corner and edge detector (1988) Alvey Vision Conference, 15, pp. 147-151. , https://doi.org/10.5244/c.2.23, Manchester, UK
- Jégou, H., Douze, M., Schmid, C., Hamming embedding and weak geometric consistency for large scale image search (2008) In European Conference on Computer Vision (ECCV), pp. 304-317. , https://doi.org/10.1007/978-3-540-88682-2_24
- Jégou, H., Douze, M., Schmid, C., Improving bag-of-features for large scale image search (2010) International Journal of Computer Vision, 87 (3), pp. 316-336. , https://doi.org/10.1007/s11263-009-0285-2
- Jégou, H., Douze, M., Schmid, C., Pérez, P., Aggregating local descriptors into a compact image representation (2010) Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3304-3311. , https://doi.org/10.1109/cvpr.2010.5540039
- Jégou, H., Harzallah, H., Schmid, C., A contextual dissimilarity measure for accurate and efficient image search (2007) Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8. , https://doi.org/10.1109/cvpr.2007.382970
- Lepetit, V., Lagger, P., Fua, P., Randomized trees for real-time keypoint recognition (2005) Conference on Computer Vision and Pattern Recognition (CVPR), 2, pp. 775-781. , https://doi.org/10.1109/CVPR.2005.288
- Leutenegger, S., Chli, M., Siegwart, R.Y., Brisk: Binary robust invariant scalable keypoints (2011) In International Conference on Computer Vision (ICCV), pp. 2548-2555. , https://doi.org/10.1109/iccv.2011.6126542
- Lowe, D.G., Distinctive image features from scale-invariant keypoints (2004) International Journal of Computer Vision, 60 (2), pp. 91-110. , https://doi.org/10.1023/b:visi.0000029664.99615.94
- Lowe, D., (2004) Method and Apparatus for Identifying Scale Invariant Features in an Image and Use of Same for Locating an Object in an Image, , https://www.google.com/patents/US6711293, US Patent 6,711,293
- Muja, M., Lowe, D.G., Fast approximate nearest neighbors with automatic algorithm configuration (2009) International Conference on Computer Vision Theory and Applications (VISAPP), 2
- Nistér, D., Stewénius, H., Scalable recognition with a vocabulary tree (2006) Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2161-2168. , https://doi.org/10.1109/CVPR.2006.264
- Oliva, A., Torralba, A., (2001) Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, International Journal of Computer Vision, 42 (3), pp. 145-175. , https://doi.org/10.1023/A:1011139631724
- Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A., Object retrieval with large vocabularies and fast spatial matching (2007) Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8. , https://doi.org/10.1109/cvpr.2007.383172, IEEE
- Rublee, E., Rabaud, V., Konolige, K., Bradski, G., ORB: An efficient alternative to SIFT or SURF (2011) In International Conference on Computer Vision (ICCV), pp. 2564-2571. , https://doi.org/10.1109/iccv.2011.6126544
- Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J., Image classification with the Fisher vector: Theory and practice (2013) International Journal of Computer Vision, 105 (3), pp. 222-245. , https://doi.org/10.1007/s11263-013-0636-x
- Sivic, J., Zisserman, A., Video Google: A text retrieval approach to object matching in videos (2003) International Conference on Computer Vision (ICCV), pp. 1470-1477. , https://doi.org/10.1109/iccv.2003.1238663
- Torralba, A., Fergus, R., Weiss, Y., Small codes and large image databases for recognition (2008) In Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8. , https://doi.org/10.1109/cvpr.2008.4587633
- Weiss, Y., Torralba, A., Fergus, R., Spectral hashing (2009) Advances in Neural Information Processing Systems 21, pp. 1753-1760. , In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Curran Associates, Inc
- Zhang, S., Huang, Q., Hua, G., Jiang, S., Gao, W., Tian, Q., Building contextual visual vocabulary for large-scale image applications (2010) ACM International Conference on Multimedia, pp. 501-510. , https://doi.org/10.1145/1873951.1874018
- Zobel, J., Moffat, A., (2006) Inverted Files for Text Search Engines, ACM Computing Surveys (CSUR), 38 (2), p. 6. , https://doi.org/10.1145/1132956.1132959
Citations:
---------- APA ----------
Uriza, E., Gómez-Fernández, F. & Rais, M. (2018). Efficient large-scale image search with a vocabulary tree. Image Processing On Line, 8, 71-98.
http://dx.doi.org/10.5201/ipol.2018.199
---------- CHICAGO ----------
Uriza, E., Gómez-Fernández, F., Rais, M. "Efficient large-scale image search with a vocabulary tree". Image Processing On Line 8 (2018): 71-98.
http://dx.doi.org/10.5201/ipol.2018.199
---------- MLA ----------
Uriza, E., Gómez-Fernández, F., Rais, M. "Efficient large-scale image search with a vocabulary tree". Image Processing On Line, vol. 8, 2018, pp. 71-98.
http://dx.doi.org/10.5201/ipol.2018.199
---------- VANCOUVER ----------
Uriza, E., Gómez-Fernández, F., Rais, M. Efficient large-scale image search with a vocabulary tree. Image Process. On line. 2018;8:71-98.
http://dx.doi.org/10.5201/ipol.2018.199