Deep Learning of Image Representations with Convolutional Neural Networks Autoencoder for Image Retrieval with Relevance Feedback

  • Quynh Dao Thi Thuy, Posts and Telecommunications Institute of Technology
Keywords: image representations, CNNs, autoencoder, relevance feedback


Image retrieval with traditional relevance feedback faces two problems: (1) handcrafted features have limited representational power, and (2) it is inefficient on high-dimensional data such as images. In this paper, we propose a framework for image retrieval based on a very deep convolutional neural network autoencoder, called AIR (Autoencoders for Image Retrieval). The proposed framework learns feature vectors directly from raw images in an unsupervised manner. In addition, it combines unsupervised and supervised learning in a hybrid approach to improve retrieval performance. Experimental results show that our method outperforms some existing methods on the CIFAR-100 image set, which consists of 60,000
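To make the retrieval loop concrete, the following is a minimal sketch of one relevance feedback round over learned feature vectors. It is an illustration under stated assumptions, not the AIR method itself: the short vectors stand in for codes that an autoencoder's encoder would produce, the query update is the classical Rocchio rule (swapped in here because the abstract does not specify the feedback mechanism), and all names (`rank`, `rocchio_update`, `database`, the weights `alpha`, `beta`, `gamma`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, database):
    """Return database indices sorted by increasing distance to the query."""
    return sorted(range(len(database)), key=lambda i: euclidean(query, database[i]))


def mean(vectors, dim):
    """Component-wise mean; a zero vector if no vectors are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward relevant examples and away from non-relevant ones.

    This is the classical Rocchio rule, used as a stand-in for the paper's
    feedback step, which the abstract does not describe.
    """
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Hypothetical 2-D codes; a real encoder would output longer vectors.
database = [
    [0.0, 0.0],  # image 0
    [1.0, 0.0],  # image 1
    [0.0, 1.0],  # image 2
    [1.0, 1.0],  # image 3
]
query = [0.2, 0.1]

# Round 0: purely unsupervised ranking by distance in feature space.
initial = rank(query, database)

# Round 1: the user marks image 3 as relevant and image 0 as non-relevant.
new_query = rocchio_update(query, [database[3]], [database[0]])
refined = rank(new_query, database)
```

In this toy setting the feedback round moves image 3 from the bottom of the ranking to the top, which is the kind of supervised refinement on top of unsupervised features that the abstract refers to.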

