A Human Retrieval System based on Human Attribute Ontology and Deep Multi-task Neural Network

  • Trang Phung HCMUS, SGU
  • Ngoc Ly Q. Faculty of Information Technology, HCM University of Science - VNUHCM, Vietnam
  • Masayuki Fukuzawa
Keywords: Image Retrieval, Object Retrieval, Human Attribute Ontology, Attribute Learning, Deep Learning, Deep feature learning.


This research aims to improve how well image retrieval systems understand image content. We present a model for searching for humans (pedestrians or persons) in large image datasets. Our approach builds an image retrieval system on attribute learning and a Human Attribute Ontology (HAO). The main contributions are: (1) the Human Attribute Ontology (HAO), which stores prior knowledge about images; its hierarchical structure lets this knowledge be reused in the subsequent attribute learning and image retrieval stages; (2) a Convolutional Neural Network (CNN) for attribute learning that leverages the HAO to improve accuracy; (3) a Human Image Retrieval system that uses both attribute learning and the HAO. The system interprets images at the attribute level, which demonstrates the benefit of using the ontology to reuse existing knowledge. We validate the method through experiments on the benchmark datasets PETA and PA100K, where it achieves state-of-the-art results.
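
As a rough illustration of the idea in the abstract (and not the authors' implementation), the sketch below shows how a hierarchical attribute ontology can expand a coarse concept into leaf attributes, and how per-image attribute scores can then rank a gallery. The attribute names, the `ONTOLOGY` dictionary, the `GALLERY` scores, and the functions `leaf_attributes` and `retrieve` are all hypothetical; the scores stand in for the output of a multi-task attribute network.

```python
from typing import Dict, List

# Hypothetical fragment of an attribute hierarchy; names are illustrative only
# and are not taken from the paper's HAO.
ONTOLOGY: Dict[str, List[str]] = {
    "human": ["upper_body", "lower_body", "accessory"],
    "upper_body": ["short_sleeve", "long_sleeve"],
    "lower_body": ["trousers", "skirt"],
    "accessory": ["backpack", "hat"],
}


def leaf_attributes(concept: str) -> List[str]:
    """Expand a concept into the leaf attributes beneath it."""
    children = ONTOLOGY.get(concept)
    if children is None:  # no children: the concept is itself a leaf
        return [concept]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_attributes(child))
    return leaves


def retrieve(query: List[str],
             gallery: Dict[str, Dict[str, float]],
             top_k: int = 2) -> List[str]:
    """Rank gallery images by mean predicted score over the queried attributes."""
    def score(pred: Dict[str, float]) -> float:
        return sum(pred.get(a, 0.0) for a in query) / len(query)

    ranked = sorted(gallery, key=lambda img: score(gallery[img]), reverse=True)
    return ranked[:top_k]


# Assumed stand-in for per-image attribute probabilities from an attribute network.
GALLERY: Dict[str, Dict[str, float]] = {
    "img_a": {"backpack": 0.9, "trousers": 0.8, "hat": 0.1},
    "img_b": {"backpack": 0.2, "skirt": 0.9, "hat": 0.7},
    "img_c": {"backpack": 0.8, "trousers": 0.1, "hat": 0.9},
}

if __name__ == "__main__":
    print(leaf_attributes("accessory"))
    print(retrieve(["backpack", "trousers"], GALLERY))
```

In this toy setup a query can be posed at any level of the hierarchy: a coarse concept such as "accessory" is first expanded to its leaves, and the resulting attribute list is passed to `retrieve`. Averaging scores is only one possible ranking rule and is chosen here for simplicity.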

