Omni-Dimensional Adaptation for MobileNetV3 using Bayesian Hyperparameter Tuning

Using Bayesian optimization to tune the hyperparameters of the Omni-Dimensional MobileNetV3 model

  • Long Vo
  • Nhat Quang Phan
  • Van Dat Tran
  • Duc-Long Dang
Keywords: Omni-dimensional convolution, MobileNetV3, Bayesian optimization

Abstract

This paper proposes enhancing MobileNetV3 with Omni-Dimensional Dynamic Convolution (OD-Conv) to overcome the limitation of static convolution kernels in CNNs. OD-Conv applies multi-dimensional attention that adjusts the convolution kernels along four dimensions (spatial, input channel, output channel, and kernel number), which improves feature representation. Bayesian optimization is used to tune the model's hyperparameters efficiently. Experiments show that Omni-MobileNetV3 outperforms MobileNetV3 on CIFAR-100, Tiny ImageNet, and medical image datasets, with accuracy gains of up to 3% while maintaining efficiency. This dynamic convolution method, combined with Bayesian tuning, achieves state-of-the-art results in image classification.
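
The four-way attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function `odconv_kernel`, the `heads` dictionary of linear maps, and all shapes are hypothetical, and the squeeze layer that normally precedes the attention heads is omitted for brevity. The sketch assumes sigmoid gates for the spatial, input-channel, and output-channel attentions and a softmax over the kernel-number dimension, then sums the re-weighted candidate kernels into one input-dependent kernel.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def odconv_kernel(x, kernels, heads):
    """Build one input-dependent kernel (illustrative only).

    x:       (c_in, H, W) input feature map
    kernels: (n, c_out, c_in, k, k) bank of n candidate kernels
    heads:   hypothetical linear maps from the pooled descriptor
             to the logits of each of the four attentions
    """
    n, c_out, c_in, k, _ = kernels.shape
    g = x.mean(axis=(1, 2))  # global average pooling -> (c_in,)
    # One attention per kernel dimension, reshaped so it broadcasts.
    a_s = sigmoid(heads["spatial"] @ g).reshape(1, 1, 1, k, k)  # spatial
    a_c = sigmoid(heads["in"] @ g).reshape(1, 1, c_in, 1, 1)    # input channel
    a_f = sigmoid(heads["out"] @ g).reshape(1, c_out, 1, 1, 1)  # output channel
    a_w = softmax(heads["kernel"] @ g).reshape(n, 1, 1, 1, 1)   # kernel number
    # Re-weight every candidate kernel, then sum over the n candidates.
    return (a_w * a_f * a_c * a_s * kernels).sum(axis=0)

rng = np.random.default_rng(0)
n, c_out, c_in, k = 4, 8, 3, 3
x = rng.normal(size=(c_in, 16, 16))
kernels = rng.normal(size=(n, c_out, c_in, k, k))
heads = {
    "spatial": rng.normal(size=(k * k, c_in)),
    "in": rng.normal(size=(c_in, c_in)),
    "out": rng.normal(size=(c_out, c_in)),
    "kernel": rng.normal(size=(n, c_in)),
}
w = odconv_kernel(x, kernels, heads)
print(w.shape)  # (8, 3, 3, 3)
```

The resulting kernel has the shape of an ordinary convolution weight, so it would be applied to `x` with a standard convolution; only the weights change from one input to the next.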

Author Biographies

Nhat Quang Phan

Phan Nhat Quang is a third-year student majoring in Data Science at the VN-UK Institute for Research and Executive Education, the University of Danang, Danang. His current research interests are Computer Vision, Natural Language Processing, and Foundation Models.
Email: quang.phan210405@vnuk.edu.vn

Van Dat Tran

Tran Van Dat is a second-year student majoring in Data Science at the VN-UK Institute for Research and Executive Education, the University of Danang, Danang. His current research interests are Computer Vision, Natural Language Processing, and Foundation Models.
Email: dat.tran220407@vnuk.edu.vn

Duc-Long Dang

Long Vo, Nhat-Quang Phan, Van-Dat Tran*, and Duc-Long Dang*
VN-UK Institute for Research and Executive Education, the University of Danang, Danang 550000, Vietnam
*long.dang@vnuk.edu.vn

Published
2024-06-03