An improved LSTM method based on statistical features for DGA botnet detection

Van Van Tong, Hieu Dinh Mac, Tung Trong Bui, Duc Quang Tran, Giang Linh Nguyen

Abstract


In recent years, botnets have been a primary means of phishing, spamming, and launching Distributed Denial of Service attacks. Most bots today use Domain Generation Algorithms (DGA), also known as domain fluxing, to construct a resilient Command and Control (C&C) infrastructure. Reverse engineering has become the prominent approach to combating botnets. It requires a malware sample, however, and one is not always available in practice. This paper presents an extended version of the Long Short-Term Memory (LSTM) network in which the original algorithm is coupled with statistical features of the domain name, namely its meaningful character ratio, entropy, and length, to further improve generalization capability. Experiments are carried out on a dataset collected from real-world traffic that contains 1 non-DGA class and 37 DGA malware families. The results demonstrate that the new method works on both binary and multi-class tasks. It also improves the macro-averaged F1-score by at least 5% compared to other state-of-the-art detection techniques, while recognizing 3 additional DGA families.
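
The three statistical features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy word list `TOY_WORDS`, the minimum word length of 3, and the choice to analyze only the first label of the domain are all assumptions made for illustration. The meaningful character ratio is taken here as the largest fraction of characters that can be covered by non-overlapping dictionary words, which is one common definition.

```python
import math
from collections import Counter

# Hypothetical toy dictionary (assumption); a real system would use a large word list.
TOY_WORDS = {"face", "book", "mail", "google", "shop", "news"}


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the label's character distribution."""
    if not label:
        return 0.0
    n = len(label)
    counts = Counter(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def meaningful_char_ratio(label: str, words=TOY_WORDS, min_len: int = 3) -> float:
    """Largest fraction of characters covered by non-overlapping dictionary words."""
    n = len(label)
    if n == 0:
        return 0.0
    # best[i] = maximum number of characters of label[:i] covered by words
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option: leave character i-1 uncovered
        for j in range(0, i - min_len + 1):
            if label[j:i] in words:
                best[i] = max(best[i], best[j] + (i - j))
    return best[n] / n


def domain_features(domain: str) -> dict:
    """Compute the three features on the first label of the domain (assumption)."""
    label = domain.lower().split(".")[0]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "meaningful_char_ratio": meaningful_char_ratio(label),
    }


if __name__ == "__main__":
    for name in ("facebook.com", "xkqzvbnm.net"):
        print(name, domain_features(name))
```

In a hybrid model of the kind the abstract describes, a feature vector like this would presumably be supplied alongside the LSTM's learned representation of the character sequence; how the two are combined is not specified in this section.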

Keywords


DGA botnet, NXDomain, Recurrent neural network, Long short-term memory network





Governing body: Ministry of Information and Communications (MIC)
License No. 69/GP-TTĐT, issued 26/12/2014.
Editor-in-Chief: Vũ Chí Kiên
Editorial office: 110-112 Bà Triệu, Hà Nội; Tel: 04. 37737136; Fax: 04. 37737130; Email: chuyensanbcvt@mic.gov.vn
Cite "Tạp chí Công nghệ thông tin và truyền thông" (Journal of Information and Communication Technology) as the source when republishing information from this website.