Handling Imbalanced Data in Intrusion Detection Systems using Generative Adversarial Networks

  • Ly Vu, Le Quy Don Technical University
  • Quang Uy Nguyen, Le Quy Don Technical University
Keywords: Generative adversarial networks, Intrusion detection system, Synthesized attack, Imbalanced dataset, Sampling technique

Abstract

Machine learning-based intrusion detection has become increasingly popular in the research community thanks to its ability to discover unknown attacks. Developing a good detection model for an intrusion detection system (IDS) using machine learning requires a large number of both attack and normal data samples in the learning process. While normal data is relatively easy to collect, attack data is much rarer and harder to gather. Consequently, IDS datasets are often dominated by normal data, and machine learning models trained on these imbalanced datasets are ineffective at detecting attacks. In this paper, we propose a novel solution to this problem that uses generative adversarial networks (GANs) to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form an augmented dataset, on which three popular machine learning techniques are trained. Experiments conducted on three common IDS datasets and one dataset of our own show that machine learning algorithms achieve better performance when trained on the GAN-augmented dataset than when trained on the original dataset or on datasets produced by other sampling techniques. A visualization technique is also used to analyze the properties of the data synthesized by the GANs and by the other techniques.
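
The augmentation step described in the abstract (merging synthesized attacks with the original data) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the toy data, and the choice to balance the two classes exactly are not taken from the paper, and the Gaussian sampler is merely a placeholder standing in for a trained GAN generator.

```python
import numpy as np

def augment_with_synthetic(X, y, generate_attacks, attack_label=1):
    """Append synthesized attack rows until attacks match the normal count.

    `generate_attacks(n)` is a hypothetical callable returning n synthetic
    attack feature rows; in the paper's setting it would be a trained GAN
    generator. Balancing to a 1:1 ratio is an assumption of this sketch.
    """
    n_attack = int(np.sum(y == attack_label))
    n_normal = int(len(y) - n_attack)
    n_needed = max(n_normal - n_attack, 0)
    if n_needed == 0:
        return X, y
    X_syn = generate_attacks(n_needed)
    y_syn = np.full(n_needed, attack_label, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])

def make_placeholder_generator(X_attack, seed=0):
    """Placeholder for a GAN generator (NOT a GAN): samples each feature
    from an independent Gaussian fitted to the real attack rows."""
    rng = np.random.default_rng(seed)
    mean = X_attack.mean(axis=0)
    std = X_attack.std(axis=0) + 1e-6  # avoid a zero standard deviation
    def generate(n):
        return rng.normal(mean, std, size=(n, X_attack.shape[1]))
    return generate

# Toy imbalanced dataset: 950 normal rows, 50 attack rows, 4 features.
rng = np.random.default_rng(42)
X_normal = rng.normal(0.0, 1.0, size=(950, 4))
X_attack = rng.normal(3.0, 1.0, size=(50, 4))
X = np.vstack([X_normal, X_attack])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

generate = make_placeholder_generator(X_attack)
X_aug, y_aug = augment_with_synthetic(X, y, generate)
print(X_aug.shape, np.bincount(y_aug))
```

The augmented arrays `X_aug` and `y_aug` could then be passed to any classifier in place of the original data; swapping the placeholder for a real generator only requires a callable with the same `generate(n)` shape.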

Published: 2020-09-30
Section: Regular Articles