Effects Of Feature Transformation And Selection On Classification Of Network Traffic Activities
Authors: Lim Weng Ying and Mohd Hanafi Ahmad Hijazi
As new technologies are emerging day by day, network, regardless of the Internet or Intranet within a corporation often plays a crucial role in connecting people from all around the world. From military use to achieving business goals and household need, data security often get attention from computer scientists. Traditional security measures that include the installation of firewall and antivirus software are commonly utilised to prevent intrusion. However, such types of defence are merely sufficient to secure a network and data travelling across it. Thus, second lines of defence like Intrusion Detection System (IDS) and Intrusion Prevention System (IPS) are introduced to overcome the inadequacy of traditional security measures. Generally, IDS uses two approaches, the Anomaly Detection (A-IDS) and the Misuse Detection in order to identify patterns of intrusion. A-IDS often perform comparison of the model of normal and anomalous model. Depending on the ability to measure similarity or distance between a target and a known type, comparison is made to determine whether to establish a new target anomalous or not. This research aims to investigate the effects of feature transformation on the classification of network activities; the focus is to represent the data into point series form to permit the application of Time Series Classification (TSC). The TSC technique used is k-Nearest Neighbour (KNN) coupled with Dynamic Time Warping. Effects of using different similarity measures, Euclidean Distance (ED) and Cosine similarity algorithm are also investigated. Experiments conducted involve conversion of the categorical data by three different conversion techniques to generate point series data – simple, probability and entropy conversion. Comparison between different classifiers is also conducted. The performance of the classifier is best using 1NN with Euclidean distance and entropy conversion for categorical data, where the recorded accuracy is 99.19%.