Feature Extraction for Machine Learning-based Intrusion Detection in IoT Networks. (arXiv:2108.12722v3 [cs.NI] UPDATED)

A large number of network security breaches in IoT networks have demonstrated
the unreliability of current Network Intrusion Detection Systems (NIDSs).
The resulting network interruptions and loss of sensitive data have made
improving NIDS technologies an active research area. In an
analysis of related works, it was observed that most researchers aim to obtain
better classification results by using a set of untried combinations of Feature
Reduction (FR) and Machine Learning (ML) techniques on NIDS datasets. However,
these datasets differ in feature sets, attack types, and network design.
Therefore, this paper aims to discover whether these techniques can be
generalised across various datasets. Six ML models are utilised: Deep Feed
Forward (DFF), Convolutional Neural Network (CNN), Recurrent Neural Network
(RNN), Decision Tree (DT), Logistic Regression (LR), and Naive Bayes (NB). The
accuracy of three Feature Extraction (FE) algorithms, Principal Component
Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA), is
evaluated using three benchmark datasets: UNSW-NB15, ToN-IoT and
CSE-CIC-IDS2018. Although PCA and AE algorithms have been widely used, the
determination of their optimal number of extracted dimensions has been
overlooked. The results indicate that no single FE method or ML model
achieves the best scores on all datasets. The optimal number of extracted
dimensions has been identified for each dataset, and LDA degrades the
performance of the ML models on two datasets. The variance is used to analyse
the extracted dimensions of LDA and PCA. Finally, this paper concludes that the
choice of datasets significantly alters the performance of the applied
techniques. We believe that a universal (benchmark) feature set is needed to
facilitate further advancement and progress of research in this field.
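
The variance analysis of extracted dimensions mentioned above can be illustrated with a minimal sketch. It is not the paper's procedure: the function names, the 95% cumulative-variance threshold, and the synthetic data (standing in for a real NIDS feature matrix) are all assumptions made for illustration. It computes the share of variance captured by each PCA dimension and the smallest number of dimensions needed to reach a chosen share.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # centre each feature column
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    var = s ** 2 / (X.shape[0] - 1)            # per-component variance
    return var / var.sum()

def dims_for_variance(ratios, threshold=0.95):
    """Smallest number of leading components whose cumulative share
    of variance reaches `threshold` (hypothetical selection rule)."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))

# Synthetic stand-in for flow features (assumption): 3 latent factors
# mixed into 10 observed columns, plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

ratios = pca_explained_variance(X)
k = dims_for_variance(ratios, threshold=0.95)
print("explained variance ratios:", np.round(ratios, 4))
print("dimensions needed for 95% of the variance:", k)
```

A variance threshold is only a proxy. Because the best-performing technique varies across datasets, the number of dimensions kept would in practice still need to be validated against classification scores on each dataset.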