Feature Analysis for Machine Learning-based IoT Intrusion Detection. (arXiv:2108.12732v2 [cs.CR] UPDATED)

Internet of Things (IoT) networks have become an increasingly attractive
target of cyberattacks. Powerful Machine Learning (ML) models have recently
been adopted to implement network intrusion detection systems to protect IoT
networks. For the successful training of such ML models, selecting the right
data features is crucial for maximising both detection accuracy and
computational efficiency. This paper comprehensively analyses the importance
and predictive power of feature sets for detecting network attacks. Three feature selection
algorithms, namely chi-square, information gain and correlation, are used to
identify and rank data features. The ranked features are then fed into two ML
classifiers, a deep feed-forward neural network and a random forest, to measure their attack
detection performance. The experimental evaluation considered three datasets:
UNSW-NB15, CSE-CIC-IDS2018, and ToN-IoT in their proprietary flow format. Their
respective variants in NetFlow format, NF-UNSW-NB15, NF-CSE-CIC-IDS2018, and
NF-ToN-IoT, were also considered. The experimental evaluation
explored the marginal benefit of adding individual features. Our results show
that the accuracy initially increases rapidly as features are added but
converges quickly to the maximum. This demonstrates a significant potential to
reduce the computational and storage cost of intrusion detection systems while
maintaining near-optimal detection accuracy. This has particular relevance in
IoT systems, which typically have limited computational and storage resources.
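The ranking and incremental-evaluation steps described in the abstract can be illustrated with a minimal sketch in plain Python. It assumes discrete feature values and a binary label (0 benign, 1 attack) and computes the three scores: chi-square from the feature/label contingency table, information gain as the mutual information between feature and label, and correlation taken here as the absolute Pearson correlation with the label (this interpretation is our assumption). It then ranks features by each score. The toy columns `proto`, `port` and `flag` and all helper names are hypothetical and are not taken from the paper or its datasets. The helper `smallest_sufficient_k` assumes an accuracy curve whose k-th entry is the accuracy obtained with the top-k ranked features. In the paper, that curve comes from the deep feed-forward and random forest classifiers, which are not reproduced here. The helper returns the smallest k within a chosen tolerance of the best accuracy.

```python
import math
from collections import Counter

# Hypothetical toy data (not from UNSW-NB15, CSE-CIC-IDS2018 or ToN-IoT):
# discrete feature columns and a binary label (0 = benign, 1 = attack).
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "proto": [0, 0, 0, 1, 1, 1],  # perfectly separates the classes
    "port": [0, 1, 0, 1, 0, 1],   # only weakly related to the label
    "flag": [1, 1, 1, 1, 1, 1],   # constant, carries no information
}


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # observed cell counts
    y_counts = Counter(labels)
    stat = 0.0
    for f, f_count in Counter(feature).items():
        for y, y_count in y_counts.items():
            expected = f_count * y_count / n  # count expected under independence
            stat += (joint[(f, y)] - expected) ** 2 / expected
    return stat


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """H(label) - H(label | feature), i.e. mutual information in bits."""
    n = len(labels)
    conditional = 0.0
    for f, f_count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == f]
        conditional += (f_count / n) * entropy(subset)
    return entropy(labels) - conditional


def abs_correlation(feature, labels):
    """Absolute Pearson correlation with the label (assumed correlation variant)."""
    n = len(labels)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in feature))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in labels))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column has no linear relationship with the label
    return abs(cov / (norm_x * norm_y))


def rank_features(columns, labels, score):
    """Feature names sorted by descending score (ties keep column order)."""
    scores = {name: score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def smallest_sufficient_k(accuracies, tolerance=0.01):
    """accuracies[k - 1] is the accuracy obtained with the top-k ranked features.

    Returns the smallest k whose accuracy is within `tolerance` of the best one.
    """
    best = max(accuracies)
    for k, acc in enumerate(accuracies, start=1):
        if acc >= best - tolerance:
            return k
    raise ValueError("tolerance must be non-negative")


if __name__ == "__main__":
    for score in (chi_square, information_gain, abs_correlation):
        print(score.__name__, rank_features(columns, labels, score))
    # Hypothetical accuracy curve for illustration only, not results from the paper.
    print(smallest_sufficient_k([0.80, 0.95, 0.985, 0.99, 0.99]))
```

On the toy data, all three scores rank `proto` first, the noisy `port` second and the constant `flag` last, and the hypothetical curve reaches near-maximum accuracy at k = 3. Real flow features are often continuous, so a practical pipeline would typically need to discretise them before computing chi-square or information gain.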