Project summary

Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) must be trained on datasets. This research project has generated NIDS datasets for ML model training and evaluation. The datasets are presented in the common NetFlow and CICFlowMeter formats so that the performance of ML-based NIDSs can be compared effectively.


NetFlow V1 Datasets

Version 1 of the datasets is made up of 8 basic NetFlow features, explained here.

NF-UNSW-NB15                           NF-ToN-IoT                            NF-BoT-IoT

NF-CSE-CIC-IDS2018                  NF-UQ-NIDS

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Sarhan M., Layeghy S., Moustafa N., Portmann M. (2021) NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems. In: Big Data Technologies and Applications. BDTA 2020, WiCON 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-72802-1_9


NetFlow V2 Datasets

Version 2 of the datasets is made up of 43 extended NetFlow features, explained here.

NF-UNSW-NB15-v2                    NF-ToN-IoT-v2                       NF-BoT-IoT-v2

NF-CSE-CIC-IDS2018-v2          NF-UQ-NIDS-v2

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Mohanad Sarhan, Siamak Layeghy, and Marius Portmann, Towards a Standard Feature Set for Network Intrusion Detection System Datasets, Mobile Networks and Applications, 103, 108379, 2022. https://doi.org/10.1007/s11036-021-01843-0


CICFlowMeter Datasets

The CICFlowMeter format of the datasets is made up of 83 network features, explained here.

     CIC-ToN-IoT                            CIC-BoT-IoT

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Mohanad Sarhan, Siamak Layeghy, and Marius Portmann, Evaluating Standard Feature Sets Towards Increased Generalisability and Explainability of ML-based Network Intrusion Detection, Big Data Research, 30, 100359, 2022 https://doi.org/10.1016/j.bdr.2022.100359


License

The use of the datasets for academic research purposes is granted in perpetuity, provided that the above papers are cited. Use for commercial purposes must be agreed with the authors.

Please get in touch with the author, Mohanad Sarhan, for more details.


1. NF-UNSW-NB15

The NetFlow-based format of the UNSW-NB15 dataset, named NF-UNSW-NB15, has been developed and labelled with its respective attack categories. The total number of data flows is 1,623,118, out of which 72,406 (4.46%) are attack samples and 1,550,712 (95.54%) are benign. The attack samples are further classified into nine subcategories. The table below shows the distribution of all flows in the NF-UNSW-NB15 dataset.

Please click here to download the dataset.

Class | Count | Description
Benign | 1550712 | Normal, non-malicious flows.
Fuzzers | 19463 | An attack in which the attacker sends large amounts of random data that cause a system to crash, and which also aims to discover security vulnerabilities in a system.
Analysis | 1995 | A group that presents a variety of threats that target web applications through ports, emails and scripts.
Backdoor | 1782 | A technique that aims to bypass security mechanisms by replying to specific constructed client applications.
DoS | 5051 | Denial of Service is an attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Exploits | 24736 | Sequences of commands that control the behaviour of a host through a known vulnerability.
Generic | 5570 | A method that targets cryptography and causes a collision with each block-cipher.
Reconnaissance | 12291 | A technique for gathering information about a network host, also known as a probe.
Shellcode | 1365 | Malware that penetrates code to control a victim's host.
Worms | 153 | Attacks that replicate themselves and spread to other computers.
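
A distribution table such as the one above could be regenerated from a downloaded CSV file by tallying its attack-category column. The following is a minimal sketch using only the Python standard library. The column name `Attack`, the placeholder feature column and the in-memory sample rows are assumptions made for illustration, and should be checked against the header of the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a downloaded CSV file. The real files are far
# larger, and the column names used here are an assumption.
SAMPLE_CSV = """\
IN_BYTES,Attack
100,Benign
200,Benign
300,Benign
400,Exploits
500,Fuzzers
600,Exploits
"""


def class_distribution(csv_file, class_column="Attack"):
    """Count how many flows fall into each class of the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[class_column] for row in reader)


counts = class_distribution(io.StringIO(SAMPLE_CSV))
total = sum(counts.values())

# Print one row per class: name, flow count and share of all flows.
for name, count in counts.most_common():
    print(f"{name:<15}{count:>8}{100 * count / total:>8.2f}%")
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by a file object opened with `open(path, newline="")`; because the reader streams one row at a time, the tally does not need the whole file in memory.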

2. NF-ToN-IoT

We utilised the publicly available pcaps of the ToN-IoT dataset to generate its NetFlow records, leading to a NetFlow-based IoT network dataset called NF-ToN-IoT. The total number of data flows is 1,379,274, out of which 1,108,995 (80.4%) are attack samples and 270,279 (19.6%) are benign. The table below lists and defines the distribution of the NF-ToN-IoT dataset.

Please click here to download the dataset.

Class | Count | Description
Benign | 270279 | Normal, non-malicious flows.
Backdoor | 17247 | A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS | 17717 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS | 326345 | An attempt similar to DoS but with multiple distributed sources.
Injection | 468539 | A variety of attacks that supply untrusted inputs that aim to alter the course of execution, with SQL and code injections being two of the main ones.
MITM | 1295 | Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password | 156299 | Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware | 142 | An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning | 21467 | A group that consists of a variety of techniques that aim to discover information about networks and hosts, also known as probing.
XSS | 99944 | Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

3. NF-BoT-IoT

NF-BoT-IoT is an IoT NetFlow-based dataset generated from the BoT-IoT dataset. The features were extracted from the publicly available pcap files and the flows were labelled with their respective attack categories. The total number of data flows is 600,100, out of which 586,241 (97.69%) are attack samples and 13,859 (2.31%) are benign. There are four attack categories in the dataset. The table below shows the distribution of all flows in NF-BoT-IoT.

Please click here to download the dataset.

Class | Count | Description
Benign | 13859 | Normal, non-malicious flows.
Reconnaissance | 470655 | A technique for gathering information about a network host, also known as a probe.
DDoS | 56844 | Distributed Denial of Service is an attempt similar to DoS but with multiple distributed sources.
DoS | 56833 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft | 1909 | A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.

4. NF-CSE-CIC-IDS2018

We utilised the original pcap files of the CSE-CIC-IDS2018 dataset to generate a NetFlow-based dataset called NF-CSE-CIC-IDS2018. The total number of flows is 8,392,401, out of which 1,019,203 (12.14%) are attack samples and 7,373,198 (87.86%) are benign. The table below shows the dataset's distribution.

Please click here to download the dataset.

Class | Count | Description
Benign | 7373198 | Normal, non-malicious flows.
BruteForce | 287597 | A technique that aims to obtain usernames and password credentials by trying a list of predefined possibilities.
Bot | 15683 | An attack that enables an attacker to remotely control several hijacked computers to perform malicious activities.
DoS | 269361 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS | 380096 | An attempt similar to DoS but with multiple distributed sources.
Infiltration | 62072 | An inside attack that sends a malicious file via email to exploit an application, followed by a backdoor that scans the network for other vulnerabilities.
Web Attacks | 4394 | A group that includes SQL injections, command injections and unrestricted file uploads.

5. NF-UQ-NIDS

NF-UQ-NIDS is a comprehensive dataset that merges all the aforementioned datasets. The newly published dataset demonstrates the benefit of a shared feature set across datasets, which makes it possible to merge multiple smaller datasets. This will eventually lead to bigger and more universal NIDS datasets containing flows from multiple network setups and different attack settings. The dataset includes an additional label feature identifying the original dataset of each flow. This can be used to compare the same attack scenarios conducted over two or more different testbed networks.

The attack categories have been modified to combine all parent categories. Attacks named DoS attacks-Hulk, DoS attacks-SlowHTTPTest, DoS attacks-GoldenEye and DoS attacks-Slowloris have been renamed to the parent DoS category. Attacks named DDOS attack-LOIC-UDP, DDOS attack-HOIC and DDoS attacks-LOIC-HTTP have been renamed to DDoS. Attacks named FTP-BruteForce, SSH-Bruteforce, Brute Force -Web and Brute Force -XSS have been combined into a brute-force category. Finally, SQL Injection attacks have been included in the injection attacks category.

The NF-UQ-NIDS dataset has a total of 11,994,893 records, out of which 9,208,048 (76.77%) are benign flows and 2,786,845 (23.23%) are attacks. The table below lists the distribution of the final attack categories.

Please click here to download the dataset.

Class | Count
Benign | 9208048
DDoS | 763285
Reconnaissance | 482946
Injection | 468575
DoS | 348962
Brute Force | 291955
Password | 156299
XSS | 99944
Infilteration | 62072
Exploits | 24736
Scanning | 21467
Fuzzers | 19463
Backdoor | 19029
Bot | 15683
Generic | 5570
Analysis | 1995
Theft | 1909
Shellcode | 1365
MITM | 1295
Worms | 153
Ransomware | 142
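
The category renaming described for NF-UQ-NIDS can be sketched as a simple lookup from each sub-category label to its parent category. This is an illustrative sketch only, not the authors' processing code: the source label strings are copied from the description above, the target names follow the class names in the table, and the helper `to_parent_category` is hypothetical. The exact label strings in the CSV files may differ.

```python
# Sub-category labels named in the text, mapped to their parent category.
# Assumption: the labels in the source files match these strings exactly.
PARENT_CATEGORY = {
    "DoS attacks-Hulk": "DoS",
    "DoS attacks-SlowHTTPTest": "DoS",
    "DoS attacks-GoldenEye": "DoS",
    "DoS attacks-Slowloris": "DoS",
    "DDOS attack-LOIC-UDP": "DDoS",
    "DDOS attack-HOIC": "DDoS",
    "DDoS attacks-LOIC-HTTP": "DDoS",
    "FTP-BruteForce": "Brute Force",
    "SSH-Bruteforce": "Brute Force",
    "Brute Force -Web": "Brute Force",
    "Brute Force -XSS": "Brute Force",
    "SQL Injection": "Injection",
}


def to_parent_category(label):
    """Return the parent category of a label, or the label itself if unmapped."""
    return PARENT_CATEGORY.get(label, label)


labels = ["DoS attacks-Hulk", "SSH-Bruteforce", "SQL Injection", "Benign"]
renamed = [to_parent_category(label) for label in labels]
print(renamed)  # ['DoS', 'Brute Force', 'Injection', 'Benign']
```

Labels that have no entry in the mapping, such as Benign, pass through unchanged, so the same function can be applied to every row of every source dataset.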

6. NF-UNSW-NB15-v2

The NetFlow-based format of the UNSW-NB15 dataset, named NF-UNSW-NB15, has been expanded with additional NetFlow features and labelled with its respective attack categories. The total number of data flows is 2,390,275, out of which 95,053 (3.98%) are attack samples and 2,295,222 (96.02%) are benign. The attack samples are further classified into nine subcategories. The table below shows the distribution of all flows in the NF-UNSW-NB15-v2 dataset.

Please click here to download the dataset.

Class | Count | Description
Benign | 2295222 | Normal, non-malicious flows.
Fuzzers | 22310 | An attack in which the attacker sends large amounts of random data that cause a system to crash, and which also aims to discover security vulnerabilities in a system.
Analysis | 2299 | A group that presents a variety of threats that target web applications through ports, emails and scripts.
Backdoor | 2169 | A technique that aims to bypass security mechanisms by replying to specific constructed client applications.
DoS | 5794 | Denial of Service is an attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Exploits | 31551 | Sequences of commands that control the behaviour of a host through a known vulnerability.
Generic | 16560 | A method that targets cryptography and causes a collision with each block-cipher.
Reconnaissance | 12779 | A technique for gathering information about a network host, also known as a probe.
Shellcode | 1427 | Malware that penetrates code to control a victim's host.
Worms | 164 | Attacks that replicate themselves and spread to other computers.

7. NF-ToN-IoT-v2

The publicly available pcaps of the ToN-IoT dataset are utilised to generate its NetFlow records, leading to a NetFlow-based IoT network dataset called NF-ToN-IoT-v2. The total number of data flows is 16,940,496, out of which 10,841,027 (63.99%) are attack samples and 6,099,469 (36.01%) are benign. The table below lists and defines the distribution of the NF-ToN-IoT-v2 dataset.

Please click here to download the dataset.

Class | Count | Description
Benign | 6099469 | Normal, non-malicious flows.
Backdoor | 16809 | A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS | 712609 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS | 2026234 | An attempt similar to DoS but with multiple distributed sources.
Injection | 684465 | A variety of attacks that supply untrusted inputs that aim to alter the course of execution, with SQL and code injections being two of the main ones.
MITM | 7723 | Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password | 1153323 | Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware | 3425 | An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning | 3781419 | A group that consists of a variety of techniques that aim to discover information about networks and hosts, also known as probing.
XSS | 2455020 | Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

8. NF-BoT-IoT-v2

NF-BoT-IoT-v2 is an IoT NetFlow-based dataset generated by expanding the NF-BoT-IoT dataset. The features were extracted from the publicly available pcap files and the flows were labelled with their respective attack categories. The total number of data flows is 37,763,497, out of which 37,628,460 (99.64%) are attack samples and 135,037 (0.36%) are benign. There are four attack categories in the dataset. The table below shows the distribution of all flows in NF-BoT-IoT-v2.

Please click here to download the dataset.

Class | Count | Description
Benign | 135037 | Normal, non-malicious flows.
Reconnaissance | 2620999 | A technique for gathering information about a network host, also known as a probe.
DDoS | 18331847 | Distributed Denial of Service is an attempt similar to DoS but with multiple distributed sources.
DoS | 16673183 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft | 2431 | A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.
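
The totals and percentages quoted for NF-BoT-IoT-v2 follow directly from the per-class counts in the table. The short sketch below reproduces that arithmetic; the counts are copied from the table above, and the variable names are chosen for illustration only.

```python
# Flow counts copied from the NF-BoT-IoT-v2 table.
counts = {
    "Benign": 135037,
    "Reconnaissance": 2620999,
    "DDoS": 18331847,
    "DoS": 16673183,
    "Theft": 2431,
}

total = sum(counts.values())
benign = counts["Benign"]
attack = total - benign  # every non-benign flow is an attack sample

# Shares of the whole dataset, rounded to two decimals as in the text.
attack_pct = round(100 * attack / total, 2)
benign_pct = round(100 * benign / total, 2)

print(total, attack, attack_pct, benign_pct)  # 37763497 37628460 99.64 0.36
```

The same check can be applied to any of the other tables by swapping in their counts.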

9. NF-CSE-CIC-IDS2018-v2

The original pcap files of the CSE-CIC-IDS2018 dataset are utilised to generate a NetFlow-based dataset called NF-CSE-CIC-IDS2018-v2. The total number of flows is 18,893,708, out of which 2,258,141 (11.95%) are attack samples and 16,635,567 (88.05%) are benign. The table below shows the dataset's distribution.

Please click here to download the dataset.

Class | Count | Description
Benign | 16635567 | Normal, non-malicious flows.
BruteForce | 120912 | A technique that aims to obtain usernames and password credentials by trying a list of predefined possibilities.
Bot | 143097 | An attack that enables an attacker to remotely control several hijacked computers to perform malicious activities.
DoS | 483999 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS | 1390270 | An attempt similar to DoS but with multiple distributed sources.
Infiltration | 116361 | An inside attack that sends a malicious file via email to exploit an application, followed by a backdoor that scans the network for other vulnerabilities.
Web Attacks | 3502 | A group that includes SQL injections, command injections and unrestricted file uploads.

10. NF-UQ-NIDS-v2

NF-UQ-NIDS-v2 is a comprehensive dataset that merges all the aforementioned datasets. The newly published dataset demonstrates the benefit of a shared feature set across datasets, which makes it possible to merge multiple smaller datasets. This will eventually lead to a bigger and more universal NIDS dataset containing flows from multiple network setups and different attack settings. It includes an additional label feature identifying the original dataset of each flow. This can be used to compare the same attack scenarios conducted over two or more different testbed networks.

The attack categories have been modified to combine all parent categories. Attacks named DoS attacks-Hulk, DoS attacks-SlowHTTPTest, DoS attacks-GoldenEye and DoS attacks-Slowloris have been renamed to the parent DoS category. Attacks named DDoS attack-LOIC-UDP, DDoS attack-HOIC and DDoS attacks-LOIC-HTTP have been renamed to DDoS. Attacks named FTP-BruteForce, SSH-Bruteforce, Brute Force -Web and Brute Force -XSS have been combined into a brute-force category. Finally, SQL Injection attacks have been included in the injection attacks category.

The NF-UQ-NIDS-v2 dataset has a total of 75,987,976 records, out of which 25,165,295 (33.12%) are benign flows and 50,822,681 (66.88%) are attacks. The table below lists the distribution of the final attack categories.

Please click here to download the dataset.

Class | Count
Benign | 25165295
DDoS | 21748351
Reconnaissance | 2633778
Injection | 684897
DoS | 17875585
Brute Force | 123982
Password | 1153323
XSS | 2455020
Infilteration | 116361
Exploits | 31551
Scanning | 3781419
Fuzzers | 22310
Backdoor | 18978
Bot | 143097
Generic | 16560
Analysis | 2299
Theft | 2431
Shellcode | 1427
MITM | 7723
Worms | 164
Ransomware | 3425
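
Merging datasets that share a feature set, while recording the origin of each flow, can be sketched as below. This is a hedged illustration rather than the authors' procedure: the column names `Attack` and `Dataset`, the placeholder feature column, the helper `merge_with_source_label` and the miniature in-memory sources are all assumptions, and the real files should be inspected for their actual headers.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature stand-ins for two source datasets that share the
# same columns. Column names are assumptions made for illustration.
SOURCES = {
    "NF-ToN-IoT-v2": "IN_BYTES,Attack\n100,DoS\n200,Benign\n300,DoS\n",
    "NF-BoT-IoT-v2": "IN_BYTES,Attack\n400,DoS\n500,Theft\n",
}


def merge_with_source_label(sources, label_column="Dataset"):
    """Concatenate rows from every source, tagging each row with its origin."""
    merged = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row[label_column] = name
            merged.append(row)
    return merged


merged = merge_with_source_label(SOURCES)

# Count one attack class per originating dataset, for example to compare
# DoS flows captured on different testbed networks.
dos_by_dataset = Counter(
    row["Dataset"] for row in merged if row["Attack"] == "DoS"
)
print(len(merged), dict(dos_by_dataset))
# 5 {'NF-ToN-IoT-v2': 2, 'NF-BoT-IoT-v2': 1}
```

The merge is a plain concatenation only because every source exposes the same columns; datasets with differing feature sets would first need to be mapped onto a common one.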

11. CIC-ToN-IoT

CIC-ToN-IoT was generated by extracting the CICFlowMeter feature set from the pcap files of the ToN-IoT dataset. The CICFlowMeter-v4 tool was utilised to extract 83 features. There are 5,351,760 data samples, of which 2,836,524 (53.00%) are attacks and 2,515,236 (47.00%) are benign.

Please click here to download the dataset.

Class | Count | Description
Benign | 2515236 | Normal, non-malicious flows.
Backdoor | 27145 | A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS | 145 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS | 202 | An attempt similar to DoS but with multiple distributed sources.
Injection | 277696 | A variety of attacks that supply untrusted inputs that aim to alter the course of execution, with SQL and code injections being two of the main ones.
MITM | 517 | Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password | 340208 | Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware | 5098 | An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning | 36205 | A group that consists of a variety of techniques that aim to discover information about networks and hosts, also known as probing.
XSS | 2149308 | Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

12. CIC-BoT-IoT

The CICFlowMeter-v4 tool was used to extract 83 features from the BoT-IoT dataset pcap files. The dataset contains 13,428,602 records in total, comprising 13,339,356 (99.34%) attack samples and 89,246 (0.66%) benign samples. The attack samples are made up of four attack scenarios inherited from the parent dataset, i.e., DDoS, DoS, reconnaissance, and theft.

Please click here to download the dataset.

Class | Count | Description
Benign | 89246 | Normal, non-malicious flows.
Reconnaissance | 3514330 | A technique for gathering information about a network host, also known as a probe.
DDoS | 4913920 | Distributed Denial of Service is an attempt similar to DoS but with multiple distributed sources.
DoS | 4909405 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft | 1701 | A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.

Project members

Associate Professor Marius Portmann

Associate Professor
School of Electrical Engineering and Computer Science

Dr Siamak Layeghy

UQ Amplify Fellow
School of Electrical Engineering and Computer Science