Feature Selection Technique in Network Traffic Dataset

Security threats are a major concern in today's digital world. The use of the Internet, computers, mobile devices and tablets has become ubiquitous, and cyber attacks have grown rapidly along with it. There are various types of cyber attacks, such as spoofing, sniffing, denial of service, phishing, evil twins, pharming, click fraud and malware. Malicious software is harmful to both your computer and your network. Cyber attacks have increased dramatically and have compromised systems, stolen valuable information and destroyed important facilities, producing large losses at an average cost of $345 per incident.

The growth of Internet usage is not the only reason for this digital threat; the number of new malware samples is another. According to the Symantec Internet Security Threat Report 2010, there are over 5 million pieces of malware circulating on the Internet, and over 317 million new pieces of malware were created in 2014. Conventional antivirus and intrusion detection systems cannot detect zero-day attacks. As a result, security specialists put a lot of effort into developing efficient malware detection methods.

In this work we describe several feature selection techniques aimed at detecting malware from a network traffic dataset using machine learning algorithms, because feature selection is a very important task for malware detection. Malware can be detected via static and dynamic features. Although antivirus software is developed based on malware signatures, it fails when a zero-day malware attack occurs. The malware detection system captures the network traffic dataset to distinguish between malware and goodware (normal and suspicious activity). The network traffic dataset contains many packets with a very large number of features. Some features may be very important, while others may not be relevant to making a decision.
Keeping the irrelevant features increases the processing time and decreases the efficiency of the malware detection system. The main purpose of a feature selection technique is therefore to reduce the dimensionality of the feature space by removing redundant and irrelevant features from the network traffic dataset.

Many approaches have been developed to cope with the amount of new malware that appears every day. Hansen et al. used a Random Forests classifier to detect and classify large amounts of malware from known and unknown malware families. This approach explicitly reduces the feature space. The Cuckoo sandbox is used to obtain behavioral traces of the analyzed samples, and the approach achieves a high rate of malware detection and family classification. Tian et al. used API call logs to distinguish malware from cleanware by carefully examining their behavioral characteristics. Their work was also proposed for malware family classification and detection by applying pattern recognition algorithms in a virtual environment. They achieved an accuracy of about 97% using a dataset of 1,368 malware and 456 cleanware samples. In another study, the applicability of the sandbox environment to obtain the runtime behavior of malware was discussed. The proposed work differentiates malware using a heuristic method called N-gram analysis and adopts the gain feature selection technique.
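
To make the idea of N-gram features combined with a gain-based selection step more concrete, here is a minimal sketch in Python. It assumes that "gain" refers to information gain, which the text does not confirm, and it is not the method of any of the cited studies. The function names (`ngrams`, `entropy`, `information_gain`, `rank_ngrams`) and the toy call sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(calls, n=2):
    """Return the set of contiguous n-grams found in one call sequence."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(present, labels):
    """Entropy reduction obtained by splitting the samples on a binary feature."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    total = len(labels)
    remainder = (len(with_f) / total) * entropy(with_f) \
        + (len(without_f) / total) * entropy(without_f)
    return entropy(labels) - remainder


def rank_ngrams(samples, labels, n=2):
    """Score every n-gram by information gain and sort from best to worst."""
    per_sample = [ngrams(s, n) for s in samples]
    vocabulary = set().union(*per_sample)
    scores = {
        g: information_gain([g in grams for grams in per_sample], labels)
        for g in vocabulary
    }
    # Highest gain first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration: four short call sequences with labels.
samples = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "connect", "send"],
    ["connect", "send", "close"],
]
labels = ["clean", "clean", "malware", "malware"]

ranked = rank_ngrams(samples, labels, n=2)
top_k = [gram for gram, _ in ranked[:3]]  # keep only the k best features
```

In this toy data the bigram `("connect", "send")` appears in both malware sequences and in neither clean one, so it separates the classes perfectly and ranks first. Keeping only the top-ranked N-grams is one way to shrink the feature space before training a classifier; the cut-off `k` is a tuning choice, not something the text specifies.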