Deep Learning for Cyber Security
This seminar presents deep learning (DL) methods for cyber security applications. Each DL method is described, including deep auto-encoders, restricted Boltzmann machines, recurrent neural networks, generative adversarial networks, and several others. Alongside this, we survey deep learning approaches for cyber security intrusion detection and the datasets they use, and then discuss how each DL method is applied to security problems. We cover a broad array of attack types, including malware, spam, insider threats, network intrusions, and false data injection. Because datasets play an important role in intrusion detection, we also present the available cyber datasets and classify them into seven categories: network traffic-based, electrical network-based, internet traffic-based, virtual private network-based, Android apps-based, IoT traffic-based, and internet-connected devices-based datasets. Among all of these methods, we concentrate on Deep Boltzmann Machines (DBMs) and Restricted Boltzmann Machines (RBMs), both built on the Boltzmann machine (BM): a graphical model with undirected links between a set of visible nodes and a set of hidden nodes, where each node is a random variable with a bias indicating its propensity to activate. These models are described in detail later in this report.
The use of data science in cyber security can help to correlate events, identify patterns, and detect anomalous behavior, improving the security posture of any defense program. This is reflected in the emergence of cyber defense systems that leverage data analytics. In this seminar, each DL method is presented with an eye toward security applications, considering a broad array of attack types including malware, spam, and insider threats. Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks.
Deep learning offers several architectures that are used in cyber security applications, along with benchmark datasets whose data can be classified according to the chosen methods, algorithms, and architectures. Cyber security is the collection of policies, techniques, technologies, and processes that work together to protect the confidentiality, integrity, and availability of computing resources, networks, software programs, and data from attack.
Cyber defense mechanisms exist at the application, network, host, and data level. There is a plethora of tools—such as firewalls, antivirus software, intrusion detection systems (IDSs), and intrusion protection systems (IPSs)—that work in silos to prevent attacks and detect security breaches.
However, many adversaries are still at an advantage because they only need to find one vulnerability in the systems needing protection. As the number of internet-connected systems increases, the attack surface also increases, leading to greater risk of attack. Furthermore, attackers are becoming more sophisticated, developing zero-day exploits and malware that evade security measures, enabling them to persist for long periods without notice.
Zero-day exploits are attacks that have not been encountered previously but are often variations on a known attack. To exacerbate the problem, attack mechanisms are being commoditized, allowing for rapid distribution without any understanding of how the exploits were developed. Against this background, we examine how the DL methods surveyed here (deep auto-encoders, restricted Boltzmann machines, recurrent neural networks, generative adversarial networks, and others) are used in security applications. We concentrate on the Boltzmann machine (BM), a graphical model with undirected links between a set of visible nodes and a set of hidden nodes.
Each node is a random variable with a bias indicating its propensity to activate. Two variants are of particular interest: the Deep Boltzmann Machine (DBM) and the Restricted Boltzmann Machine (RBM). An RBM is a BM in which every visible node is connected to every hidden node and, typically, there are no other connections. A DBM is an unsupervised, probabilistic, generative model with entirely undirected connections between its layers. Although learning is impractical in general Boltzmann machines, it can be made quite efficient in an RBM, which does not allow intralayer connections between hidden units. After training one RBM, the activities of its hidden units can be treated as data for training a higher-level RBM.
This method of stacking RBMs makes it possible to train many layers of hidden units efficiently and is one of the most common deep learning strategies; as each new layer is added, the generative model improves. An extension to the restricted Boltzmann machine allows real-valued rather than binary data, and one practical RBM application is speech recognition. DBMs can learn complex and abstract internal representations of the input in tasks such as object or speech recognition, using a limited amount of labeled data to fine-tune representations built from a large set of unlabeled sensory input. Unlike DBNs and deep convolutional neural networks, they carry out inference and training in both directions, bottom-up and top-down, which allows the DBM to better uncover the structure of its inputs.

However, the slow speed of DBMs limits their performance and functionality. Because exact maximum likelihood learning is intractable for DBMs, only approximate maximum likelihood learning is possible. One option is to use mean-field inference to estimate data-dependent expectations and to approximate the expected sufficient statistics with Markov chain Monte Carlo (MCMC). This approximate inference, which must be done for each test input, is about 25 to 50 times slower than a single bottom-up pass in DBMs. This makes joint optimization impractical for large datasets and restricts the use of DBMs for tasks such as feature representation.
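As a rough illustration of this stacking idea (which also underlies DBN-style pretraining), the sketch below trains one RBM on the data and a second RBM on the first one's hidden activities using scikit-learn's BernoulliRBM. The data, feature counts, and hyperparameters are placeholders for something like scaled intrusion-detection features, not values from any cited work.

```python
# Minimal sketch of greedy layer-wise RBM stacking. Assumes X holds features
# scaled to [0, 1]; all names and sizes here are illustrative only.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
X = rng.rand(1000, 64)            # placeholder for scaled intrusion-detection features

# Train the first RBM on the raw features.
rbm1 = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
H1 = rbm1.fit_transform(X)        # hidden activities of RBM 1

# Treat those hidden activities as data for a higher-level RBM.
rbm2 = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
H2 = rbm2.fit_transform(H1)       # deeper, more abstract representation

# H2 can now be fed to a simple classifier and fine-tuned with the limited
# labeled data available.
```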
DL approaches used for cyber security:
- FFDNN: Feed forward deep neural network;
- CNN: Convolutional neural network;
- DNN: Deep neural network;
- RNN: Recurrent neural network;
- DBN: Deep belief network;
- RBM: Restricted Boltzmann machine;
- DBM: Deep Boltzmann machine;
- DA: Deep auto-encoder;
This seminar mostly focuses on the Boltzmann machine (BM), which has two variants:
1) Deep Boltzmann machines (DBMs)
2) Restricted Boltzmann machines (RBMs)
A Boltzmann machine (BM) is a graphical model with undirected links between a set of visible nodes and a set of hidden nodes. Each node is a random variable and has a bias indicating its propensity to activate. A Restricted Boltzmann Machine (RBM) is a BM in which every visible node is connected to every hidden node; there are typically no other connections. A DBM is an unsupervised, probabilistic, generative model with entirely undirected connections between different layers. It contains visible units and multiple layers of hidden units. Like the RBM, the DBM has no intralayer connections; connections exist only between units of neighbouring layers.
For a learning problem, the Boltzmann machine is shown a set of binary data vectors and it must find weights on the connections so that the data vectors are good solutions to the optimization problem defined by those weights. To solve a learning problem, Boltzmann machines make many small updates to their weights, and each update requires them to solve many different search problems.
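One common way to realize these "many small updates" in an RBM is contrastive divergence. The sketch below shows a single CD-1 update in NumPy; the dimensions, learning rate, and data vector are placeholders for illustration, and this is only an approximation of maximum likelihood learning, not the exact procedure.

```python
# Sketch of one contrastive-divergence (CD-1) weight update for an RBM.
# All shapes and data are illustrative placeholders.
import numpy as np

rng = np.random.RandomState(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.randn(n_visible, n_hidden)   # weights W_ij
b = np.zeros(n_visible)                      # visible biases b_i
c = np.zeros(n_hidden)                       # hidden biases c_j

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v0 = rng.randint(0, 2, size=n_visible).astype(float)   # one binary data vector

# Positive phase: hidden probabilities given the data.
ph0 = sigmoid(v0 @ W + c)
h0 = (rng.rand(n_hidden) < ph0).astype(float)

# Negative phase: one step of Gibbs sampling (reconstruction).
pv1 = sigmoid(h0 @ W.T + b)
v1 = (rng.rand(n_visible) < pv1).astype(float)
ph1 = sigmoid(v1 @ W + c)

# Small updates that nudge the data toward being a low-energy configuration.
W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
b += lr * (v0 - v1)
c += lr * (ph0 - ph1)
```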
An RBM is an undirected graphical model G = {W_ij, b_i, c_j}, as presented in the figure. There are two layers: the visible layer and the hidden layer. The two layers are fully connected through a set of weights W_ij and biases {b_i, c_j}; note that there is no connection between units of the same layer. Following Fischer and Igel [147], each configuration of the visible and hidden units has an energy, defined as:

En(V, H, G) = − Σ_i Σ_j V_i H_j W_ij − Σ_{i∈V} b_i V_i − Σ_{j∈H} c_j H_j    (4)

Based on this energy function, the probability of each joint configuration is given by the Gibbs distribution:

Prob(V, H, G) = (1 / Z(G)) e^{−En(V, H, G)}    (5)

where Z is the partition function:

Z(G) = Σ_{V∈𝒱} Σ_{H∈ℋ} e^{−En(V, H, G)}    (6)

where the calligraphic letters 𝒱 and ℋ denote the state spaces of the visible and hidden units, respectively.
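To make equations (4)–(6) concrete, the following sketch computes the energy, the partition function, and the Gibbs probability for a deliberately tiny RBM by enumerating every joint state. The weights and biases are arbitrary illustrative values; exact enumeration like this is only feasible for toy-sized models.

```python
# Tiny RBM: compute En(V, H, G), the partition function Z(G), and Prob(V, H, G)
# by brute-force enumeration. Weights and biases are arbitrary illustrative values.
import itertools
import numpy as np

W = np.array([[ 0.5, -0.2],
              [-0.3,  0.8],
              [ 0.1,  0.4]])      # W_ij, 3 visible x 2 hidden
b = np.array([0.0, 0.1, -0.1])    # visible biases b_i
c = np.array([0.2, -0.2])         # hidden biases c_j

def energy(v, h):
    # En(V, H, G) = -sum_ij V_i H_j W_ij - sum_i b_i V_i - sum_j c_j H_j
    return -(v @ W @ h) - b @ v - c @ h

# Partition function Z(G): sum over all joint binary configurations.
states_v = list(itertools.product([0, 1], repeat=3))
states_h = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-energy(np.array(v), np.array(h)))
        for v in states_v for h in states_h)

v = np.array([1, 0, 1])
h = np.array([0, 1])
prob = np.exp(-energy(v, h)) / Z   # equation (5)
print(f"En = {energy(v, h):.3f}, Prob = {prob:.4f}")
```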
A DBM is a network of symmetrically coupled stochastic binary units, containing a set of visible units and a sequence of layers of hidden units, as presented in the figure. A DBM with three hidden layers is defined by the energy of the state {V, H}:

En(V, H, G) = − Vᵀ W¹ H¹ − (H¹)ᵀ W² H² − (H²)ᵀ W³ H³    (8)

where H = {H¹, H², H³} is the set of hidden units and G = {W¹, W², W³} are the model parameters. The probability that the model assigns to a visible vector V is:

Prob(V, G) = (1 / Z(G)) Σ_H e^{−En(V, H, G)}
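The next sketch evaluates equation (8) for a small DBM with three hidden layers. All layer sizes and parameter values are illustrative placeholders, chosen only to show how the layer-wise weight matrices enter the energy.

```python
# Energy of a small DBM with three hidden layers, following equation (8):
# En(V, H, G) = -V^T W1 H1 - H1^T W2 H2 - H2^T W3 H3.
# All sizes and parameter values are illustrative placeholders.
import numpy as np

rng = np.random.RandomState(0)
sizes = [4, 3, 3, 2]                         # visible, H1, H2, H3
W1 = 0.1 * rng.randn(sizes[0], sizes[1])
W2 = 0.1 * rng.randn(sizes[1], sizes[2])
W3 = 0.1 * rng.randn(sizes[2], sizes[3])

def dbm_energy(v, h1, h2, h3):
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2) - (h2 @ W3 @ h3)

v  = rng.randint(0, 2, sizes[0]).astype(float)
h1 = rng.randint(0, 2, sizes[1]).astype(float)
h2 = rng.randint(0, 2, sizes[2]).astype(float)
h3 = rng.randint(0, 2, sizes[3]).astype(float)
print("En =", dbm_energy(v, h1, h2, h3))
```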
Basic architecture (figure not reproduced here).
3. Deep Learning Methods Used in Cyber Security
This section describes the different DL methods used in cyber security. References to important methodology papers are provided for each technique.
3.1. Deep Belief Networks
A seminal paper by Hinton introduced Deep Belief Networks (DBNs). They are a class of DNNs composed of multiple layers of hidden units with connections between the layers but not between units within each layer. DBNs are trained in an unsupervised manner. Typically, they are trained by adjusting weights in each hidden layer individually to reconstruct the inputs.
3.1.1. Deep Autoencoders
Autoencoders are a class of unsupervised neural networks in which the network takes a vector as input and tries to match the output to that same vector. By taking the input, changing the dimensionality, and reconstructing the input, one can create a higher- or lower-dimensional representation of the data. These types of neural networks are incredibly versatile because they learn a compressed encoding of the data in an unsupervised manner. Additionally, they can be trained one layer at a time, reducing the computational resources required to build an effective model. When the hidden layers have a smaller dimensionality than the input and output layers, the network is used for encoding the data (i.e., feature compression). An autoencoder can also be made more robust to noise by training it to reconstruct the input from a noisy version of the input; this is called a denoising autoencoder. This technique has been shown to have more generalizability and robustness than typical autoencoders.
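As an illustration, a minimal denoising autoencoder in PyTorch might look like the sketch below. The 41-feature input width is an assumption meant to evoke flow-level intrusion-detection features; nothing here is tied to a specific dataset or to the works surveyed above.

```python
# Minimal denoising autoencoder sketch. The input width (41) is an assumption;
# replace it with the dimensionality of your own data.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_features=41, n_hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(256, 41)                  # placeholder batch of scaled features
noisy_x = x + 0.1 * torch.randn_like(x)  # corrupt the input
recon = model(noisy_x)
loss = loss_fn(recon, x)                 # reconstruct the *clean* input
loss.backward()
optimizer.step()
```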
3.2. Recurrent Neural Networks
A recurrent neural network (RNN), as shown in Figure 7, extends the capabilities of a traditional neural network, which can only take fixed-length data inputs, to handle input sequences of variable length. The RNN processes inputs one element at a time, using the output of the hidden units as additional input for the next element. RNNs can therefore address speech and language problems as well as time series problems.
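A minimal sketch of this idea, assuming the sequences are IDs of discrete events (for example, system calls or log events) and a binary benign/malicious label, could look like the following. The vocabulary size, sequence length, and class count are illustrative assumptions.

```python
# Sketch of an LSTM-based classifier for variable-length event sequences.
# Vocabulary size, sequence length, and the two classes are assumptions.
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, vocab_size=200, embed_dim=32, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):
        emb = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)           # final hidden state per sequence
        return self.head(h_n[-1])              # class logits

model = SequenceClassifier()
batch = torch.randint(0, 200, (8, 50))          # 8 sequences of 50 event IDs
logits = model(batch)                           # shape: (8, 2)
```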
3.3. Convolutional Neural Networks
A convolutional neural network (CNN) [35,36] is a neural network meant to process input stored in arrays. An example input is a color or grayscale image, which is a two-dimensional (2D) array of pixels. CNNs are often used for processing 2D arrays of images or spectrograms of audio. They are also used frequently for three-dimensional (3D) arrays (videos and volumetric images). Their use for one-dimensional (1D) arrays (signals) is less frequent but increasing. Regardless of the dimensionality, CNNs are used where there is spatial or temporal ordering. The architecture of a CNN (Figure 8) consists of three distinct types of layers:
1) convolution layers,
2) pooling layers, and
3) the classification layer.
The convolution layers are the core of the CNN. The weights define a convolution kernel applied to the original input, a small window at a time, called the receptive field. The result of applying these filters across the entirety of the input is then passed through a non-linearity, typically a ReLU, and is called a feature map. These convolution kernels, named after the mathematical convolution operation, allow close physical or temporal relationships within the data to be accounted for, and help reduce memory by applying the same kernel across the entirety of the image.
Pooling layers are used to perform non-linear down sampling by applying a specific function, such as the maximum, over non-overlapping subsets of the feature map. Besides reducing the size of the feature maps, and therefore, the memory required, pooling layers also reduce the number of
parameters, and therefore, overfitting. These layers are generally inserted periodically in between convolution layers and then fed into a fully connected, traditional DNN. Additionally, CNNs can use regularization techniques that help reduce overfitting. One of the most successful techniques is called “dropout” . When training a model using dropout, during each training iteration, a specified percentage of nodes in a given layer and their incoming and outgoing connections, are randomly removed. Including dropout typically improves the accuracy and generalizability of a model because it increases the likelihood a node will be useful. Uses of CNNs are significantly varied. The greatest success has been achieved with computer vision tasks such as scene and object detection and object identification [38]. Applications range from biology to facial recognition . The best showcase of CNN success took place in 2012 at the
ImageNet competition, where a CNN surpassed the performance of other methods, and then surpassed human-level accuracy in 2015, through the use of GPUs, ReLUs, dropout, and the generation of additional images. In addition, CNNs have been used successfully in language models for phoneme detection, letter recognition, speech recognition, and language model building.
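Bringing this back to security data, a hedged sketch of a 1D CNN over raw payload bytes is shown below, mirroring the three layer types described above (convolution, pooling, classification) plus dropout. The payload length of 1024 bytes and the two output classes are assumptions for illustration only.

```python
# Sketch of a 1D CNN over raw payload bytes: convolution -> ReLU -> pooling ->
# fully connected classifier with dropout. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PayloadCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(256, 8)                 # one vector per byte value
        self.conv = nn.Sequential(
            nn.Conv1d(8, 32, kernel_size=7, padding=3),   # convolution layer
            nn.ReLU(),
            nn.MaxPool1d(4),                              # pooling layer
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Sequential(                  # classification layer
            nn.Flatten(),
            nn.Dropout(0.5),                              # dropout regularization
            nn.Linear(64, n_classes),
        )

    def forward(self, byte_ids):
        x = self.embed(byte_ids).transpose(1, 2)          # (batch, channels, length)
        return self.classifier(self.conv(x))

model = PayloadCNN()
batch = torch.randint(0, 256, (4, 1024))                  # 4 payloads of 1024 bytes
logits = model(batch)                                     # shape: (4, 2)
```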
3.4. Generative Adversarial Networks
Generative adversarial networks (GANs), shown in the figure, are a type of neural network architecture used in unsupervised machine learning, in which two neural networks compete against each other in a zero-sum game to outsmart each other. Developed by Goodfellow et al., one network acts as a generator and another network acts as a discriminator. The generator takes in input data and generates output data with the same characteristics as real data. The discriminator takes in real data and data from the generator and tries to distinguish whether the input is real or fake. When training has finished, the generator is capable of generating new data that is not distinguishable from real data.
Since first developed, GANs have shown wide applicability, especially to images. Examples include image enhancement, caption generation, and optical flow estimation. Facebook even has an open-source, pre-trained GAN for image generation called deep convolution generative adversarial network (DCGAN).
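A bare-bones sketch of the generator/discriminator game is given below. The 41-dimensional "record" is an assumption (for example, synthetic network-flow feature vectors); this is a minimal illustration of the alternating updates, not a production training loop.

```python
# Compact GAN sketch: a generator and a discriminator trained in a zero-sum game.
# The data dimensionality and batch sizes are illustrative assumptions.
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 41
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, data_dim)                 # placeholder "real" records

# Discriminator step: label real as 1, generated as 0.
fake = G(torch.randn(32, noise_dim)).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator call generated data "real".
fake = G(torch.randn(32, noise_dim))
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```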
3.5. Recursive Neural Networks
Recursive neural networks are neural networks that apply a set of weights recursively to a series of inputs. In these networks, the output of a node is used as input for the next step. Initially, the first two inputs are fed into the model together; afterward, the output from that step is used as an input along with the next input. This type of model has been used for various natural language processing tasks and image segmentation.
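The following sketch shows the recursion in its simplest form: one shared weight set combines the running representation with the next input, and its output feeds the following step. The vector dimensionality and the toy sequence are illustrative assumptions.

```python
# Sketch of a recursive neural network: the same weights combine the running
# representation with the next input at every step. Sizes are placeholders.
import torch
import torch.nn as nn

class RecursiveCell(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.combine = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, inputs):
        # Combine the first two inputs, then fold in each remaining input.
        state = self.combine(torch.cat([inputs[0], inputs[1]], dim=-1))
        for x in inputs[2:]:
            state = self.combine(torch.cat([state, x], dim=-1))
        return state

cell = RecursiveCell()
sequence = [torch.randn(8) for _ in range(5)]   # five toy input vectors
representation = cell(sequence)                 # final combined representation
```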
Total algorithms
• Application of DL techniques to a wide variety of these cyber security attack types that targeted networks, application software, host systems, and data.
• We also provided a comprehensive review of the documented uses of DL methods to detect these cyber attacks.
• Current approaches treated the different attack types in isolation.
• Future work should consider the cascading connection of malicious activities throughout an attack lifecycle (e.g., breach, exploitation, command and control, data theft, etc.).