Authors:
Hai Ngoc Nguyen 1; Songpon Teerakanok 2; Atsuo Inomata 3 and Tetsutaro Uehara 1
Affiliations:
1 Cyber Security Lab, College of Information Science and Engineering, Ritsumeikan University, Japan
2 Research Organization of Science and Technology, Ritsumeikan University, Japan
3 Graduate School of Information Science and Technology, Osaka University, Japan
Keyword(s):
Deep Learning, Word Embeddings, Vulnerability Detection, RNNs.
Abstract:
Many studies have combined Deep Learning and Natural Language Processing (NLP) techniques in security systems to perform tasks such as bug detection, vulnerability prediction, and classification. Most of these works rely on NLP embedding methods to generate input vectors for the deep learning models. However, many embedding methods exist for encoding software text files into vectors, and the space of possible neural network structures is vast and largely chosen heuristically. This makes it challenging for researchers to choose an appropriate combination of embedding technique and model structure when training vulnerability detection classifiers. To address this, we propose a system to investigate the use of four popular word embedding techniques combined with four different recurrent neural networks (RNNs), including both bidirectional RNNs (BRNNs) and unidirectional RNNs. We trained and evaluated the models on two types of datasets of vulnerable functions written in C. Our results showed that the FastText embedding technique combined with BRNNs produced the best detection rate among the combinations tested on the real-world dataset, but not on the artificially produced dataset. Further experiments on other datasets are necessary to confirm this result.