My Review Comments

Intelligent Vulnerability Detector using deep sequence and graph based Hybrid Feature Extraction

This manuscript proposes a hybrid graph-based and sequence-based neural network model for detecting vulnerabilities in Java code. The model uses multiple program features and targets a range of vulnerability types collected from the Common Weakness Enumeration (CWE). It introduces GCN-RFEMLP for extracting graph-based features and employs CodeBERT for extracting sequence-based features. However, the manuscript has several critical issues, outlined below, which lead the referee to recommend rejection.

Comments:

  1. The manuscript mentions the COVID-19 pandemic, which is unrelated to the research and should not have been mentioned.
  2. The manuscript presents a list of seven contributions; however, they lack conciseness and do not effectively emphasize the primary contributions.
  3. Several figures in the manuscript are not appropriate. Figure 1 focuses on a classification of machine learning methods and lacks contextual relevance, given that the manuscript is specifically about vulnerability detection. Figures 3 and 4 depict the node2vec process and GCN, respectively, but they are likewise detached from vulnerability detection and do not contribute to the study. Other figures suffer from similar issues. Instead, the figures should illustrate the transformation from source code to code property graph and highlight the overall model proposed in the manuscript.
  4. In the experimental section, the tables presenting the experimental results are formatted inconsistently. It is customary to report experimental results with two decimal places, such as 98.90, and the other result comparison data should follow the same convention.
  5. The dataset description in the manuscript lacks clarity, and there is no mention of the labeling process for the data. Additionally, the comparison with other benchmarks does not indicate the dataset that was utilized.
  6. The manuscript does not explain whether the dataset is imbalanced or how any imbalance is handled. Unaddressed data imbalance poses a risk of overfitting.
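
To make comment 6 concrete, one common mitigation the authors could report is inverse-frequency class weighting. The following is only a minimal sketch, assuming binary labels where 1 means vulnerable; the function name and the label counts are hypothetical and are not taken from the manuscript.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted loss penalizes
    mistakes on them more heavily.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label distribution: 900 non-vulnerable and 100 vulnerable samples.
labels = [0] * 900 + [1] * 100
weights = inverse_frequency_weights(labels)
print(weights)
```

With this hypothetical 9:1 split, the vulnerable class receives a weight nine times larger than the non-vulnerable class. Reporting the class distribution together with a step of this kind would address the concern.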

VulDet-BC: Binary Software Vulnerability Detection Based on BiGRU and CNN

This manuscript proposes VulDet-BC, a binary software vulnerability detection method that operates at the binary machine instruction level. It combines BiGRU and CNN with an attention mechanism to build the detection model, and it reports better results than several baselines on some metrics. However, the manuscript has several critical issues, outlined below, which lead the referee to recommend rejection.

Comments:

  1. The related work section does not adequately review the research field of vulnerability detection. Many recent and innovative works in the field are not mentioned.
  2. Deep learning methods often rely on vulnerability patterns to detect vulnerabilities. The proposed method converts binary machine instructions into numeric representations, but the conversion introduces no concept related to vulnerability patterns, either semantically or syntactically.
  3. The combination of BiGRU with an attention module is not novel. In addition, the research questions (RQs) posed in the experimental section do not consider the causes of vulnerabilities; they amount to ablation experiments conducted purely from a deep learning perspective.
  4. BinVulDet, cited as reference 40 in the manuscript, is a relatively recent work that detects vulnerabilities in binary software at the pseudocode level. However, the manuscript neither introduces it in the related work section nor compares the proposed method against it. The source code vulnerability detection method VulDeePecker builds its detection model from code gadgets combined with BiLSTM, yet the manuscript appears to compare against only the BiLSTM component. Such an experiment is no longer a comparison with VulDeePecker, so this part of the experimental design is flawed.
  5. The manuscript lacks an analysis of false positives and false negatives, that is, the false positive rate (FPR) and false negative rate (FNR), which are crucial in vulnerability detection. It also lacks experiments on detecting vulnerabilities in real-world software, which leaves the proposed RQs thin and the paper's conclusions weakly supported.
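
For reference, the FPR and FNR requested in comment 5 can be derived directly from the confusion matrix of a binary detector. This is a minimal sketch, assuming labels where 1 means vulnerable and 0 means non-vulnerable; the function name and the sample predictions are hypothetical and are not results from the manuscript.

```python
def fpr_fnr(y_true, y_pred):
    """Return (FPR, FNR) for binary labels, where 1 = vulnerable, 0 = non-vulnerable.

    FPR = FP / (FP + TN): share of non-vulnerable samples flagged as vulnerable.
    FNR = FN / (FN + TP): share of vulnerable samples that were missed.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    # Guard against empty classes to avoid division by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical predictions: 4 vulnerable and 6 non-vulnerable samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
fpr, fnr = fpr_fnr(y_true, y_pred)
print(f"FPR = {fpr:.2%}, FNR = {fnr:.2%}")  # FPR = 16.67%, FNR = 25.00%
```

Reporting both rates alongside accuracy and F1 would show how often the detector raises false alarms and how often it misses real vulnerabilities.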