This newly published research by Xueyan Tang, co-authored with Yuying Du, Alan Lai, Ze Zhang and Lingzhi Shi, investigates the use of deep learning to identify smart contract vulnerabilities.
Blockchain technology relies heavily on smart contracts, which are central to building decentralized applications. However, vulnerabilities in smart contracts can lead to system failures and monetary losses. Static analysis tools are often used to find these vulnerabilities, but because they depend heavily on predefined rules and have limited semantic-analysis capabilities, they frequently produce false positives and false negatives.
These predefined rules also fail to generalize or adapt to new patterns, and they go out of date quickly. Deep learning techniques, by contrast, learn the characteristics of vulnerabilities during training and do not require hand-crafted detection rules.
Our proposed deep learning-based solution, Lightning Cat, is presented in this work. We train three deep learning models—Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN—to identify vulnerabilities in smart contracts. According to the experimental results, the Optimized-CodeBERT model performs best among the approaches in Lightning Cat, achieving an f1-score of 93.53%. To extract vulnerability features accurately, we take segments of vulnerable code functions, preserving the critical characteristics of each vulnerability.
Preprocessing the data with the CodeBERT pre-trained model allowed us to capture the syntax and semantics of the code more precisely. We assess the viability of our solution on the SolidiFI-benchmark dataset, which comprises 9369 vulnerable contracts injected with seven distinct types of vulnerabilities.
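The preprocessing step described above produces fixed-length input IDs and attention masks for each code sample. As a rough illustration of that idea, here is a minimal stand-in tokenizer; the paper itself uses the real CodeBERT tokenizer, and the vocabulary and function names below are purely hypothetical.

```python
# Simplified sketch of tokenization: map code tokens to IDs, then pad or
# truncate to a fixed length. The real Lightning Cat pipeline uses the
# CodeBERT tokenizer; this toy vocabulary is illustrative only.

def encode(tokens, vocab, max_len=8, pad_id=0, unk_id=1):
    """Return (input_ids, attention_mask) for a token list."""
    ids = [vocab.get(t, unk_id) for t in tokens][:max_len]
    mask = [1] * len(ids)                     # 1 = real token
    ids += [pad_id] * (max_len - len(ids))    # pad to max_len
    mask += [0] * (max_len - len(mask))       # 0 = padding
    return ids, mask

vocab = {"function": 2, "transfer": 3, "(": 4, ")": 5, "{": 6, "}": 7}
ids, mask = encode(["function", "transfer", "(", ")", "{", "}"], vocab)
```

The attention mask lets the model ignore padded positions, so samples of different lengths can share one batch shape.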
Beyond smart contract vulnerability detection, Lightning Cat can be extended to identify code vulnerabilities in other areas.
Through extensive training on a large collection of code samples, Lightning Cat learns to recognize and characterize many kinds of vulnerabilities. It also preprocesses data with the CodeBERT pre-trained model to better surface code vulnerabilities. It can therefore identify several types of code flaws, improving the code's security and reliability.
Deep Learning for Detecting Flaws in Smart Contracts
Our Lightning Cat tool applies the techniques discussed above to extract the important information from vulnerable code, and its strong semantic-analysis capabilities greatly improve model performance.
Model 1: Optimized-CodeBERT
CodeBERT is a pre-trained model built on the Transformer architecture and designed specifically for learning from and parsing source code. By pre-training on large-scale code corpora, CodeBERT learns the syntactic and semantic relationships in source code, as well as the dynamic interactions between different code segments.
The Optimized-LSTM and Optimized-CNN models cannot process raw input IDs and masks directly, so CodeBERT is used to further process the input and convert it into tensor representations of embedding vectors.
To produce meaningful representations of the source code, the CodeBERT model is fed the input IDs and attention masks obtained during preprocessing. The resulting embedding vectors then serve as inputs to the Optimized-LSTM and Optimized-CNN models for subsequent vulnerability detection.
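Conceptually, this step turns each token ID into a dense vector while masked (padded) positions contribute nothing. The toy lookup below illustrates only that shape transformation; CodeBERT's real embeddings are contextual and learned, and the table values here are made up.

```python
def embed(input_ids, attention_mask, table):
    """Look up a vector per token ID; zero out padded positions using
    the attention mask. A toy stand-in for CodeBERT's embeddings."""
    return [
        [x * m for x in table[i]]
        for i, m in zip(input_ids, attention_mask)
    ]

# Hypothetical 2-dimensional embedding table (ID 0 is padding).
table = {0: [0.0, 0.0], 2: [0.5, -0.1], 3: [0.2, 0.9]}
vectors = embed([2, 3, 0], [1, 1, 0], table)
```

The resulting sequence of vectors is the form consumed by the downstream LSTM and CNN classifiers.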
Model 2: Optimized-LSTM
Engineered specifically for sequential data, the Optimized-LSTM model can capture temporal dependencies as well as syntactic and semantic information. For detecting smart contract vulnerabilities, our Optimized-LSTM model operates on a serialized representation of Solidity source code that accounts for the statements and function calls in the code.
Because the Optimized-LSTM model captures the syntax, semantics, and dependencies in the code, it can follow the code's logical structure and execution flow.
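The temporal dependencies mentioned above come from the LSTM's gated cell state, which it carries across the token sequence. The sketch below runs the standard scalar LSTM gate equations over a short sequence; the weights are arbitrary toy values, not anything from the paper's trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One scalar LSTM step: input, forget, and output gates decide what
    to write into, keep in, and read out of the cell state c."""
    i = sigmoid(w["wi"] * x + w["ui"] * h)    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h)    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h)    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h)  # candidate value
    c = f * c + i * g                         # updated cell state
    h = o * math.tanh(c)                      # hidden state / output
    return h, c

# Toy weights; a real model learns these from vulnerability data.
w = {k: 0.5 for k in ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
h = c = 0.0
for x in [1.0, -1.0, 0.5]:  # a serialized token-feature sequence
    h, c = lstm_step(x, h, c, w)
```

Because `h` and `c` flow from step to step, the final hidden state summarizes the whole sequence, which is what a classification head would consume.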
Model 3: Optimized-CNN
The convolutional neural network (CNN) is a feedforward network with notable advantages for handling two-dimensional input, such as the two-dimensional structure that code can be arranged into. In our model architecture, the code token sequence is converted into a matrix, which allows the CNN to extract local features from the code and capture its spatial structure. This lets the CNN pick up key patterns within the code as well as the relationships between code blocks and grammatical structures.
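The local-feature extraction described here is just a convolution sliding a small filter over the token matrix. A minimal "valid"-mode sketch (with a made-up matrix and kernel, not the paper's learned filters):

```python
def conv2d_valid(matrix, kernel):
    """Slide the kernel over the matrix (no padding) and sum element-wise
    products at each position, extracting local features from
    neighbouring rows and columns."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(matrix) - kh + 1):
        row = []
        for s in range(len(matrix[0]) - kw + 1):
            row.append(sum(
                matrix[r + i][s + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            ))
        out.append(row)
    return out

# A 3x3 "token matrix" and a 2x2 difference-style filter.
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
k = [[1, 0],
     [0, -1]]
features = conv2d_valid(m, k)
```

In a trained model many such kernels run in parallel, each tuned to respond to a different local code pattern.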
This work presents Lightning Cat, a deep learning tool that leverages three models (Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN) to identify vulnerabilities in smart contracts. After optimizing and comparing the three models, we found that Optimized-CodeBERT performed best on evaluation metrics such as F1-score, Accuracy, and Precision.
This study preprocessed the data using the CodeBERT pre-trained model, which improved the performance of code semantic analysis. During data preparation, we extracted the vulnerable function code segments, which addressed the length restriction deep learning models face when processing long texts while still retaining the essential elements of the smart contract vulnerability code.
By avoiding problems such as overfitting on texts that are too short and ambiguous features in texts that are too long, this method improves model performance. The findings indicate that the proposed approach achieves better detection through more sensible model optimization and data preprocessing.
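Extracting function-level segments is what keeps each sample within the model's input-length limit. As one plausible way to do this (a simplified stand-in, not the paper's actual preprocessing code), the sketch below pulls `function ... { ... }` bodies out of Solidity source by brace matching and truncates each to a maximum length:

```python
import re

def extract_functions(source, max_chars=200):
    """Extract each Solidity function body by matching braces, then
    truncate to max_chars so every sample fits the model's input limit.
    Simplification: ignores braces inside strings and comments."""
    segments = []
    for m in re.finditer(r"\bfunction\b", source):
        start = m.start()
        brace = source.find("{", start)
        if brace == -1:
            continue
        depth = 0
        for end in range(brace, len(source)):
            if source[end] == "{":
                depth += 1
            elif source[end] == "}":
                depth -= 1
                if depth == 0:
                    break
        segments.append(source[start:end + 1][:max_chars])
    return segments

code = "contract C { function f() public { x = 1; } function g() { } }"
funcs = extract_functions(code)
```

Each extracted segment can then be tokenized on its own, so a long contract becomes several short, focused samples instead of one oversized input.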
Analyzing the detection performance for each type of vulnerability, this paper finds that the Optimized-CodeBERT model outperformed Slither, Optimized-LSTM, and Optimized-CNN on three types of vulnerabilities but was less effective at detecting one type.
This is because the models differ in their learning algorithms, structures, and parameters, all of which affect how well they model the data and generalize.
As a result, our future work aims to further improve the performance of Lightning Cat's three models and to extend its use beyond smart contract vulnerability detection to other code security domains.