Article

An Intelligent Penetration Test Simulation Environment Construction Method Incorporating Social Engineering Factors

1 College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
2 Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(12), 6186; https://doi.org/10.3390/app12126186
Submission received: 17 May 2022 / Revised: 10 June 2022 / Accepted: 15 June 2022 / Published: 17 June 2022

Abstract: The penetration test involves many repetitive operations and requires advanced expert knowledge; manual penetration testing is therefore inefficient. With the development of reinforcement learning, the intelligent penetration test has become a research hotspot. However, existing intelligent penetration test simulation environments only focus on the exploitation of target hosts through the penetration tester agent's actions, ignoring the important role that social engineering plays in real-world penetration tests. In addition, existing simulation environments are built on the traditional network graph model, which does not integrate security factors and attributes and therefore struggles to express the interaction between the penetration tester and the target network. This paper constructs an improved network graph model for penetration tests (NMPT) that integrates the security attributes relevant to a penetration test. The NMPT model lays the foundation for extending the penetration tester's social engineering actions. We then propose an intelligent penetration test method that incorporates social engineering factors (SE-AIPT) based on the Markov decision process, and adopt several mainstream reinforcement learning algorithms to train attack agents. The experiments show that the SE-AIPT method vividly models the penetration tester agent's social engineering actions, which effectively improves the realism of the simulation environment. Moreover, the penetration tester agent shows superior performance in attack path discovery in the intelligent penetration test simulation environment constructed by the SE-AIPT method.

1. Introduction

With the development of information technology, the network environment is increasingly complex, and network security threats are becoming increasingly severe. How to protect computer systems from unauthorized access or attack is an essential issue [1]. As a proactive method, the penetration test evaluates network vulnerability by discovering potential threat paths of target networks from the perspective of a penetration tester [2]. However, as network information systems grow in scale and complexity, completing a penetration test takes a great deal of time and involves many repetitive actions, resulting in high costs [3]. Therefore, it is extremely urgent to examine how to carry out penetration tests autonomously and intelligently.
Automated penetration tests have been proposed in recent years. Early studies based on attack trees, attack graphs, and the Planning Domain Definition Language (PDDL) are representative. These methods plan the attack path through a formal representation of the target network configuration and an analysis of state transitions [2,3,4]. However, they all require the target network information to be known in advance and cannot model the uncertainty of real penetration tests well. With the development of artificial intelligence, reinforcement learning methods show clear advantages in sequential decision-making problems. The Markov decision process (MDP) formally describes the environment in reinforcement learning (RL) and underlies the intelligent penetration test. It is necessary to adopt RL to construct a dynamic environment in which the attack agent is trained to make intelligent exploit decisions. The environment provides appropriate feedback and incentives that drive the agent to explore and exploit it [5]. A real-world penetration test is a complex process; to train the penetration tester agent to adapt to complex penetration test scenarios, it is important to construct a highly realistic RL environment.
At present, several RL-based penetration test simulation environments have been proposed, such as NASim [6], CybORG [7], CyberBattleSim [8] and CyGIL [9]. These simulation environments are widely used in existing research and are stable and extensible. However, some problems remain. On the one hand, they do not fully simulate all attack actions of real-world penetration tests. With the development of modern security systems, it becomes more and more difficult for penetration testers to compromise the target network by technical means alone [10]; meanwhile, human beings cannot complete tasks according to fixed instructions like a machine, so mistakes inevitably occur in the process of performing tasks. Social engineering is a way of exploiting such human weaknesses to compromise the target, and it is an important complementary method for testers conducting penetration tests [11]. Therefore, it is significant to extend the penetration tester's actions from the perspective of social engineering. On the other hand, the simulation modules of existing environments are built on the traditional network graph model, which only includes the basic elements and connection relations of the network and does not incorporate the security attributes and elements related to the penetration test. Therefore, it is necessary to improve the existing network graph model by integrating the security-related attributes and elements of the penetration test.
In conclusion, the following challenges need to be stressed:
  • Challenge 1: Mistakes inevitably occur in manual penetration tests, and social engineering is a useful way to exploit such human weaknesses. However, the existing simulation environments do not incorporate social engineering.
  • Challenge 2: The existing simulation environments are based on the traditional network graph model, which only includes the basic elements and connection relations of the network and does not incorporate the security attributes and elements related to penetration tests.
To address these challenges, we first propose a network graph model for the penetration test by extending the security-related properties of the traditional network graph model. By analyzing the characteristics and mechanisms of social engineering methods, we improve the modeling of social engineering exploitation actions in the network graph model. On this basis, we establish a penetration test model that considers the various factors involved in a penetration test, and combine it with the Markov decision process, the paradigm of reinforcement learning in the field of artificial intelligence, to obtain an intelligent penetration test model extended with social engineering factors. Finally, we build an intelligent penetration test simulation environment with extended social engineering factors. The proposed method can be used in various intelligent penetration test simulation environments.
In summary, the main contributions of this paper are as follows:
  • To improve the existing network graph model by integrating the security-related attributes and elements of the penetration test, we propose an improved network graph model for penetration tests (NMPT), which better describes the penetration test process.
  • To expand the penetration tester agent's actions and incorporate social engineering factors into the intelligent penetration test, we propose an intelligent penetration test simulation environment construction method incorporating social engineering factors (SE-AIPT). The integration of social engineering actions offers the penetration tester agent new ways of discovering attack paths during a penetration test. This research further provides an interactive penetration test simulation environment and a platform on which RL algorithms can train penetration tester agents.
The rest of the paper is structured as follows: In Section 2, we review the related work of intelligent penetration tests and introduce the background knowledge. In Section 3, we present the improved network graph model NMPT and the technical details of the SE-AIPT method. In Section 4, we conduct a series of experiments and analyze the experimental results. In Section 5, we summarize the paper and analyze the prospects of future research.

2. Background

2.1. Manual Penetration Test and AI-Driven Penetration Test

A penetration test is generally regarded as a vulnerability assessment method that simulates malicious hackers' attacks on target networks. Various frameworks and processes currently exist for the specific steps of penetration testing. Sugandh Shah et al. [12] describe the whole process of vulnerability assessment and penetration testing (VAPT) and its methods, models and criteria. They divide the penetration test into four major stages: planning and preparation; detection and exploitation; post-penetration and data filtering; and reporting and trace clearing. The proportion of time and resources consumed by each stage of the penetration test process is shown in Figure 1.
As Figure 1 shows, preliminary information collection and the exploitation of the target network occupy most of the time and resources in the penetration test process. These tasks are often repetitive and complicated and require expert knowledge and experience [13]. The entire penetration test process thus consumes substantial human effort, making automated penetration testing a promising research direction.
With the development of artificial intelligence, AI-driven penetration tests provide new solutions to the problems that existed in manual penetration tests. The comparison of AI-driven penetration tests with manual penetration tests is shown in Figure 2.
The automated penetration test was proposed to reduce the high human cost and reliance on expert knowledge. Mature automated penetration test tools include APT2, Autosploit and Awesome-Hacking-Tools. These tools improve the efficiency of penetration tests, but some problems remain [14]. At the host level, they simply integrate existing network attack tools and lack the ability to reason over attack and defense knowledge: they cannot intelligently select attack payloads and configure the parameters of a payload based on the state of the target host. At the network level, most of these tools perform vulnerability assessment on a single host without expanding to the entire target network, and they lack the ability to discover potential attack paths. The key to solving these problems is to intelligently select exploits for targets and the corresponding attack payloads.
Artificial intelligence is widely used in automated penetration tests and could improve the efficiency of the potential attack path discovery in the target network. According to the current target network status, the artificial intelligent penetration test (AIPT) selects the exploit target and related method intelligently. In essence, the AIPT solves the sequential decision-making problem in a specific environment [15]. In the interactive process of continuous learning and training, AIPT could make a decision autonomously for the penetration test of the target network.
RL learns how to map states to actions. It makes the agent dynamically explore and exploit the environment to find appropriate action strategies [16]. One of the main advantages of RL is that it handles problems without assuming any prior knowledge or model of the outcome probabilities of a given state and action; instead, agents learn this knowledge by interacting with the environment, which provides a solution to automated penetration test problems [17]. Sarraute et al. modeled a penetration test as a POMDP (partially observable Markov decision process) [18]. In this way, the agent can intelligently mix scanning and exploiting actions, but the network must be decomposed into POMDPs against single target machines, which becomes computationally infeasible as the network grows [19]. Durkota et al. [20] proposed modeling a penetration test as a Markov decision process, in which the action space consists of specific vulnerabilities, the state space consists of attack actions and results, and the reward function is defined by the state transition cost and loss value; the goal of the model is to minimize the expected loss. J. Hoffmann [21] proposed a method based on Durkota's work that completely ignores the configuration of the target system and expresses the penetration tester's uncertainty in the form of possible action outcomes. This is a model-free method [22] that requires minimal prior knowledge of the environment and trains the penetration tester through interaction with the environment. Establishing a penetration test simulation environment in which to train a penetration tester agent is similar to the way a player interacts with a game to discover its solution [23].
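The agent–environment interaction loop described above can be sketched with tabular Q-learning on a toy chain of hosts. This is a minimal illustrative example under assumed dynamics and rewards, not the setup of any cited work:

```python
import random

# Toy 4-state "attack chain": state 3 is the goal host. The transition
# rule and rewards below are illustrative assumptions.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(state, action):
    """Action 1 moves one hop toward the goal; action 0 stays put."""
    next_state = min(state + 1, GOAL) if action == 1 else state
    reward = 10.0 if next_state == GOAL else -1.0  # per-step cost, goal bonus
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy after training: advance in every non-goal state.
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

The learned greedy policy advances toward the goal in every non-goal state, mirroring how a penetration tester agent learns an attack path by trial and error rather than from a prior model.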

2.2. Intelligent Penetration Test Simulation Environment

The intelligent penetration test simulation environment provides a highly abstract simulation of the network scenario of a penetration test and supports RL algorithms in training penetration tester agents that interact with it. In recent years, a series of such environments have been proposed. The Network Attack Simulator (NASim) [6] supports red agent training for network-wide penetration tests and provides a simple way to configure the topology and host vulnerability information of the target network; however, problems remain in the definition of agent actions and the realistic construction of the network topology. CybORG [7] is designed to support both simulated and emulated environments with the same front-end interface. CybORG focuses on the process of red-blue confrontation in cyberspace and models agent actions at the command level; however, the resulting simulation environment is better suited to low-level penetration targets and lacks abstraction at the host level. CyberBattleSim (CBS) [8] is built on OpenAI Gym for red agent training and focuses on the lateral movement phase of a cyber attack in a fixed network with configuration vulnerabilities. The simulation environment built by CBS is highly abstract, with good scalability and readability, but has limitations in modeling the real-world penetration test process. CyGIL [9] uses a stateless environment architecture and incorporates the MITRE ATT&CK framework to establish a high-fidelity training environment. Its environment design allows agent training to focus on specific advanced persistent threats (APT) but lacks a detailed definition of other exploits and methods.
The above simulation environments are widely used in research, but some problems remain in their construction. The key appeal of a training environment is its fidelity, which yields an agent more applicable to the real world; if the simulated environment deviates from reality, the trained agent's decision model may be less applicable. From the technical perspective of the penetration test, with the introduction of more and more hardware devices and security software and the continuous improvement of network security solutions, it has become much harder to complete an intrusion by technical means alone [24]. In the real world, the information a penetration tester obtains through exploits is very limited. Social engineering methods focus on human weaknesses [25] and exploit the people who hold key information about the target network so as to obtain access to it and achieve the goal of the penetration test [26]. However, none of the currently proposed penetration test environments model the penetration tester's social engineering behavior, so the authenticity of the simulated environments needs to be improved. From the perspective of design and implementation, the current penetration test simulation environment mainly comprises three parts: network scenario generation, the simulation model and algorithm evaluation. The network scenario and simulation modules are designed and implemented on the basis of the traditional network graph model, which is built from nodes and elements such as edges and connection relations. In the real world, however, the interaction between the penetration tester and the target network is often accompanied by security properties and factors [27], such as vulnerabilities and attack infiltration.
In summary, the current construction of intelligent penetration test simulation environments suffers from low fidelity: it ignores the key role that social engineering, a non-traditional information security threat, plays in real-world penetration tests. At the same time, the network graph model currently used to describe penetration tests lacks security-related elements and attributes, making it difficult to describe the interaction between the penetration tester and the target network. The traditional network graph model is therefore not suitable for constructing penetration test-oriented simulation environments.

3. Methods

3.1. Network Graph Model for Penetration Test

3.1.1. Definition of the Model

Definition 1.
Network graph model
The network graph model for penetration test (NMPT) is a network graph model that is suitable for describing the penetration test process. NMPT is constructed by extending the attributes of edges and nodes based on the traditional network graph model. NMPT is defined as a 4-tuple:
G(t) = &lt;V, E, W, H&gt;
where V is the set of nodes. E is the set of edges. W is the connection relation between nodes, expressed in the form of a matrix. H represents the network hierarchy, indicating the location and hierarchical relationship of nodes in the network topology.
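As a concrete (and purely illustrative) reading of Definition 1, the 4-tuple can be held in a small data structure. The field encodings below are assumptions for exposition, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Minimal container for the NMPT 4-tuple G(t) = <V, E, W, H>.
@dataclass
class NMPT:
    V: List[str]                    # node identifiers (hosts and persons)
    E: List[Tuple[str, str]]        # edges as pairs of node identifiers
    W: Dict[str, List[List[int]]]   # connection matrices W_HH, W_PP, W_PH
    H: Dict[str, Tuple[int, int]]   # hierarchy: host id -> (subnet, host number)

# A tiny example instance: two hosts, one person.
g = NMPT(
    V=["HOST1", "HOST2", "Tom"],
    E=[("HOST1", "HOST2"), ("Tom", "HOST1")],
    W={"HH": [[0, 1], [1, 0]], "PP": [[0]], "PH": [[1], [0]]},
    H={"HOST1": (1, 1), "HOST2": (1, 2)},
)
```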
Definition 2.
Nodes in NMPT
For the network graph model G(t) = &lt;V, E, W, H&gt;, the set of nodes is V = {v_1, v_2, …, v_n}, n ∈ N. For each node in the network, according to the network security attributes related to the penetration test, the node attributes are defined as a 3-tuple:
V_n(t) = &lt;Vid, Vtype, Vattr&gt;
where Vid is the identifier of the node, Vtype indicates the type of the node, and Vattr indicates the attributes of the node.
(1) Node identification
Due to the gradation, heterogeneity and complexity of modern networks, we divide Vid into two parts, mandatory information and custom information, according to the usage scenario. The node identifier Vid is therefore defined as:
Vid = {MInfo; CInfo} = &lt;Vname, Vnum; Info_1, …, Info_n&gt;
MInfo is the mandatory information segment, including the node name and serial number, which constitutes the basic information that must be provided to relevant personnel. The serial number is the globally unique identifier of the node. CInfo is the user-defined information segment, which meets the requirements of different network scenarios and supplements the identification information.
(2) Node type
Social engineering methods are widely used in penetration tests because of their high threat and high yield. We therefore consider persons and their social relations in the penetration test process: the node type is not limited to host nodes but also includes person nodes, so it is expressed as:
Vtype = {Vtype_H, Vtype_P}
Vtype_H is the host node type, and Vtype_P is the person node type. Because host nodes have multiple meanings and are often mobile and virtual, the host node type can be further represented as Vtype_H = &lt;Vmob, Vvrm&gt;. Vtype_P is classified into administrators and common users based on their operation rights: Vtype_P = {Administrator, User}.
(3) Node properties
The penetration test process involves many attributes related to network security. People, as a key factor in social networks and social engineering, should be included in the network model for a penetration test.
a. Attributes of host-type nodes
The attributes of a host include its services, operating system version, vulnerabilities, value, permission level, current running status and assets, represented by a 7-tuple:
Vattr_H = &lt;Service, OSV, Vuln, Val, CPL, CRS, Prop_H&gt;
Service indicates the open service information of the host. OSV indicates the operating system version of the host, e.g., Windows, Linux or CentOS. Vuln refers to the vulnerabilities of the host. A vulnerability consists of an identifier and vulnerability information, expressed as follows:
Vuln = &lt;ID_vuln, Info_vuln&gt;
ID_vuln is the identifier of the vulnerability, representing a specific host vulnerability. Info_vuln represents the vulnerability information, including the vulnerability type, description, exploitation effect, success probability and exploitation cost, expressed as follows:
Info_vuln = &lt;Type_vuln, Desc_vuln, Eff, Prob, Cost&gt;
where Type_vuln is the type of vulnerability, either remote or local. Desc_vuln is a description of the vulnerability, used to expand its explanatory information. Eff refers to the effect generated by exploiting the vulnerability, such as the host permissions obtained and the exposure of new potentially reachable nodes and related connection credentials. Prob represents the success probability of exploiting the vulnerability; the CVSS vulnerability scoring standard is used as the criterion for evaluating it. Cost refers to the resources consumed by exploiting the vulnerability; its setting reflects the difficulty of exploitation and the amount of resources the penetration tester must spend.
Val indicates the value of the host: the larger the value, the more key sensitive information the host contains, and the more worthwhile it is for penetration testers to attack and for defenders to defend. CPL indicates the attacker's permission level on the host, expressed as CPL = {NoAccess, LocalUser, Admin, System}. CRS indicates the running status of the current host, expressed as CRS = {Running, Down}, where Running and Down indicate the running and downtime statuses, respectively. Prop_H is the network asset owned by the host, expressed as Prop_H = &lt;Data, File, Cred, Link&gt;, where Data represents the sensitive data contained in the host, File represents the file directory contained in the host, Cred represents the connection credentials contained in the host, and Link represents the links from the host to other host nodes and key directories. Combining the above definitions of host node attributes, the attribute information of the enterprise-network host node identified as HOST1 is shown in Figure 3:
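The definitions above can be made concrete with a small sketch of a host-type node in the spirit of the HOST1 example. All concrete values, including the CVE identifier, are hypothetical placeholders:

```python
# Illustrative encoding of a host node and its 7-tuple Vattr_H.
# Field names follow the definitions in the text; values are assumptions.
host1 = {
    "Vid": {"MInfo": {"Vname": "HOST1", "Vnum": 1}, "CInfo": {}},
    "Vtype": "host",
    "Vattr": {
        "Service": ["http", "ssh"],
        "OSV": "Linux",
        "Vuln": [{
            "ID_vuln": "CVE-2021-0001",        # hypothetical identifier
            "Info_vuln": {
                "Type_vuln": "remote",
                "Desc_vuln": "example remote code execution",
                "Eff": "LocalUser",            # permission gained on success
                "Prob": 0.8,                   # e.g., derived from a CVSS score
                "Cost": 3,
            },
        }],
        "Val": 100,
        "CPL": "NoAccess",
        "CRS": "Running",
        "Prop_H": {"Data": [], "File": [], "Cred": [], "Link": ["HOST2"]},
    },
}
```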
b. Attributes of person-type nodes
Vattr_P describes the key information of person-type nodes in social networks and target networks, including personal information and personal assets, and is expressed as follows:
Vattr_P = &lt;Info_P, Prop_P&gt;
Info_P is the description and expansion of the relevant information of the person node. Prop_P describes the available assets that a person holds and is expressed as Prop_P = &lt;Info_s, …, Cred_h&gt;. Info_s refers to the social information pointing to other person nodes. Cred_h indicates the connection credentials the person holds that can be used to log in to the target host. Combining the above definitions of person node attributes, the attribute information of the social-network person node identified as Tom is shown in Figure 4:
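A matching sketch of a person-type node in the spirit of the Tom example; the personal details and credentials below are assumptions for illustration:

```python
# Illustrative encoding of a person node with Vattr_P = <Info_P, Prop_P>.
tom = {
    "Vid": {"MInfo": {"Vname": "Tom", "Vnum": 101},
            "CInfo": {"Role": "Administrator"}},
    "Vtype": "person",
    "Vattr": {
        "Info_P": {"email": "tom@example.com"},    # hypothetical detail
        "Prop_P": {
            "Info_s": ["Alice", "Bob"],            # social links to other persons
            "Cred_h": [{"host": "HOST1", "cred": "tom:pw123"}],  # host login
        },
    },
}
```

A successful social engineering action against this node would expose `Cred_h` and hence the hosts the person controls.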
Definition 3.
Edges in NMPT
For the network graph model G(t) = &lt;V, E, W, H&gt;, the set of edges is E = {e_1, e_2, …, e_k}, k ∈ N. According to the network security attributes related to the penetration test, each edge is defined as a 3-tuple:
e_n(t) = &lt;Eid, Etype, Ecap&gt;
Eid is the identifier of the edge. Etype is the type of the edge. Ecap is the capability of the edge.
(1) Edge identification
The identifier of an edge in the network is its globally unique identifier. Unlike the Vid of a node, the Eid of an edge is often associated with its endpoint nodes.
(2) Edge type
Since nodes are of two types, host nodes and person nodes, and edges between different kinds of nodes carry different meanings, edges are divided into three types, expressed as follows:
Etype = {HH, PP, PH}
HH represents an edge between hosts. PP represents an edge between persons. PH represents an edge between a person and a host.
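The edge-type rule can be expressed as a small helper that derives Etype from the two endpoint node types; the string encodings are assumptions:

```python
# Derive the edge type (HH, PP or PH) from the endpoint node types,
# following Etype = {HH, PP, PH}. Node types are encoded as plain strings.
def edge_type(type_a: str, type_b: str) -> str:
    assert type_a in ("host", "person") and type_b in ("host", "person")
    if type_a == "host" and type_b == "host":
        return "HH"
    if type_a == "person" and type_b == "person":
        return "PP"
    return "PH"  # mixed person/host edge, in either order
```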
Definition 4.
Connections in NMPT
For the network graph model G(t) = &lt;V, E, W, H&gt;, the connection relation W represents the reachability between nodes and the relationships between people. Three connection relations are defined in this paper, expressed as:
W = &lt;W_HH, W_PP, W_PH&gt;
W_HH is the connection relation between host nodes. It is expressed in the form of an adjacency matrix and denoted as W_HH = (w_ij)_{m×m}, where m is the number of host nodes:
W_HH =
[ w_11 … w_1m ]
[  ⋮   ⋱   ⋮  ]
[ w_m1 … w_mm ]
Each element of W_HH satisfies w_ij ∈ {0, P_ij, 1}, where 0 represents that the hosts are unconnected, 1 represents that they are connected, and P_ij represents the firewall filtering policy applied by the hosts on top of the network connectivity.
W_PP represents the social relationships between person nodes, expressed in the form of an adjacency matrix and denoted as W_PP = (w_ij^β)_{n×n}, where n is the number of person nodes:
W_PP =
[ w_11^β … w_1n^β ]
[   ⋮    ⋱    ⋮   ]
[ w_n1^β … w_nn^β ]
Each element of W_PP satisfies w_ij^β ∈ {0, 1}, where 0 indicates no social relationship between the person nodes and 1 indicates a social relationship. Since social relationships are mutual (the relationship between person A and person B equals that between person B and person A), W_PP is a symmetric matrix.
W_PH represents the ownership relationships between persons and hosts in the form of an adjacency matrix. It is denoted as W_PH = (w_ij^γ)_{m×n}, where m is the number of host nodes and n is the number of person nodes:
W_PH =
[ w_11^γ … w_1n^γ ]
[   ⋮    ⋱    ⋮   ]
[ w_m1^γ … w_mn^γ ]
Each element of W_PH satisfies w_ij^γ ∈ {0, 1}, where 0 indicates that there is no ownership relationship between the person node and the host node, and 1 indicates that there is one. The relationship between the social network composed of person nodes and the network structure composed of host nodes is shown in Figure 5.
In Figure 5, each person node in the upper network might have the control authority or sensitive information of one or more hosts while forming a social network with each other. Host nodes in the underlying network not only constitute the network topology connection but also contain security-related attributes and assets. The dotted green line indicates the ownership relationship between the person and the host; that is, the person has the control rights of the host.
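The three connection matrices can be sketched for a small example with 3 hosts (m = 3) and 2 persons (n = 2); the concrete entries are assumptions, not the network of Figure 5:

```python
# W_HH: 0 = unreachable, 1 = reachable (a firewall policy P_ij may replace
# a plain 1 where traffic is filtered).
W_HH = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]

# W_PP: social relationships between persons; symmetric by definition.
W_PP = [[0, 1],
        [1, 0]]

# W_PH (m x n): entry (i, j) is 1 when person j owns/controls host i.
W_PH = [[1, 0],
        [0, 0],
        [0, 1]]

def is_symmetric(mat):
    """Check the symmetry property required of W_PP."""
    n = len(mat)
    return all(mat[i][j] == mat[j][i] for i in range(n) for j in range(n))

def hosts_owned_by(person_j):
    """Indices of hosts that person j controls, read from column j of W_PH."""
    return [i for i, row in enumerate(W_PH) if row[person_j] == 1]
```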
Definition 5.
Hierarchy in NMPT
The network hierarchy is used to represent the network area where the host nodes reside in the network topology. It is expressed as follows:
H = &lt;SN_max, H_max, Π&gt;
SN_max indicates the maximum number of subnets in the network. H_max indicates the maximum number of hosts in a subnet. Π gives the concrete network position of every host node in the network model, which is expressed as:
Π = {π_1, π_2, …, π_m}, m ∈ N
For each π_m in Π:
π_m = &lt;Vid_H, SN, H&gt;
Vid_H is the globally unique identifier of a host node. SN is the number of the subnet in the network hierarchy. H is the number of the host node within that subnet.
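The hierarchy can be sketched as a mapping from each host's global identifier to its (subnet, host number) address; the bounds and host placements are assumptions:

```python
# Sketch of H = <SN_max, H_max, Pi>.
SN_max, H_max = 3, 5      # at most 3 subnets, 5 hosts per subnet (assumed)
Pi = {
    "HOST1": (1, 1),      # subnet 1, host 1
    "HOST2": (1, 2),
    "HOST3": (2, 1),      # e.g., a host in an internal subnet
}

def same_subnet(a: str, b: str) -> bool:
    """Two hosts share a subnet when their SN components match."""
    return Pi[a][0] == Pi[b][0]

# Every pi_m must respect the hierarchy bounds.
assert all(sn <= SN_max and h <= H_max for sn, h in Pi.values())
```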

3.1.2. The Role of NMPT in a Penetration Test

When a penetration tester uses NMPT to assist in a penetration test, the attack actions issued by the penetration tester and their results are directly reflected in NMPT, helping the penetration tester and the defender better grasp changes in the current network state. Figure 6 shows the process in which a penetration tester and a defender conducting a penetration test interact with the network G(t) established through NMPT.
NMPT has two major advantages. First, initialization generates a network G(t) containing elements such as host vulnerability information, host status information and running services, which is an abstraction of the real network environment and covers the relevant security properties of real-world networks. Second, during the penetration test, the penetration tester and the defender flexibly adjust their next attack and defense actions according to the feedback on the network state.

3.2. Social Engineering Factor Extended Intelligent Penetration Test Model

Based on NMPT, we construct the intelligent penetration test model with expanded social engineering factors (SE-AIPT). Combined with RL, SE-AIPT associates the penetration tester's behavior with the target network state and incorporates social engineering actions for the agent.

3.2.1. Penetration Test Model

Definition 6.
Penetration test model
The penetration test model consists of NMPT, role, action, penetration test target and state observation space, which is expressed as:
Model_PT = &lt;G(t), Ro, A, T, O&gt;
G(t) is the network graph model for the penetration test, serving as the target network during the penetration test. Ro refers to the roles involved in the penetration test process. A is the set of actions adopted during the penetration test. T is the target of the penetration test. O is the observation space.
(1)
Role in the penetration test model
The role of the penetration test model is expressed as:
Role = {Attacker, Defender}
The attacker is a role who penetrates the target network during the penetration test, while the defender defends against the attacker who penetrates the target network. The attacker and defender directly affect the state of the target network through the attack and defense actions, respectively.
(2)
Action in the penetration test model
Actions in the penetration test model are divided into attack actions and defense actions according to the roles in the penetration test model, which are expressed as:
A = &lt;A_Attack, A_Defence&gt;
There are four kinds of actions of the attacker, which are expressed as:
A_Attack = &lt;A_local, A_remote, A_connect, A_social&gt;
A_local is the local vulnerability exploitation action. A_remote represents the remote exploit action. A_connect represents the credential-connection action. A_social represents the social engineering action and is represented by a 2-tuple: A_social = &lt;Vid_person, SE_method&gt;, where Vid_person is the ID of the person node. Social engineering methods start from human vulnerabilities and use a range of deception and enticement techniques to obtain critical sensitive data about penetration targets. The parameter SE_method represents the social engineering method used, such as phishing attacks, forged man-in-the-middle attacks, and so on. Through social engineering, the attacker obtains login information from the person node, discovers the hosts controlled by that person and uncovers a series of sensitive resources of great value. The parameter list of the attacker actions is shown in Table 1.
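The four attacker action types above can be sketched as plain record types; the social engineering action is exactly the 2-tuple &lt;Vid_person, SE_method&gt; from the text. Class and field names other than that pair are illustrative assumptions.

```python
# Hedged sketch of the attacker's four action types.
from dataclasses import dataclass

@dataclass
class LocalExploit:            # A_local
    target_id: str
    vulnerability: str

@dataclass
class RemoteExploit:           # A_remote
    source_id: str
    target_id: str
    vulnerability: str

@dataclass
class ConnectAction:           # A_connect
    source_id: str
    target_id: str
    credential: str

@dataclass
class SocialEngineeringAction: # A_social = <Vid_person, SE_method>
    vid_person: str            # ID of the person node in the social network
    se_method: str             # e.g. "phishing", "man-in-the-middle"

a = SocialEngineeringAction(vid_person="P-03", se_method="phishing")
```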
There are three types of defender actions, which are expressed as:
A_Defence = &lt;A_patch, A_down, A_change&gt;
A_patch means fixing a specific vulnerability on a target network host. A_down means shutting down services on a host or powering it off to physically isolate the compromised node, preventing it from being used as a pivot for further lateral movement and infiltration. A_change represents adjusting firewall policies to block malicious access and filter malicious traffic.
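The three defender action types can likewise be sketched as an enumeration, with A_patch shown as a simple update on a host record. The host schema and helper name are assumptions made for illustration.

```python
# Sketch of the defender action set A_Defence.
from enum import Enum

class DefenderAction(Enum):
    PATCH = "patch"     # A_patch: fix a specific vulnerability on a host
    DOWN = "down"       # A_down: shut down services / isolate the node
    CHANGE = "change"   # A_change: adjust firewall policies

def apply_patch(host, cve):
    """Remove one vulnerability from a host record (illustrative schema)."""
    host["vulnerabilities"] = [v for v in host["vulnerabilities"] if v != cve]
    return host

host = apply_patch({"vulnerabilities": ["CVE-2020-1472"]}, "CVE-2020-1472")
```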
(3)
Target in the penetration test model
The target T of the penetration test is set as obtaining the sensitive resources specified in the target network, accumulating a certain reward, or obtaining control permissions over a certain proportion (or all) of the hosts in the target network.
(4)
Observation in the penetration test model
The observation space O observes the state changes of the target network and feeds back the action processing results of the target network to the penetration tester. These responses include:
  • Whether the current penetration tester action is successful.
  • The network status changes caused by the success or failure of the action.
  • The reward that the current state change generates for both the penetration tester and the defender.
  • Whether the penetration test target has been achieved or the penetration tester has given up penetrating the target network.
To sum up, the observation space O is expressed as:
O = &lt;S, R, Down&gt;
S is the state of the target network. R is the reward value generated for the penetration tester and defender by the state change after an action is executed. Down represents the completion state of the penetration test target.

3.2.2. Process Description of the Penetration Test Model

Assume that the state of the target network G(t) is S(t) at the current time t, and the Role applies an action A_t to a node in the target network. The observation space of the target network then produces a feedback value R(t+1), and the network state changes from S(t) to S(t+1). The whole process is expressed as follows:
Role --A_t(Parameter_1, ..., Parameter_n)--&gt; G(t) --O--&gt; &lt;S(t+1), R(t+1), Down&gt;
where S(t+1) and R(t+1) represent the state of the target network and the reward generated by the action response at time t+1, respectively. By establishing the penetration test model, the behavior of the penetration tester and defender is associated with the state of the target network: their actions affect the network state, and the network's response to an action in turn influences the penetration tester's selection of the next step.
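One interaction step of this process can be sketched as a function returning the observation tuple &lt;S(t+1), R(t+1), Down&gt;. In the model the outcome of an action is probabilistic; here the success flag is passed in explicitly to keep the sketch deterministic, and the reward values, node names and state schema are illustrative assumptions.

```python
# One step of Role --A_t--> G(t) --O--> <S(t+1), R(t+1), Down>.
def step(state, action, succeeded):
    next_state = dict(state)
    if succeeded:
        # successful exploit: the target node becomes controlled
        next_state["controlled"] = state["controlled"] + [action["target"]]
        reward = 50            # positive reward for the state change
    else:
        reward = -5            # a failed action still pays its cost
    done = "flag-host" in next_state["controlled"]   # target reached?
    return next_state, reward, done                  # <S(t+1), R(t+1), Down>

s1, r, done = step({"controlled": ["start"]}, {"target": "flag-host"}, True)
```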

3.2.3. Intellectualization of the Penetration Test Model

Based on the construction of the penetration test model, to realize the intelligence of the penetration test process combined with RL, the penetration test process could be modeled as an MDP model, as shown in Figure 7.
The MDP model is denoted as a 4-tuple &lt;S, A, R, T&gt;, where S stands for the state space, A for the action space, R for the reward function and T for the transition function. The real-world penetration testing process is complex, covering the system state of hosts in the target network (such as services and open ports) and the impact of the penetration tester's actions on the target network, and it is fraught with uncertainty; the process of state transitions cannot be described by exact transition functions and models. Therefore, in contrast to an MDP with a deterministic model, in the model-free MDP the agent does not have direct access to the state transition function or the reward function. Instead, each interaction with the environment yields a trajectory, and as a large number of trajectories are collected, the agent improves its penetration test strategy. We use the success probability of an action issued by the penetration tester to describe the uncertainty of the penetration testing process. Whether an action succeeds directly affects the state of the environment, and the agent iteratively improves its strategy by combining its perception of network state changes with the feedback rewards from the environment. The correspondence between the MDP elements and the penetration test model is shown in Figure 8.
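The model-free setting described above can be sketched as a trajectory-collection loop: the agent never queries T or R directly, it only records the (s, a, r, s') transitions it experiences. The toy two-state environment and the Gym-style `reset()`/`step()` interface are assumptions for illustration.

```python
# Model-free interaction: collect one trajectory of (s, a, r, s') tuples.
def collect_trajectory(env, policy, max_steps=100):
    trajectory = []
    s = env.reset()
    for _ in range(max_steps):
        a = policy(s)
        s_next, r, done = env.step(a)
        trajectory.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return trajectory

class ToyEnv:
    """Two-transition toy environment standing in for the target network."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += 1
        reward = 100 if self.state == 2 else -1   # big reward at the goal
        return self.state, reward, self.state == 2

traj = collect_trajectory(ToyEnv(), policy=lambda s: 0)
```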
The intelligent penetration test model extended with social engineering factors is constructed based on the MDP. RL can deal with the problem without assuming any given state, knowledge or model: the agent acquires this knowledge through active learning and learns optimal problem-solving strategies by balancing exploration and exploitation.

3.3. The Simulation Environment Construction of SE-AIPT

To construct an intelligent penetration test environment, the first step is to establish an environment that can interact with agents, provides an RL training interface and supports performance comparison of different algorithms. This goal is achieved by simulating the real-world network exploitation process of a penetration tester and combining it with RL algorithms from the field of artificial intelligence. In summary, the intelligent penetration test environment mainly includes the following parts; its structure is shown in Figure 9.
a.
Simulation model component
The simulation model constituting the training environment is responsible for the detailed definition of the basic components, including the network topology of the network scene, hosts, attributes, vulnerabilities, firewalls, open ports, host identification, the agent's actions, etc. The definitions of basic components must satisfy the constraints that hold between components during execution; for example, access rules and traffic flow must be considered when defining firewalls, so specific constraints exist between firewall components and the open ports of hosts.
b.
Environment generation and registration
The penetration test simulation environment is constructed based on OpenAI Gym, an open-source toolkit for developing and comparing RL algorithms that mainly supports the Python language. A customized penetration test environment must meet the definition specifications of OpenAI Gym. The self-established network scenes are formalized into OpenAI Gym environments to construct standard interfaces, complete RL registration, and finally generate an environment that can interact with agents.
c.
Artificial intelligence algorithm
OpenAI Gym provides an API for algorithm comparison and allows users to compare the performance of their own algorithms against baselines. The algorithm implementation includes the representation of the agent's state space and action space.
To add social engineering methods to the simulation environment, it is necessary to extend the agent's actions in the simulation model component and, at the same time, add the action-processing feedback mechanism and the social network of person nodes in the environment-generation module, improving the constraints between components. The steps to build an intelligent penetration test simulation environment extended with social engineering factors are as follows.
For the components of the simulation model:
  • Step 1: Add social engineering action types to the definition of the agent action model.
  • Step 2: Define the exploitation results of social engineering actions and the corresponding reward and punishment values. The exploitation results include connection-credential leakage and host-node leakage.
  • Step 3: Add an independent undirected graph structure to represent the social network composed of person nodes, and each node in the undirected graph contains attributes and asset information of the person.
For the environment-generation module:
  • Step 1: Define global identifiers and the global maximum number of actions for the social engineering action types.
  • Step 2: Define the parameters of the social engineering actions, the size of the action space and the action bitmask applied to action processing in the environment.
  • Step 3: Define a bitmask validation method for the environmental processing of social engineering actions, to determine whether the currently performed social engineering action is within the action space and whether the current target node satisfies the social engineering conditions.
  • Step 4: Generation of network scenario and registration of the environment.
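The bitmask validation of the environment-generation steps can be sketched as follows. A real implementation would subclass `gym.Env` and register the scenario with OpenAI Gym; here only the interface shape is shown, and the class name, environment id, mask contents and action count are assumptions for illustration.

```python
# Sketch of a Gym-style environment class for the SE-extended scenario.
# Registration would look like:
#   gym.register(id="SE-AIPT-v0", entry_point="...:SEPentestEnv")
class SEPentestEnv:
    N_SE_ACTIONS = 4                 # global maximum of SE action types

    def __init__(self):
        # bitmask: 1 = this SE action may currently be applied to the
        # target node (e.g. the node satisfies the SE conditions)
        self.se_action_mask = [1, 1, 0, 0]

    def valid_se_action(self, action_id):
        """Bitmask validation: is the SE action in range and enabled?"""
        return 0 <= action_id < self.N_SE_ACTIONS and \
               self.se_action_mask[action_id] == 1

env = SEPentestEnv()
```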

4. Experiments

4.1. Experiment Environment and Process

To verify the feasibility and effectiveness of the SE-AIPT method, we constructed an intelligent penetration test simulation environment based on CyberBattleSim (CBS), developed by Microsoft. CBS builds network simulation scenarios as a high-level abstraction of the process of penetration testing real-world networks. It provides users with a typical enterprise network in which agents can be trained with RL algorithms. We integrate social engineering factors and generate network scenarios accordingly, then select the chain-structured network scenario (CyberBattleChain) in CBS for our series of experiments. The data models are highly abstract and cannot yet be applied to real-world automated penetration testing. The data and information about vulnerabilities in the network scenario are publicly available; the social engineering data and vulnerabilities are defined and represented abstractly, independent of any real social engineering database. The CyberBattleChain structure and the CyberBattleChain structure extended by social engineering factors are shown in Figure 10 and Figure 11, respectively.
CyberBattleChain consists of a Start node, a terminate node and a chain structure in between. The Start node is the starting point for the penetration tester to penetrate and represents the controlled initial node. The terminate node contains the flag of the penetration test. When the penetration tester obtains the flag through a series of means, the penetration test target is achieved, and the penetration process is terminated. The chain structure in the middle is composed of Linux hosts and Windows hosts alternately. The number of hosts can be expanded to achieve the purpose of expanding the scale of the experimental network scene. As seen in Figure 11, in the penetration test network scenario with extended social engineering factors, the penetration tester could not only penetrate the target host by means of vulnerability exploitation but also obtain the assets of the person in the social network by means of social engineering for further penetration tests.

4.2. Basic Settings for Using the RL Method

4.2.1. Agent State

For the definition of the agent state space, we focus on global features of the target network that characterize the current progress and state of the agent's penetration test, and node-local features that reflect the agent's action execution at a specific node. The global state features include the number of nodes currently discovered by the agent, the number of nodes controlled, the number of ports discovered, the number of connection credentials discovered, etc. The node-local state features include the number of exploit actions the agent has attempted on the node and the number of successful exploits. By concatenating the global state features and the node-local features, the agent's observation of the target network state at the current moment is obtained, and the next action decision is made.
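The concatenation described above can be sketched as a small feature-vector builder; the feature names and their ordering are assumptions chosen to mirror the counts listed in the text.

```python
# Build the agent's observation by concatenating global network features
# with the local features of the node being acted on.
def observation_vector(global_feats, node_feats):
    order_g = ["n_discovered", "n_controlled", "n_ports", "n_credentials"]
    order_l = ["n_attempts", "n_successes"]
    # fixed ordering keeps the vector layout stable across steps
    return [global_feats[k] for k in order_g] + \
           [node_feats[k] for k in order_l]

obs = observation_vector(
    {"n_discovered": 5, "n_controlled": 2, "n_ports": 7, "n_credentials": 1},
    {"n_attempts": 3, "n_successes": 1},
)
```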

4.2.2. Agent Action

Agent action is the set of actions an agent can take. In this experiment, the actions of the agent in the intelligent penetration testing environment extended by social engineering are divided into four types: local exploit actions, remote exploit actions, connection exploit actions and social engineering exploit actions. The number of executable actions of each type is determined by the number of vulnerabilities in the target network, the number of connection credentials and the number of usable social engineering actions.

4.2.3. Reward

The reward is the immediate feedback of the environment to the action performed by the agent, and it is an important basis for driving the interaction between the agent and the environment. In this experiment, the setting of rewards takes into account the penetration tester's behavioral cost and the value of the action's outcome during the penetration test. In other words, the environmental reward for the agent's current action is determined by the reward or punishment of the action's exploitation result minus the cost of issuing the action.

4.3. Baseline

In order to verify the effectiveness of SE-AIPT, we adopt three basic algorithms for comparative experiments. The fixed hyperparameters of the three methods are shown in Table 2.
  • Random: The selection of agent actions is independent of the agent state. The agent randomly selects actions to interact with the environment; the feedback and rewards of the environment have no influence on the agent's choice of the next action.
  • Tabular Q-learning: Tabular Q-learning is a value-based algorithm of RL algorithms. The main idea is to construct a table of states and actions to store Q values and then select the actions that can obtain the maximum benefits according to the Q values. However, for the RL tasks of high-dimensional state space and action space, the limited space of the table cannot store all the states and actions, which limits the performance of the algorithm.
  • DQL: DQL combines deep learning and reinforcement learning to effectively handle large state-action spaces. DQL uses a neural network to replace the Q-value table of Tabular Q-learning, transforming the convergence problem of the action-value function into a function-fitting problem for a neural network; it is a representative work in the field of DRL.
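The update rule behind the Tabular Q-learning baseline is Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)); DQL replaces the table Q with a neural network fitted to the same target. A minimal tabular sketch, with illustrative α and γ:

```python
# One tabular Q-learning update on a dict-backed Q table.
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # bootstrap from the best action available in the next state
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

Q = defaultdict(float)          # unseen (state, action) pairs default to 0
Q = q_update(Q, s=0, a=1, r=10, s_next=1, actions=[0, 1])
```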

4.4. Experimental Settings and Results

4.4.1. Experiment A

The DQL, Tabular Q-Learning and Random algorithms are trained in the CyberBattleChain-10 scenario and the CyberBattleChain-10 with Social Engineering scenario. The goal of the penetration test is to obtain the flag of the target node, and the average cumulative reward as a function of the number of steps in each training round is recorded. The experimental parameters and training results are shown in Table 3 and Figure 12a, respectively. The specific statistical data of the experiment are shown in Table 4 and Table 5.
The agent is trained using the three RL algorithms in CyberBattleChain-10 with social engineering (blue, yellow and green curves). SE-AIPT outperforms the methods without social engineering in reaching the maximum cumulative reward, and the excellent performance of DQL can be observed. The results show that in the environment with social engineering, the agent can obtain the connection credentials of the nodes at the end of the chain structure by using social engineering (as shown in Figure 12a) and finally control the target hosts and obtain the flag with fewer actions. SE-AIPT effectively reduces the length of the action sequence the agent needs to explore and exploit the target network.
Using the above experimental parameters, the performance of DQL in the intelligent penetration test is validated. In the CyberBattleChain with Social Engineering scenario and the CyberBattleChain scenario, the DQL algorithm is applied to train the penetration tester agent at scene scales of 10 and 20, respectively. The target of the penetration test is set to reaching a fixed cumulative reward value of 6000; the average cumulative reward as a function of the number of steps is shown in Figure 12b. The results show that the number of steps required for DQL to reach the same average cumulative reward changes with the scene size: as the test scenario expands, the complexity of the environment increases, and the penetration tester agent needs more exploration and exploitation to learn a good strategy.

4.4.2. Experiment B

Based on Experiment A, to further test the stability of the three algorithms, the three agents were trained in the CyberBattleChain with Social Engineering scenarios using network sizes of 10 and 20. The relevant hyperparameters are shown in Table 6, and the results are shown in Figure 13. The specific statistical data of the experiment are shown in Table 7 and Table 8.
The results of Figure 13 show that DQL performs stably in two network scenarios of different scales, and both reach the maximum cumulative reward within 1000 steps. However, Tabular Q-Learning and Random algorithms increase the number of steps required to achieve the maximum accumulative reward from 600–800 to 3000–4000 when the network size is only doubled. The experimental results show that DQL has better robustness in the case of changing training scenarios.

4.4.3. Experiment C

To further test the transferability of DQL, DQL agents are trained in the CyberBattleChain with Social Engineering scenario and the CyberBattleChain scenario with a scale of 10. Then, the trained penetration tester agent is tested in a larger-scale chain network scenario, and the goal of the penetration test is set to obtain the target flag. The training hyperparameters are shown in Table 9, and the results are shown in Figure 14.
We trained the agent in a smaller-scale network scenario and applied the trained agent to larger-scale network scenarios for testing. The results in Figure 14a,b show that when the scale of the tested network scenario is 50 or 80, regardless of whether the penetration tester agent has the social engineering action, the penetration test goal can be completed within a certain number of steps and the maximum cumulative reward can be achieved. However, when the scale of the tested network scenario expands to 100, as shown in Figure 14c, the maximum reward cannot be achieved and the penetration test goal cannot be completed within the specified number of steps. The transferability of the DQL algorithm is largely limited by the size of the test network scenario.

4.5. Result Analysis

We have verified the effectiveness of the intelligent penetration test environment extended with social engineering factors and explored the performance of different agents in network scenarios of variable scale. The results of experiments A and B show that in the scenarios with social engineering, all three agents can complete the penetration test goal with fewer steps and reach the maximum reward value, and DQL shows the best performance and stability among the three algorithms. Analysis of the penetration test process shows that the agent applies social engineering actions to the people in the social network who hold key host information, thereby obtaining the connection credentials of the target hosts through another, more efficient channel, and finally achieves the goal of obtaining the flag with a shorter action sequence. The penetration test simulation environment extended with social engineering factors reasonably models the behavior of non-traditional social engineering methods in the penetration test process. The proposed SE-AIPT method is thus shown to be general and effective in improving the authenticity of simulation environment construction.
Building on experiments A and B, experiment C further verifies the transferability of the DQL algorithm. Its results show that when the test network scale expands to a certain order of magnitude, social engineering actions no longer effectively improve the performance of the penetration tester agent. As the network scale increases, the number of actions required to reach the target continues to surge, and the penetration test target may not even be achievable within the specified number of steps (as shown in Figure 14c). The transferability of the DQL algorithm is thus closely related to the scale of the scene.

5. Conclusions and Future Work

In this study, we construct an improved network graph model for penetration test (NMPT), integrating the factors and attributes related to network security. NMPT could better describe the penetration test process. Moreover, based on NMPT, we propose an intelligent penetration test simulation environment construction method incorporating social engineering factors, SE-AIPT.
The simulation environment construction method proposed in this paper can be used widely in the construction of intelligent penetration test simulation environments. The experimental results show that the simulation environment constructed by the SE-AIPT method enriches the actions of the penetration tester. The integration of social engineering actions provides a new way for the penetration tester agent to discover threat paths during the penetration test; at the same time, it effectively reduces the number of actions required to complete the penetration and improves the efficiency with which the agent exploits the target hosts. The simulation environment we constructed is closer to the real world. In addition, the SE-AIPT method shows superior results in different chain scenarios; it therefore has universality and expansibility.
The construction and application of intelligent penetration test simulation environments is a research hotspot, and improving the realism of such environments is an important direction for future research on RL in intelligent penetration testing. Moreover, from the defender's perspective, the environment construction could be further improved by incorporating network deception and defense strategies.

Author Contributions

Conceptualization, Y.L., Y.W. and X.X.; Methodology, Y.L. and J.Z.; Software, Y.L. and Q.Y.; Validation, Y.L. and X.X.; Formal analysis, Y.L. and Y.W.; Resources, Y.W., X.X. and J.Z.; Writing—original draft preparation, Y.L. and Q.Y.; Writing—review and editing, Y.L., Y.W. and X.X.; Visualization, Y.L.; Supervision, Y.W., X.X. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to sincerely thank the reviewers for their valuable comments and suggestions. The authors would also like to thank the developers in the GitHub community for their positive responses, which have helped us greatly.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Penetration test phases—effort and time distribution.
Figure 2. Comparison of manual penetration test with AI-driven penetration test.
Figure 3. Setting of host attributes according to the definition.
Figure 4. Setting of person attributes according to the definition.
Figure 5. Relationship between social network and target network topology.
Figure 6. The interaction between the penetration tester and network G(t).
Figure 7. The penetration test process is modeled as MDP.
Figure 8. Relationship between MDP and penetration test model.
Figure 9. Composition of intelligent penetration test simulation environment.
Figure 10. CyberBattleChain Experimental Scenario.
Figure 11. CyberBattleChain Experimental Scenario with Social Engineering Factors.
Figure 12. The result of experiment A. (a) The average cumulative rewards of the three agents in different training scenarios. (b) The average cumulative reward for applying DQL in network scenarios of different scales.
Figure 13. The result of experiment B. (a) Average cumulative reward variation for three agents trained in the CyberBattleChain-10 scenario. (b) Average cumulative reward variation for three agents trained in the CyberBattleChain-20 scenario.
Figure 14. The result of experiment C. (a) Variation of the average cumulative reward under the CyberBattleChain-50 test scenario. (b) Variation of the average cumulative reward under the CyberBattleChain-80 test scenario. (c) Variation of the average cumulative reward under the CyberBattleChain-100 test scenario.
Table 1. Parameters of the attacker actions.

| Action | SourceID | TargetID | Vulnerability or Means | Additional Parameters |
|---|---|---|---|---|
| Local |  |  |  |  |
| Remote |  |  |  |  |
| Connect |  |  |  |  |
| Social Engineering |  |  |  |  |
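For concreteness, the action parameterization in Table 1 could be sketched as a small data structure. The class and field names below are illustrative assumptions for such a sketch, not the paper's actual implementation:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class ActionType(Enum):
    # The four action categories listed in Table 1.
    LOCAL = auto()
    REMOTE = auto()
    CONNECT = auto()
    SOCIAL_ENGINEERING = auto()

@dataclass
class AttackerAction:
    """One penetration-tester action, parameterized as in Table 1."""
    action_type: ActionType
    source_id: str               # node the action is launched from
    target_id: str               # host (or person) node the action targets
    vulnerability_or_means: str  # exploited vulnerability, or SE technique
    additional_parameters: dict = field(default_factory=dict)

# Hypothetical example: a phishing action against a person node.
phish = AttackerAction(
    action_type=ActionType.SOCIAL_ENGINEERING,
    source_id="start",
    target_id="person_1",
    vulnerability_or_means="phishing",
    additional_parameters={"payload": "credential_harvest"},
)
```

Keeping social engineering as a fourth action type alongside Local, Remote, and Connect lets the agent treat person nodes and host nodes uniformly in its action space.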
Table 2. Fixed hyperparameters of the three algorithms.

| Hyperparameter | DQL | Tabular Q-Learning | Random |
|---|---|---|---|
| Batch size | 32 | * | * |
| Learning rate | 0.01 | 0.01 | * |
| Epsilon | 0.9 | 0.9 | * |
| Discount factor | 0.015 | 0.015 | * |
| Replay memory size | 10,000 | * | * |
| Target network update frequency | 10 | * | * |

* indicates that the hyperparameter does not apply to the algorithm.
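As a sketch, the fixed hyperparameters of Table 2 might be collected into per-algorithm configurations, with `None` standing in for the "*" entries; the dictionary keys and the helper function below are illustrative, not taken from the paper's code:

```python
import random

# Hyperparameters from Table 2; None corresponds to "*" (not applicable).
HYPERPARAMS = {
    "DQL": {
        "batch_size": 32,
        "learning_rate": 0.01,
        "epsilon": 0.9,
        "discount_factor": 0.015,
        "replay_memory_size": 10_000,
        "target_network_update_frequency": 10,
    },
    "TabularQ": {
        "batch_size": None,
        "learning_rate": 0.01,
        "epsilon": 0.9,
        "discount_factor": 0.015,
        "replay_memory_size": None,
        "target_network_update_frequency": None,
    },
    # The random agent uses none of these hyperparameters.
    "Random": {k: None for k in (
        "batch_size", "learning_rate", "epsilon",
        "discount_factor", "replay_memory_size",
        "target_network_update_frequency")},
}

def epsilon_greedy(q_values, epsilon):
    """Standard epsilon-greedy selection: with probability epsilon pick a
    random action index, otherwise pick the index with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With epsilon = 0.9 as in Table 2, both learning agents explore heavily, which suits the large, initially unknown action space of a penetration test scenario.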
Table 3. Hyperparameter settings of Experiment A.

| Hyperparameter | DQL | Tabular Q-Learning | Random |
|---|---|---|---|
| Max steps per episode | 6000 | 6000 | 6000 |
| Episodes | 150 | 150 | 50 |
Table 4. Specific values of the experimental results in Figure 12a.

| Algorithm | Scenario | Goal | Number of Steps |
|---|---|---|---|
| DQL | CyberBattleChain with Social Engineering | Capture The Flag | 343 |
| Tabular Q-learning | CyberBattleChain with Social Engineering | Capture The Flag | 821 |
| Random | CyberBattleChain with Social Engineering | Capture The Flag | 1337 |
| DQL | CyberBattleChain | Capture The Flag | 1025 |
| Tabular Q-learning | CyberBattleChain | Capture The Flag | 1232 |
| Random | CyberBattleChain | Capture The Flag | 1145 |
Table 5. Specific values of the experimental results in Figure 12b.

| Algorithm | Scenario | Goal | Number of Steps |
|---|---|---|---|
| DQL | CyberBattleChain-10 with Social Engineering | Capture The Flag | 187 |
| DQL | CyberBattleChain-10 | Capture The Flag | 623 |
| DQL | CyberBattleChain-20 with Social Engineering | Capture The Flag | 778 |
| DQL | CyberBattleChain-20 | Capture The Flag | 1121 |
Table 6. Hyperparameter settings of Experiment B.

| Hyperparameter | DQL | Tabular Q-Learning | Random |
|---|---|---|---|
| Max steps per episode | 6000 | 6000 | 6000 |
| Episodes | 150 | 150 | 50 |
Table 7. Specific values of the experimental results in Figure 13a.

| Algorithm | Scenario | Goal | Number of Steps |
|---|---|---|---|
| DQL | CyberBattleChain-10 with Social Engineering | Fixed reward value | 210 |
| Tabular Q-learning | CyberBattleChain-10 with Social Engineering | Fixed reward value | 670 |
| Random | CyberBattleChain-10 with Social Engineering | Fixed reward value | 798 |
Table 8. Specific values of the experimental results in Figure 13b.

| Algorithm | Scenario | Goal | Number of Steps |
|---|---|---|---|
| DQL | CyberBattleChain-20 with Social Engineering | Fixed reward value | 1210 |
| Tabular Q-learning | CyberBattleChain-20 with Social Engineering | Fixed reward value | 3278 |
| Random | CyberBattleChain-20 with Social Engineering | Fixed reward value | 3488 |
Table 9. Hyperparameter settings of Experiment C.

| Hyperparameter | DQL | Tabular Q-Learning | Random |
|---|---|---|---|
| Max steps per episode | 7000 | 9000 | 13,000 |
| Episodes | 50 | 50 | 50 |
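The episode budgets in Tables 3, 6 and 9 amount to a standard capped-episode evaluation loop. The environment interface below is a hypothetical stand-in for the SE-AIPT simulation environment, shown only to illustrate how "max steps per episode" and "episodes" bound the interaction:

```python
def run_agent(env, select_action, episodes, max_steps_per_episode):
    """Run an agent for a fixed number of capped-length episodes and
    return the cumulative reward collected in each episode."""
    rewards_per_episode = []
    for _ in range(episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps_per_episode):
            action = select_action(state)
            state, reward, done = env.step(action)
            total += reward
            if done:  # e.g., the flag was captured or a fixed reward reached
                break
        rewards_per_episode.append(total)
    return rewards_per_episode
```

Under this reading, the larger step caps given to Tabular Q-learning and the random agent in Experiment C simply give the weaker policies more room to reach the goal before an episode is cut off.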
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, Y.; Wang, Y.; Xiong, X.; Zhang, J.; Yao, Q. An Intelligent Penetration Test Simulation Environment Construction Method Incorporating Social Engineering Factors. Appl. Sci. 2022, 12, 6186. https://doi.org/10.3390/app12126186
