Article

AI-Based Resource Allocation in E2E Network Slicing with Both Public and Non-Public Slices

1 National Mobile Communications Research Laboratory, Southeast University, Nanjing 211100, China
2 Purple Mountain Laboratories, Nanjing 211100, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(22), 12505; https://doi.org/10.3390/app132212505
Submission received: 27 September 2023 / Revised: 6 November 2023 / Accepted: 13 November 2023 / Published: 20 November 2023
(This article belongs to the Special Issue Application of Reinforcement Learning in Wireless Network)

Abstract:
Network slicing is a key technology for 5G networks, which divides the traditional physical network into multiple independent logical networks to meet the diverse requirements of end-users. This paper focuses on the resource allocation problem in the scenario where public and non-public network slices coexist. There are two kinds of resources to be allocated: one is the resource blocks (RBs) allocated to the users in the radio access network, and the other is the server resources in the core network. We first formulate the above resource allocation problem as a nonlinear integer programming problem whose objective function is to maximize the operator's profit. Then, a combination of deep reinforcement learning (DRL) and machine learning (ML) algorithms is used to solve this problem. DRL, more specifically, independent proximal policy optimization (IPPO), is employed to provide the RB allocation scheme that makes the objective function as large as possible. ML, more specifically, random forest (RF), assists the DRL agents in receiving fast reward feedback by determining whether an allocation scheme is feasible. The simulation results show that the IPPO-RF algorithm has good performance, i.e., not only are all the constraints satisfied, but the requirements of the non-public network slices are also ensured.

1. Introduction

With the development of fifth-generation (5G) technology and the emergence of diverse services, a wide variety of quality of service (QoS) demands are placed on existing communication networks [1]. In order to meet the different needs of terminal devices using existing communication networks, network slicing (NS) technology has emerged. NS technology divides the traditional physical network into multiple dedicated, virtualized, customized, and isolated logical networks, where each slice is used to fulfill the requirement of a specific service [2]. The current diverse scenarios for NS services are mainly categorized into enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC) services [3]. Different scenarios have different requirements for network resources, which determines the differentiated services they can provide. The eMBB services are mainly suitable for high data rate application scenarios, such as ultra-high-definition video transmission, virtual reality, live streaming of races, etc. The URLLC services are mainly targeted at low-latency, high-reliability application scenarios, such as autonomous driving, 5G smart factories, remote surgery, etc. The mMTC services are mainly suitable for scenarios with a high number of terminal connections, such as smart homes, automatic meter reading, smart agriculture, etc.
NS uses network function virtualization (NFV) and software-defined networking (SDN) technology to manage and allocate resources to achieve logically unified management and flexible control, which significantly increases resource utilization [4]. More specifically, NFV abstracts general servers as computing resources, storage resources, and network resources in the network, manages them uniformly, and then divides them on demand. Using NFV technology, previously dedicated communication network equipment can be replaced with general high-performance servers, and communication network functions are implemented via virtual network functions (VNFs) running on virtual machines (VMs) [5]. SDN enables the separation of the control plane and the data plane, making the network much more flexible and scalable.
NS has many features such as end-to-end (E2E), isolation, flexibility, customization, and so on. First, the E2E concept indicates that the slice is an E2E network with the ability to provide network services independently. Therefore, the resource allocation of the radio access network (RAN) and the core network (CN) should be considered together. Isolation means that any changes in one slice (e.g., addition of users, changes in channel state information, etc.) will not affect the operation of other slices. Flexibility is reflected in the fact that the network elements that make up the slices can be deployed independently or shared for scheduling. Customization indicates that slicing provides on-demand logical network topology and connectivity for different industries, services, or users to meet their differentiated QoS.
3GPP states that networks can be classified into two main categories: Public Land Mobile Network (PLMN) and non-public network (NPN) [6]. PLMN is a communication network built by mobile network operators (MNOs) for public users, which is most closely related to our daily life. NPN achieves network signal coverage in specific areas. It provides communication services within the enterprise according to the characteristics of the industry. It can be understood as a supplement to the public network. NPN requires high reliability as well as high security. High reliability is reflected in the priority to ensure that the QoS of NPN is not affected when the resources are relatively insufficient. High security is demonstrated by the fact that the resources used by NPN are relatively independent, which ensures that private data within the enterprise is not exposed. NPN can be deployed in two ways [7]: (i) Standalone NPN (S-NPN), where enterprises use a dedicated band and deploy a full 5G network independently. Since it is completely isolated from the PLMN, it can effectively protect the enterprise’s data security. The disadvantage is that it is very costly and not suitable for most enterprises. (ii) Public network integrated NPN (PNI-NPN), where the NPN shares some infrastructure with the PLMN to accomplish the network’s deployment. NPNs are usually implemented by creating dedicated network slices [8]. Hence, the main issue considered in this paper is the resource allocation in the scenario of the integration of PLMN and NPN, which are both achieved via NS.
The resource allocation problem of NS mainly includes two parts: the resource allocation of the RAN and the VNF mapping of the CN. For the resource allocation of the RAN, ref. [9] focuses on the efficient sharing of radio resources among different users while guaranteeing the service requirements. Although this approach usually results in higher multiplexing gains, it does not guarantee performance isolation. A two-level RAN RB resource allocation method is proposed in [10]. It is assumed that when the pre-allocated RB resources of a base station are insufficient, it is allowed to borrow additional RBs directly from other base stations. This has the advantage of reducing the signaling overhead and providing an immediate service response to end users; however, the isolation between slices is again not considered. Ref. [11] proposed a resource scheduling scheme that combines deep learning (DL) with reinforcement learning (RL). The main goal is to satisfy the users' QoS requirements with as few resources as possible while satisfying the isolation requirement. The advantage is that DL and RL are integrated together and the isolation of slices is taken into account. The disadvantage is that it only considers the allocation of resources to slices, without further allocating them to end users. In [12], a two-layer resource allocation framework is proposed, where the upper and lower layers correspond to the Centralized Unit/Distributed Unit (CU/DU) side and the Remote Radio Unit (RRU) side, respectively. After that, a Q-learning-based NS resource allocation scheme is proposed to maximize the total utility of the whole network. Ref. [13] mainly addresses the problem of bandwidth allocation for deep reinforcement learning-based NS, maximizing the weighted sum of spectral efficiency and quality of experience (QoE). For the VNF mapping of the CN, ref. [14] proposed a scheme based on the feature decomposition method to implement VNF mapping and traffic control. However, only simple graph mapping relationships are considered, which do not involve the processing of dynamic services and cannot meet the real-time requirements of services. Ref. [15] presents the VNF placement problem in NS, where the optimization problem is to minimize the reliability degradation and cost while maintaining strict delay constraints; a Tabu-search-based algorithm is used to find a solution. In [16], a VNF deployment scheme that maximizes operational benefits is proposed to ensure the reliability of the network. Ref. [17] proposed a graph neural network-based algorithm to predict the resource demand by using the topological information of the VNF forwarding graph, thus reducing the failure rate of slices.
However, the research above on the resource allocation for NS does not address NPN slices. In other words, only the QoS requirements of different slices are considered, and the differences between public and non-public network slices are not taken into account. Therefore, in this paper, we focus on the resource allocation problem in the scenario where public network slices (including eMBB slices and URLLC slices) and non-public network slices (NPN slices) coexist. Compared with public network slices, the characteristics of NPN slices are reflected in two aspects: (i) NPN slices require higher service priority and reliability; and (ii) NPN slices require higher security and privacy. In addition, we summarize the related works and compare them with our work in Table 1.
In this paper, we first formulate the E2E resource allocation problem with both public and non-public slices, which is a nonlinear integer programming problem. Next, due to the nature of the problem, we split the joint optimization problem into two sub-problems: the RB allocation problem and the feasibility problem of VNF mapping. Then, a combination of deep reinforcement learning (DRL) and machine learning (ML) algorithms is used to solve the problem. More specifically, DRL is employed to provide the RB allocation scheme that makes the objective function, which is the network operator's profit, as large as possible. ML assists in determining whether the allocation scheme provided by DRL satisfies the constraints, thus helping the DRL agents to receive fast reward feedback. For deep reinforcement learning, we use multi-agent reinforcement learning to reduce the action space, and the specific algorithm employed is independent proximal policy optimization (IPPO). For machine learning, we speed up the solution feasibility determination by training a binary classifier, and the specific algorithm used is random forest (RF). Therefore, we name the proposed algorithm the IPPO-RF algorithm. The simulation results show that the IPPO-RF algorithm has good performance, i.e., not only are all the constraints satisfied, but the service priority and security of the NPN slices are ensured. The main contributions of this paper can be summarized as follows.
  • We solve the resource allocation problem associated with E2E network slicing, which involves both the RB allocation problem in the RAN and the VNF mapping problem in the CN.
  • We focus on the resource allocation problem in the scenario where public network slices (including eMBB slices and URLLC slices) and non-public network slices (NPN slices) coexist. Compared with public network slices, NPN slices require higher reliability and higher privacy.
  • We innovatively utilize a collaborative approach, combining DRL and ML, to address the E2E resource allocation problem of network slicing. The significance of this algorithm lies in providing a valuable approach for solving similar problems.
The rest of the paper is organized as follows. The system model is presented in Section 2. In Section 3, we first describe how to split the joint optimization problem of this paper into two sub-problems. Next, a combination of DRL and ML algorithms is introduced to solve the problem. In Section 4, we present the simulation results as well as discussions, and finally, we conclude the paper in Section 5.

2. System Model

In this paper, we consider an end-to-end (E2E) NS system where public and non-public network slices coexist. In terms of public network slicing, since most of the researchers primarily focus on eMBB and URLLC slices, we take them into account. Thus, a total of three types of slices are considered: eMBB slices, URLLC slices, and NPN slices. The system architecture is shown in Figure 1.
The system consists of two parts: the radio access network (RAN) and the core network (CN). These two parts are connected via the backhaul links. The RAN is mainly responsible for allocating wireless spectrum resource blocks (RBs) to users in the slice. The CN mainly maps the VNFs to general servers in the physical network.

2.1. Radio Access Network Domain

We assume a specific area covered by a set of gNodeBs $\mathcal{B} = \{1, 2, \ldots, B\}$. The three types of slices considered, namely the eMBB slices, the URLLC slices, and the NPN slices, are denoted by $\mathcal{S}_e$, $\mathcal{S}_u$, and $\mathcal{S}_{np}$, respectively. The set of slices is represented as $\mathcal{S} = \{1, 2, \ldots, S\} = \mathcal{S}_e \cup \mathcal{S}_u \cup \mathcal{S}_{np}$. Randomly located in the area are the end-users $\mathcal{U} = \{1, 2, \ldots, U\}$, comprising eMBB users, URLLC users, and NPN users; each user belongs to one and only one slice. The set of users belonging to Slice $s$ is represented as $\mathcal{U}_s$, $s \in \mathcal{S}$. We assume that each end-user $u \in \mathcal{U}$ is served by the nearest gNodeB, denoted as $b(u)$. Considering the limited coverage of the NPN, we assume that the NPN slice is served by only one gNodeB. We consider a downlink (DL) transmission scenario based on orthogonal frequency division multiple access (OFDMA). The spectrum resources can be represented as a shared RB pool $\mathcal{K} = \{1, 2, \ldots, K\}$, where the RBs are organized as a resource grid and each RB is the minimum scheduling unit [18]. We define a binary variable $x_{s,u}^k$ as an indicator for assigning RB $k$ to user $u$ in Slice $s$:
$$x_{s,u}^k = \begin{cases} 1, & \text{if RB } k \in \mathcal{K} \text{ is assigned to } u \in \mathcal{U}_s \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
We assume that each RB can be assigned to at most one user, thus reducing interference and ensuring the isolation of slices. This constraint can be expressed as follows:
$$\sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}_s} x_{s,u}^k \le 1, \quad \forall k \in \mathcal{K} \tag{2}$$
We further assume that the gNodeBs have a near perfect knowledge of the channel state information (CSI). The achievable data rate of the user u belonging to slice s is defined as follows:
$$r_u^s = \sum_{k \in \mathcal{K}} x_{s,u}^k W \log_2\left(1 + \frac{P_d G_{s,u}}{\sigma^2}\right) \tag{3}$$
where $W$ represents the bandwidth of an RB; $P_d$ is the downlink transmission power; $G_{s,u}$ is the channel gain between the gNodeB $b(u)$ and the end-user $u \in \mathcal{U}_s$, including pathloss and Rayleigh fading [19]; and $\sigma^2$ is the power of the additive white Gaussian noise (AWGN). We assume that the downlink transmission power is the same for all RBs. We also assume that the set of gNodeBs we consider forms a gNodeB sharing group; more specifically, all the gNodeBs share a common RB pool [20]. Therefore, there is no frequency reuse in the same gNodeB sharing group. Furthermore, we have assumed that each RB can be assigned to at most one user. Thus, we do not consider interference in Equation (3).
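As an illustration of Equation (3), a short Python sketch of the per-user rate, using hypothetical parameter values (the RB bandwidth, transmit power, channel gain, and noise power below are illustrative, not values from the paper):

```python
import math

def user_rate(num_rbs_assigned, rb_bandwidth_hz, p_d, gain, noise_power):
    """Per-user rate per Eq. (3): each assigned RB contributes
    W * log2(1 + P_d * G / sigma^2); no interference term, as assumed."""
    snr = p_d * gain / noise_power
    return num_rbs_assigned * rb_bandwidth_hz * math.log2(1.0 + snr)

# hypothetical example: 4 RBs of 180 kHz each, SNR = 10
r = user_rate(num_rbs_assigned=4, rb_bandwidth_hz=180e3,
              p_d=1.0, gain=1e-9, noise_power=1e-10)
```

The rate scales linearly in the number of assigned RBs because the SNR is assumed identical on every RB.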
Therefore, the total rate of slice s can be expressed as:
$$r_s = \sum_{u \in \mathcal{U}_s} r_u^s, \quad \forall s \in \mathcal{S} \tag{4}$$
To calculate the RAN delay of user $u$ in Slice $s$, we follow [10,21,22], where the traffic is formulated as an M/M/1 queuing model. We make the following assumptions: (i) the arrival process of each user's packets follows a Poisson distribution, and (ii) the inter-arrival times of the packets are independent and follow an exponential distribution with mean $1/\lambda_{s,u}$. Thus, the packet arrival and service process in Slice $s$ can be modeled as $|\mathcal{U}_s|$ independent M/M/1 queuing systems. By applying Little's law, the RAN delay of user $u$ in Slice $s$ is:
$$d_{\mathrm{RAN}}^{s,u} = \frac{1}{r_u^s / \omega_{s,u} - \lambda_{s,u}} \tag{5}$$
where $\omega_{s,u}$ and $\lambda_{s,u}$ denote the packet size and the packet arrival rate of user $u$ in Slice $s$, respectively. The first and second terms in the denominator represent the processing and arrival rates of user packets, respectively. We assume that users within a slice have the same packet size and packet arrival rate. The packet sizes for the three slice types are denoted by $\omega_e$, $\omega_u$, and $\omega_{np}$, respectively, and the packet arrival rates by $\lambda_e$, $\lambda_u$, and $\lambda_{np}$, respectively.
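A minimal sketch of the M/M/1 RAN delay in Equation (5); the guard for an unstable queue (service rate not exceeding the arrival rate) is our addition for numerical safety, not part of the paper's model:

```python
def ran_delay(rate_bps, packet_size_bits, arrival_rate_pps):
    """M/M/1 sojourn time per Eq. (5): 1 / (service rate - arrival rate).
    The service rate is the user's data rate divided by the packet size."""
    service_rate = rate_bps / packet_size_bits  # packets per second
    if service_rate <= arrival_rate_pps:
        return float("inf")  # queue is unstable; delay grows without bound
    return 1.0 / (service_rate - arrival_rate_pps)
```

For example, a 1 Mbps user rate with 1000-bit packets gives a service rate of 1000 packets/s; at 500 packets/s arrivals the delay is 2 ms.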

2.2. Core Network Domain

We consider the physical network as a directed graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, 2, \ldots, V\}$ represents the set of server nodes and $\mathcal{E}$ represents the set of directed links between server nodes. The resources of each server usually include CPU, memory, disk, etc. In this paper, in order to simplify the model, the various resources on the nodes are unified as computing resources. The available computing resources of all server nodes are denoted as $O = \{o_1, o_2, \ldots, o_V\}$. The numbers of virtual machines (VMs) that the server nodes host are represented as $M = \{m_1, m_2, \ldots, m_V\}$. The set of link bandwidths between the server nodes is represented as $L = \{l_{1,1}, l_{1,2}, \ldots, l_{v,v'}, \ldots, l_{V,V}\}$, and the set of link delays is represented as $D_E = \{d_{1,1}, d_{1,2}, \ldots, d_{v,v'}, \ldots, d_{V,V}\}$. Note that if there is no directed edge from Server node $a$ to Server node $b$, then the link bandwidth $l_{a,b} = 0$ and the link delay $d_{a,b} = \infty$. The link bandwidth from a node to itself is specified to be infinite, i.e., $l_{a,a} = \infty$, while the link delay is zero, i.e., $d_{a,a} = 0$.
A VNF is a network function implemented in software that operates on physical servers. Different VNFs can be arranged in a specific sequence to create a service function chain (SFC), enabling the provision of complex network services. The VNFs we consider primarily include a firewall (FW), network address translator (NAT), traffic monitor (TM), video optimization controller (VOC), an intrusion detection and prevention system (IDPS), and so on [19]. The set of required VNF types is represented as $\mathcal{F} = \{1, 2, \ldots, F\}$. For each slice, its SFC consists of several VNFs in a certain order. Thus, the SFC of Slice $s \in \mathcal{S}$ is denoted by $\mathcal{F}_s = \{f_1^s, f_2^s, \ldots, f_{t_s}^s\}$, where $f_j^s \in \mathcal{F}$, $j \in \{1, \ldots, t_s\}$, and $t_s$ is the length of the SFC of Slice $s$. Each VM on any server node can host at most one VNF, and multiple VNFs in the same SFC can be deployed on the same server node. The processing delay of each VNF is determined by its type. To simplify the model, we assume that the delay of each VNF type is known in advance and can be represented as $D_V = \{d_1, d_2, \ldots, d_F\}$.
When the data flow passes through the SFC of a slice, the data processing rate changes after each VNF. In order to easily depict this variation, we assume that the data processing rate through different VNFs follows linear changes, quantified by the data rate expansion ratio [12]. Note that the data rate expansion ratio depends on the type of the VNF; hence, we collect the data rate expansion ratios of all VNF types and denote them by $\beta = \{\beta_1, \beta_2, \ldots, \beta_F\}$. Thus, the data processing rate through the $j$-th VNF in the SFC of Slice $s$ can be denoted by [12]:
$$R_{j+1}^s = R_j^s \beta_{f_j^s} \tag{6}$$
where $R_j^s$ is the input data processing rate of the $j$-th VNF and, correspondingly, $R_{j+1}^s$ is its output data processing rate, which is also the input data processing rate of the $(j+1)$-th VNF. Note that the data processing rate of the first VNF in the SFC of Slice $s$ is the total rate of the slice, $R_1^s = r_s$, where $r_s$ is given in (4). Figure 2 illustrates an example of the data processing rate in the SFC of Slice $s$.
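The recursion in Equation (6) can be unrolled in a few lines; a sketch, where the slice rate and the expansion ratios below are hypothetical example values:

```python
def sfc_rates(slice_rate, expansion_ratios):
    """Input rate of each VNF along the SFC, per Eq. (6): R_{j+1} = R_j * beta_j.
    Returns a list where rates[j] is the input rate of the (j+1)-th VNF."""
    rates = [slice_rate]  # R_1^s = r_s, the total slice rate
    for beta in expansion_ratios:
        rates.append(rates[-1] * beta)
    return rates

# e.g., a chain of two VNFs that expand by 1.2x and then compress by 0.5x
rates = sfc_rates(100.0, [1.2, 0.5])
```

A ratio above 1 models a VNF that inflates traffic (e.g., a VOC adding headers), and a ratio below 1 models compression or filtering.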
A binary variable $\eta_{j,v}^s$ is defined as an indicator to show the mapping of the VNFs in the SFC to server nodes:
$$\eta_{j,v}^s = \begin{cases} 1, & \text{if the } j\text{-th VNF in the SFC of Slice } s \text{ is deployed on Server node } v \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
The following constraint ensures that each VNF is successfully deployed on exactly one server node:
$$\sum_{v \in \mathcal{V}} \eta_{j,v}^s = 1 \tag{8}$$
To simplify the model, we assume the computing resources required by a VNF to be linear in the data rate passing through it [12]. Therefore, the relationship between the required VNF computing resources and the data processing rate is expressed as follows:
$$c_j^s = \alpha R_j^s \tag{9}$$
where α represents the correlation coefficient between the computing resources and the data rate.
The CN propagation delay of Slice s is defined as the sum of the delays of each link and it can be expressed as
$$d_{\mathrm{prop,CN}}^s = \sum_{j=1}^{t_s - 1} \sum_{(v,v') \in \mathcal{E}} \eta_{j,v}^s \eta_{j+1,v'}^s d_{v,v'} \tag{10}$$
The CN processing delay for Slice s is defined as the sum of the processing delays of each VNF as follows:
$$d_{\mathrm{proc,CN}}^s = \sum_{j=1}^{t_s} d_{f_j^s} \tag{11}$$
Hence, the total CN delay of User u in Slice s is given as
$$d_{\mathrm{CN}}^{s,u} = d_{\mathrm{prop,CN}}^s + d_{\mathrm{proc,CN}}^s \tag{12}$$
which is only a function of s and not u, i.e., for users belonging to the same slice, the CN delay is the same.
Thus, the total delay of user u in Slice s is
$$d_{s,u}^{\mathrm{total}} = d_{\mathrm{RAN}}^{s,u} + d_{\mathrm{CN}}^{s,u} \tag{13}$$
where $d_{\mathrm{RAN}}^{s,u}$ is given by (5).
In order to reflect the higher security and privacy of NPN slices compared to public network slices, we require the VNFs of the NPN slices to have dedicated server nodes and links. More specifically, to ensure the security and privacy of the NPN slices, we require that if a server node $v$ is occupied by a VNF of an NPN slice $s'$, then VNFs of other slices cannot be deployed on this server node, i.e.,
$$\left( \sum_{j=1}^{t_{s'}} \eta_{j,v}^{s'} \right) \left( \sum_{s \in \mathcal{S}, s \neq s'} \sum_{j=1}^{t_s} \eta_{j,v}^{s} \right) = 0, \quad \forall v \in \mathcal{V},\ \forall s' \in \mathcal{S}_{np} \tag{14}$$
Also, for security reasons, if a link is used by an NPN slice $s'$, then other slices cannot use this link, i.e.,
$$\left( \sum_{j=1}^{t_{s'}-1} \eta_{j,v}^{s'} \eta_{j+1,v'}^{s'} \right) \left( \sum_{s \in \mathcal{S}, s \neq s'} \sum_{j=1}^{t_s - 1} \eta_{j,v}^{s} \eta_{j+1,v'}^{s} \right) = 0, \quad \forall l_{v,v'} \in L,\ v \neq v',\ \forall s' \in \mathcal{S}_{np} \tag{15}$$
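The node-exclusivity requirement of Equation (14) is easy to check directly on a candidate mapping; a sketch, where `mapping[s]` is a hypothetical representation giving the ordered server nodes hosting slice `s`'s SFC:

```python
def npn_nodes_exclusive(mapping, npn_slices):
    """Check Eq. (14): any server node hosting a VNF of an NPN slice must
    host no VNF of any other slice."""
    for s_np in npn_slices:
        npn_nodes = set(mapping[s_np])
        for s, nodes in mapping.items():
            if s != s_np and npn_nodes & set(nodes):
                return False  # a shared node violates NPN isolation
    return True

ok = npn_nodes_exclusive({"npn": [1, 2], "embb": [3, 4]}, ["npn"])
```

An analogous check on consecutive node pairs would cover the link-exclusivity constraint of Equation (15).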

2.3. Performance Metrics

In order to evaluate the performance of the resource allocation, we introduce the performance metrics of user satisfaction and the network operator’s profit.
In order to reflect the differences in users' demands within a slice, we propose the personalized requirements of users (PRU). For the eMBB slices, users have individual rate requirements; for the URLLC slices, users have individual delay requirements; and for the NPN slices, users have both individual rate and delay requirements. It is important to note that a user's personalized requirements simply reflect the rate or delay that the user anticipates, which may not be strictly guaranteed during resource allocation. The sets of PRU for the three types of slices are denoted by $Q_e = \{q_{s,u}^v : u \text{ is an eMBB user}\}$, $Q_u = \{q_{s,u}^d : u \text{ is a URLLC user}\}$, and $Q_{np} = \{(q_{s,u}^v, q_{s,u}^d) : u \text{ is an NPN user}\}$, respectively. The set of PRU can be represented as $Q = \{Q_e, Q_u, Q_{np}\}$. Ideally, we would like the network to satisfy all users' personalized requirements. However, given the limited resources, this is not always possible. In our work, we place a higher service priority on the NPN users, i.e., the resource allocation should strictly satisfy the individual rate and delay requirements of each NPN user. For users in other slices, personalized requirements are addressed to the best extent possible, ensuring that the overall user satisfaction remains above a certain threshold.
User satisfaction for users in the eMBB, URLLC, and NPN slices is defined, respectively, as follows:
$$\xi_u^s = \min\left(1, \frac{r_u^s}{q_{s,u}^v}\right), \quad u \in \mathcal{U}_s,\ s \in \mathcal{S}_e \tag{16}$$
$$\xi_u^s = \min\left(1, \frac{q_{s,u}^d}{d_{s,u}^{\mathrm{total}}}\right), \quad u \in \mathcal{U}_s,\ s \in \mathcal{S}_u \tag{17}$$
$$\xi_u^s = \frac{1}{2}\left[\min\left(1, \frac{q_{s,u}^d}{d_{s,u}^{\mathrm{total}}}\right) + \min\left(1, \frac{r_u^s}{q_{s,u}^v}\right)\right], \quad u \in \mathcal{U}_s,\ s \in \mathcal{S}_{np} \tag{18}$$
Hence, as can be seen, for an eMBB user $u$, the satisfaction is 1 as long as the achieved rate $r_u^s$ is no smaller than the required rate $q_{s,u}^v$; otherwise, User $u$'s satisfaction is the ratio of the achieved rate $r_u^s$ to the required rate $q_{s,u}^v$. Similarly, for a URLLC user $u$, the satisfaction is 1 as long as the achieved delay $d_{s,u}^{\mathrm{total}}$ is no larger than the required delay $q_{s,u}^d$; otherwise, it is the ratio of the required delay $q_{s,u}^d$ to the achieved delay $d_{s,u}^{\mathrm{total}}$. Since the NPN users have both rate and delay requirements, we define their satisfaction as the average of the rate satisfaction and the delay satisfaction. Finally, the user satisfaction of a slice $s$ is defined as the average user satisfaction of all users in Slice $s$.
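The three satisfaction definitions in Equations (16)–(18) translate directly into code; a sketch with illustrative argument names (not from the paper):

```python
def satisfaction(slice_type, rate=None, rate_req=None, delay=None, delay_req=None):
    """User satisfaction per Eqs. (16)-(18), capped at 1 once a
    requirement is met."""
    if slice_type == "embb":      # Eq. (16): rate requirement only
        return min(1.0, rate / rate_req)
    if slice_type == "urllc":     # Eq. (17): delay requirement only
        return min(1.0, delay_req / delay)
    # NPN, Eq. (18): average of the rate and delay satisfaction
    return 0.5 * (min(1.0, delay_req / delay) + min(1.0, rate / rate_req))
```

For instance, an NPN user whose delay requirement is fully met but whose rate is half of what it requested gets a satisfaction of 0.75.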
Recall that we place a higher service priority on the NPN users, i.e., their PRU need to be satisfied strictly. In other words, we require that $\xi_u^s = 1$ for users of NPN slices, which further means that the user satisfaction of any NPN slice is equal to 1.
The network operator’s profit is
$$\psi = \psi_R - \psi_E \tag{19}$$
where $\psi_R$ and $\psi_E$ represent the revenue and the expenditure, respectively. More specifically, the revenue comes from the service data rate provided to the users, and the expenditure mainly comes from the number of RBs used and the amount of computing resources used, i.e.,
$$\psi_R = \varrho \sum_{s \in \mathcal{S}} r_s \tag{20}$$
$$\psi_E = \sigma_1 \sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}_s} \sum_{k \in \mathcal{K}} x_{s,u}^k + \sigma_2 \sum_{s \in \mathcal{S}} \sum_{j=1}^{t_s} c_j^s \tag{21}$$
where $r_s$, as defined in (4), is the total achieved rate of Slice $s$; $c_j^s$, as defined in (9), is the computing resource used by the $j$-th VNF of Slice $s$; $\sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}_s} \sum_{k \in \mathcal{K}} x_{s,u}^k$ is the total number of RBs used, with $x_{s,u}^k$ defined in (1); $\varrho$ represents the unit price of the service rate; and $\sigma_1$ and $\sigma_2$ represent the unit prices of an RB and of computing resources, respectively.
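Equations (19)–(21) reduce to a few weighted sums; a sketch with hypothetical unit prices and pre-aggregated inputs (per-slice rates, per-slice RB counts, and per-VNF computing resources):

```python
def operator_profit(slice_rates, rb_counts, vnf_resources,
                    price_rate, price_rb, price_cpu):
    """Operator profit per Eqs. (19)-(21): revenue from served rate
    minus the cost of RBs and computing resources."""
    revenue = price_rate * sum(slice_rates)                     # Eq. (20)
    expenditure = (price_rb * sum(rb_counts)
                   + price_cpu * sum(vnf_resources))            # Eq. (21)
    return revenue - expenditure                                # Eq. (19)

profit = operator_profit(slice_rates=[10.0, 20.0], rb_counts=[3, 5],
                         vnf_resources=[2.0, 4.0],
                         price_rate=1.0, price_rb=0.5, price_cpu=0.25)
```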

2.4. E2E Optimization Problem

The optimization objective of this paper is to maximize the network operator's profit while satisfying the constraints on the total resources, the minimum rate, the maximum delay, the user satisfaction requirement of each slice, and the security and privacy of the NPN slices. More specifically, the E2E optimization problem is formulated as follows:
$$\max_{\{x_{s,u}^k,\ \eta_{j,v}^s\}} \ \psi \tag{22a}$$
$$\text{s.t.} \quad \sum_{s \in \mathcal{S}} \sum_{u \in \mathcal{U}_s} \sum_{k \in \mathcal{K}} x_{s,u}^k \le K, \tag{22b}$$
$$\sum_{s \in \mathcal{S}} \sum_{j=1}^{t_s} \eta_{j,v}^s \le m_v, \quad \forall v \in \mathcal{V}, \tag{22c}$$
$$\sum_{s \in \mathcal{S}} \sum_{j=1}^{t_s} \eta_{j,v}^s c_j^s \le o_v, \quad \forall v \in \mathcal{V}, \tag{22d}$$
$$\sum_{s \in \mathcal{S}} \sum_{j=1}^{t_s - 1} \eta_{j,v}^s \eta_{j+1,v'}^s R_{j+1}^s \le l_{v,v'}, \quad \forall l_{v,v'} \in L, \tag{22e}$$
$$r_u^s \ge r_{\min}^s, \quad \forall u \in \mathcal{U}_s,\ \forall s \in \mathcal{S}, \tag{22f}$$
$$d_{s,u}^{\mathrm{total}} \le d_{\max}^s, \quad \forall u \in \mathcal{U}_s,\ \forall s \in \mathcal{S}, \tag{22g}$$
$$\frac{1}{|\mathcal{U}_s|} \sum_{u \in \mathcal{U}_s} \xi_u^s \ge \xi_{\min}^s, \quad \forall s \in \mathcal{S}, \quad \text{and (2), (8), (14), (15),} \tag{22h}$$
where the objective function is as defined in (19). Note that the constraint in (22b) ensures that the number of RBs allocated to the users does not exceed the total number of RBs in the resource pool; constraint (22c) indicates that the number of VNFs deployed on each server node cannot exceed the number of VMs hosted by that node; constraint (22d) specifies that the sum of the computing resources required by all VNFs deployed on a server node cannot exceed the available computing resources of that node; constraint (22e) indicates that the sum of the data traffic on each link does not exceed the bandwidth limit of the link; constraint (22f) ensures that the data rate of each end-user in Slice $s$, i.e., $r_u^s$ as defined in (3), is not lower than the minimum rate threshold $r_{\min}^s$; constraint (22g) ensures that the E2E delay of each end-user in Slice $s$, i.e., $d_{s,u}^{\mathrm{total}}$ as defined in (13), cannot exceed the maximum delay threshold $d_{\max}^s$; and constraint (22h) indicates that the user satisfaction of Slice $s$ should not be lower than the satisfaction threshold $\xi_{\min}^s$, where, according to the previous discussion, $\xi_{\min}^s \le 1$ when $s$ is an eMBB or URLLC slice and $\xi_{\min}^s = 1$ when $s$ is an NPN slice.
The proposed problem (22) is a binary nonlinear integer programming problem. The time complexity of solving it by exhausting the binary variables is $O\!\left(2^{KU} \cdot 2^{V \sum_{s \in \mathcal{S}} t_s}\right)$; furthermore, it can be reduced to a knapsack problem, which is NP-hard. Hence, it is inefficient and infeasible to exhaust all possible allocation options to find the optimal one. However, artificial intelligence (AI) has powerful decision-making and information-extraction capabilities, especially deep reinforcement learning, and many researchers have shown that AI-based resource allocation methods are more efficient than other methods [19,23]. Therefore, in this paper, we resort to machine learning and deep reinforcement learning approaches to solve the proposed problem (22).

3. Algorithm Design

The problem in (22) consists of two types of optimization variables: $x_{s,u}^k$, which indicates how to assign RBs to users in the slices in the RAN, and $\eta_{j,v}^s$, which indicates how to map the VNFs of all slices to the VMs of the server nodes in the CN. Looking at the objective function of the problem in (22), we see that it is only a function of the RB allocation scheme in the RAN and is not related to the VNF mapping scheme in the CN. Furthermore, (22b), (22f), and (2) are constraints that only involve the RB allocation scheme, while (22c), (8), (14), and (15) are constraints that only involve the VNF mapping scheme. The constraints (22d), (22e), (22g), and (22h) involve both the RB allocation scheme and the VNF mapping scheme, due to the delay variables $d_{s,u}^{\mathrm{total}}$ and the rate variables $r_u^s$.
For problem (22), once the RB allocation scheme is fixed, the variables $r_u^s$ are fixed and $d_{\mathrm{RAN}}^{s,u}$ in (5) is fixed. According to (4), fixed $r_u^s$ yields fixed $r_s$; hence, given the RB allocation scheme, (22d) and (22e) become constraints involving only the VNF variables $\eta_{j,v}^s$. Next, we transform the constraints (22g) and (22h) to separate the effects of the RB allocation scheme and the VNF mapping scheme. First, denote the total CN delay of a user $u$ in Slice $s$ as $y_s$, which is a function of only $s$, and not of $u$, according to (12). Then, we discuss the transformation of the constraints (22g) and (22h) for the different types of slices.
(1) For User $u$ in an eMBB slice $s$, due to (16), the constraint (22h) only involves the RB allocation scheme. Next, we discuss the transformation of (22g). To satisfy (22g), $y_s$ must satisfy
$$y_s \le d_{\max}^s - \max_{u \in \mathcal{U}_s} d_{\mathrm{RAN}}^{s,u} \tag{23}$$
according to (13). Hence, based on (12), for the eMBB slice s, to satisfy (22g), the VNF mapping scheme must satisfy
$$d_{\mathrm{prop,CN}}^s + d_{\mathrm{proc,CN}}^s \le d_{\max}^s - \max_{u \in \mathcal{U}_s} d_{\mathrm{RAN}}^{s,u} \tag{24}$$
(2) For a URLLC or NPN slice $s$, to satisfy (22g), $y_s$ again has to satisfy (23). Now, we transform the constraint (22h). Note that $\min\left(1, q_{s,u}^d / d_{s,u}^{\mathrm{total}}\right) = \min\left(1, q_{s,u}^d / (d_{\mathrm{RAN}}^{s,u} + y_s)\right)$. Hence, for User $u$ in the URLLC slices (NPN slices), we may express the user satisfaction $\xi_u^s$ as a function of $y_s$, according to (17) and (18), for a fixed RB allocation scheme. Correspondingly, according to constraint (22h), $y_s$ has to satisfy
$$\frac{1}{|\mathcal{U}_s|} \sum_{u \in \mathcal{U}_s} \min\left(1, \frac{q_{s,u}^d}{d_{\mathrm{RAN}}^{s,u} + y_s}\right) \ge \xi_{\min}^s \tag{25}$$
for the URLLC slice s, and
$$\frac{1}{|\mathcal{U}_s|} \sum_{u \in \mathcal{U}_s} \frac{1}{2}\left[\min\left(1, \frac{q_{s,u}^d}{d_{\mathrm{RAN}}^{s,u} + y_s}\right) + \min\left(1, \frac{r_u^s}{q_{s,u}^v}\right)\right] \ge \xi_{\min}^s \tag{26}$$
for the NPN slice $s$. The left-hand sides (LHS) of (25) and (26) are both nonincreasing functions of $y_s$. Hence, the bisection search method can be used to find the maximum $y_s$ that satisfies (25) or (26), which is denoted as $y_{\max}^s$. Hence, for a URLLC or NPN slice $s$, to satisfy (22g) and (22h), the VNF mapping scheme must satisfy
$$d_{\mathrm{prop,CN}}^{s} + d_{\mathrm{proc,CN}}^{s} \le \min\!\left\{ d_{\max}^s - \max_{u \in \mathcal{U}_s} d_{\mathrm{RAN}}^{s,u},\; y_{\max}^s \right\} \tag{27}$$
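As an illustrative sketch (not the authors' implementation), the bisection search for the largest $y^s$ satisfying (25) can be written as follows, assuming the delay-satisfaction form $\min\{1, q_{s,u}^d/(d_{\mathrm{RAN}}^{s,u}+y^s)\}$; the per-user delays `d_ran` and requirements `q_d` are hypothetical inputs:

```python
def max_feasible_y(d_ran, q_d, xi_min, y_hi=1.0, tol=1e-6):
    """Largest CN delay y such that the mean user satisfaction >= xi_min.

    d_ran, q_d: per-user RAN delays and delay requirements (hypothetical data).
    The LHS of (25) is non-increasing in y, so bisection applies.
    """
    def lhs(y):
        return sum(min(1.0, q / (d + y)) for d, q in zip(d_ran, q_d)) / len(d_ran)

    if lhs(0.0) < xi_min:           # infeasible even with zero CN delay
        return None
    while lhs(y_hi) >= xi_min:      # grow the upper bracket until infeasible
        y_hi *= 2.0
    lo, hi = 0.0, y_hi
    while hi - lo > tol:            # bisection on the threshold crossing
        mid = (lo + hi) / 2.0
        if lhs(mid) >= xi_min:
            lo = mid
        else:
            hi = mid
    return lo
```

For two users with RAN delay 1 ms and requirement 2 ms at a threshold of 0.9, this returns $y^s \approx 2/0.9 - 1 \approx 1.22$ ms.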
Define $d_{\mathrm{CN}}^s$ as
$$d_{\mathrm{CN}}^s = \begin{cases} d_{\max}^s - \max_{u \in \mathcal{U}_s} d_{\mathrm{RAN}}^{s,u}, & \text{for eMBB slice } s, \\ \min\!\left\{ d_{\max}^s - \max_{u \in \mathcal{U}_s} d_{\mathrm{RAN}}^{s,u},\; y_{\max}^s \right\}, & \text{for URLLC or NPN slice } s, \end{cases} \tag{28}$$
then, (24) and (27) can be expressed in the unified form
$$d_{\mathrm{prop,CN}}^{s} + d_{\mathrm{proc,CN}}^{s} \le d_{\mathrm{CN}}^{s}, \quad \forall s \in \mathcal{S}. \tag{29}$$
Therefore, we have successfully split the problem (22) into two sub-problems. The first one is the RB allocation problem, which is
$$\begin{aligned} \max_{\{x_{s,u}^k\}} \quad & \psi \\ \mathrm{s.t.} \quad & \text{(22b), (22f), (2)}, \\ & \frac{1}{|\mathcal{U}_s|} \sum_{u \in \mathcal{U}_s} \min\!\left\{1, \frac{r_u^s}{q_{s,u}^{v}}\right\} \ge \xi_{\min}^{s}, \ \text{for any eMBB slice } s, \\ & \{x_{s,u}^k\} \ \text{implementable by VNF mapping}, \end{aligned} \tag{30}$$
and the second one is the feasibility problem of the VNF mapping, i.e., the checking of the last constraint of the problem (30) for a given RB allocation scheme:
$$\mathrm{find}\ \{\eta_{j,v}^s\} \quad \mathrm{s.t.}\ \text{(22c), (22d), (22e), (8), (14), (15), (29)}. \tag{31}$$
Note that among the constraints of problem (31), the constraints (22c), (8), (14), and (15) depend only on the variables $\{\eta_{j,v}^s\}$ and are independent of the RB allocation scheme. For a given RB allocation scheme, the variables $\{r^s, s \in \mathcal{S}\}$ are fixed; thus, the constraints (22d) and (22e) are well defined. Furthermore, the variables $\{(r_u^s, d_{\mathrm{RAN}}^{s,u}), u \in \mathcal{U}_s, s \in \mathcal{S}\}$ are fixed; thus, the constraint (29) is well defined. Hence, for a given RB allocation scheme, the problem (31) is well defined.
Based on this observation, to solve problem (22), the multi-agent reinforcement learning algorithm IPPO is used on problem (30) in the outer loop to decide how to allocate the RBs so as to maximize the objective function. For each RB allocation scheme considered by the outer loop, the RF algorithm is used on problem (31) in the inner loop to check the feasibility of that scheme, i.e., whether the CN can successfully complete the mapping of the VNFs of all slices.
In the following two subsections, we will provide details on the VNF mapping problem, i.e., (31), solved using the RF algorithm, and the RB allocation problem, i.e., (30), solved using the multi-agent reinforcement learning IPPO algorithm, respectively.

3.1. The VNF Mapping Problem (31)

We first introduce the Depth-First Search with pruning (DFS + pruning) algorithm to determine whether the VNFs of all slices can be successfully deployed in the CN such that the constraints of problem (31) are satisfied. After that, in order to reduce the computational time overhead, we use supervised learning to train an RF classifier that makes the same determination.

3.1.1. DFS + Pruning Algorithm

Depth-First Search (DFS) is an algorithm for traversing a tree or graph that explores nodes along the depth direction first. When all children of a node v have been traversed, or node v does not satisfy the constraints, the search backtracks to the parent of v and continues in the depth direction until all nodes have been visited.
For problem (31), the depth of the tree is the total number of VNFs, which equals the number of slices multiplied by the number of VNFs in each slice, i.e., Depth = $|\mathcal{S}| \times t_s$. Recall that the number of servers is V. The first layer has V nodes, representing the possible deployments of the first VNF, i.e., which server the first VNF is deployed on. Each node in the first layer has V children, and the second layer represents the possible deployments of the second VNF. For example, the third child of the fifth first-layer node represents the deployment strategy in which the first VNF is deployed on Server 5 and the second VNF is deployed on Server 3. The remaining layers are structured accordingly, i.e., each node has V children. An illustration of the tree is provided in Figure 3.
The time complexity of the whole algorithm can be expressed as O ( 1 + V + V 2 + + V Depth ) = O ( V Depth ) . In order to reduce the time complexity of the algorithm, pruning is considered to reduce the search space.
The specific pruning strategies are as follows:
  • Consider VNFs in the order of NPN slices, URLLC slices, and eMBB slices.
  • When the current node does not satisfy the constraints of problem (31), the search along the depth direction of the current node is stopped.
The pseudo-code of the DFS + pruning algorithm is presented in Algorithm 1. During the traversal of the tree, the traversal path is recorded; it indicates which server node each VNF should be deployed on. Therefore, the DFS + pruning algorithm can be used not only to determine whether the VNFs of all slices can be successfully deployed in the CN, but also to find all VNF deployment schemes that meet the constraints.
Algorithm 1 DFS + Pruning.
Input:
   The VNF set of all slices I = {1, 2, …, i, …, I};
   The initial state s_0;
   Initial call to the function DFS(1, s_0).
Output:
   Whether the VNF mapping can be completed: True or False.
Process:
   Function DFS(currently considered VNF i, current state s)
 1: // Recursive termination condition
 2: if i = I + 1 then
 3:     // All VNFs have been deployed:
 4:     // a deployment scheme has been found
 5:     return True
 6: end if
 7: // Consider deploying the current VNF i to each node v in turn
 8: for v = 1, 2, …, V do
 9:     if node v does not satisfy the constraints of problem (31) then
10:         continue
11:     else
12:         Record the current state s
13:         Deploy the current VNF i on node v
14:         Update the state to s′
15:         Recursively call DFS(i + 1, s′)
16:         Restore the scene: reset the current state to s
17:     end if
18: end for
19: return False
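For illustration, a minimal executable version of the DFS + pruning idea is sketched below. It keeps only a per-server capacity constraint as a stand-in for the full constraint set of problem (31); the demand and capacity values are hypothetical:

```python
def can_map(demands, capacity):
    """DFS + pruning sketch: can all VNFs (given by CPU demand) be placed?

    Simplified stand-in for problem (31): the only constraint retained here
    is per-server computing capacity. demands/capacity are hypothetical.
    """
    used = [0.0] * len(capacity)

    def dfs(i):
        if i == len(demands):            # all VNFs deployed: scheme found
            return True
        for v in range(len(capacity)):
            if used[v] + demands[i] > capacity[v]:
                continue                 # prune: constraint violated at node v
            used[v] += demands[i]        # deploy VNF i on server v
            if dfs(i + 1):
                return True
            used[v] -= demands[i]        # restore the scene and backtrack
        return False

    return dfs(0)
```

For example, three VNFs of demand 3 cannot fit on two servers of capacity 5, whereas demands (3, 2, 3, 2) can.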
The disadvantage of the DFS+pruning algorithm is that it has a relatively high time complexity. Next, ML is considered to train a classifier to solve the problem (31).

3.1.2. Random Forest Classifier

First, multiple sets $\mathcal{D} := \{r^s, d_{\mathrm{CN}}^s, s \in \mathcal{S}\}$ are randomly generated; each set specifies the constraints of problem (31). Then, the DFS + pruning algorithm is used to generate the label of each generated set $\mathcal{D}$, where the label indicates whether the CN can successfully complete the VNF mapping. The training set generated in this way is then used to train a classifier for this binary classification problem. The classifier used in this paper is the RF, a powerful supervised learning algorithm.
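A hedged sketch of this training pipeline, using scikit-learn and a synthetic feasibility rule in place of the DFS + pruning labeler, might look as follows (the data, the labeling rule, and the feature layout are illustrative assumptions, not the paper's dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Random (rate, CN-delay-budget) tuples standing in for the D sets of three
# slices; the label would really come from the exact DFS + pruning check.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 6))
y = (X[:, 0] + X[:, 2] + X[:, 4] < 15).astype(int)   # toy "mapping feasible" rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=21, random_state=0)  # 21 trees, as in Figure 5
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)   # held-out accuracy of the feasibility classifier
```

Once trained, `clf.predict` replaces the exhaustive search in the inner loop.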

3.2. The RB Allocation Problem (30)

We first model the problem (30) as a Markov Decision Process (MDP). Then, we employ the multi-agent reinforcement learning IPPO algorithm to solve the MDP.

MDP Formulation for RB Allocation

The MDP of the problem is given by the triplet ( S , A , R ) , where S represents the state space, A denotes the action space, and R is the reward function. In this paper, we decide to use multi-agent reinforcement learning in order to reduce the size of the action space. The details of the MDP are defined as follows:
  • Agent: slice s , s S
  • State space S : For agent s, the state space S s is given using the following tuple:
    S s = ( U s , G s , Q s , N s )
where $\mathcal{U}_s$ represents the set of users in slice s, $G_s$ denotes the set of channel gains of the users in slice s, $Q_s$ represents the PRU in slice s, and $N_s$ denotes the set of RBs assigned to each user in slice s. Thus, the whole state space can be expressed as:
    S = { S 1 , S 2 , , S S }
  • Action space A : For agent s, the action space A s is defined as follows:
    A s = { 0 , 1 } K × | U s |
where K denotes the total number of RBs and $\mathcal{U}_s$ denotes the set of users in slice s. The action space $\mathcal{A}_s$ represents the set of specific resource allocation schemes, each of which can be represented as a row vector:
    A s = { x s , 1 1 , , x s , | U s | 1 , x s , 1 2 , , x s , | U s | 2 , , x s , 1 K , , x s , | U s | K }
    where x s , u k is defined in (1). Each slice is responsible for the RB assignment for all the users within the slice. Thus, the whole action space can be expressed as the Cartesian product of all A s , i.e.,
    A = A 1 × A 2 × × A S
  • Reward function R :
Each action corresponds to an RB allocation scheme. When the RB allocation scheme satisfies all the constraints of the problem (30), the reward function is related to the corresponding profit, which is the optimization objective of (30). Otherwise, a negative reward is given. More specifically,
$$\mathcal{R} = \begin{cases} e^{(\psi - \bar{\psi})/10}, & \text{if all constraints are met}, \\ -1, & \text{otherwise}, \end{cases} \tag{35}$$
where $\psi$ represents the profit achieved by the RB allocation scheme, given in (19), and $\bar{\psi}$ approximates the midpoint between the minimum and maximum profit over all RB allocation schemes. A reasonably accurate approximation aids the convergence of the reinforcement learning.
It should be noted that, to check whether all constraints are met in (35), the RF algorithm introduced in Section 3.1 is used to determine whether the last constraint of problem (30) is satisfied. The remaining constraints of problem (30) can be checked directly for the given RB allocation scheme.
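As a minimal sketch, the reward in (35) can be computed as follows, where the `feasible` flag stands for the outcome of all the constraint checks (including the RF-based check of the last constraint):

```python
import math

def reward(psi, psi_bar, feasible):
    """Reward of (35): exp((psi - psi_bar)/10) when every constraint of (30)
    holds, otherwise -1. psi_bar centers the exponent so that rewards stay
    well-scaled around 1 during training."""
    return math.exp((psi - psi_bar) / 10.0) if feasible else -1.0
```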
Agent s observes its own current state S s and chooses an action a s from the action space A s . The actions of all slices form a joint action a = { a 1 , a 2 , , a S } . The joint action updates the current state to the next state and receives the reward R . The reward will be given to all agents as feedback for the agents to keep adjusting their actions. The whole MDP process is shown in Figure 4.

3.3. Independent Proximal Policy Optimization (IPPO)

The MDP modeled above for this problem can be solved using various reinforcement learning methods. We tried several popular reinforcement learning methods and found that the IPPO algorithm is the most effective. Therefore, we propose to use the IPPO algorithm to solve the RB resource allocation problem of (30).
The Proximal Policy Optimization (PPO) algorithm is an on-policy algorithm that can be used for both continuous and discrete action space tasks. PPO is based on the idea of the Trust Region Policy Optimization (TRPO) algorithm, but is simpler to implement and often more effective.
The IPPO algorithm is the multi-agent version of the PPO. Each agent is trained independently with the PPO algorithm. PPO is based on the Actor–Critic architecture, so two neural networks, the policy network and the value network, are required in the training process. The policy network is mainly responsible for choosing actions based on the current state and policy, interacting with the environment to receive feedback, and continuously updating the policy. The value network is mainly responsible for learning the state value function, which is used to help the policy network to update the strategy. In this paper, each slice corresponds to an agent, so the number of agents is S. Therefore, the number of neural networks required for the IPPO algorithm is 2 S .
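For reference, the clipped surrogate objective minimized by each IPPO agent can be sketched as follows. This is the generic PPO loss, not the authors' exact implementation; `ratio` is the new-to-old policy probability ratio and `adv` is the advantage estimate produced by the agent's own critic:

```python
import numpy as np

def ppo_clip_loss(ratio, adv, eps=0.2):
    """PPO clipped surrogate (to be minimized):
    -E[min(ratio * adv, clip(ratio, 1 - eps, 1 + eps) * adv)].
    In IPPO, each of the S agents evaluates this with its own networks."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    return -np.minimum(unclipped, clipped).mean()
```

The clipping removes the incentive to move the policy ratio outside [1 − ε, 1 + ε], which is what makes independent per-agent updates stable.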

4. Simulation Results

In this paper, the simulation experiments are performed on a PC with a 12th Gen Intel(R) Core(TM) i9-12900K processor and 8 GB of RAM. To simulate the proposed problem, we use the programming language Python 3.10.4 and PyCharm Community Edition 2022.2.1. In the simulation, we consider one eMBB slice, one URLLC slice, and one NPN slice, i.e., | S e | = | S u | = | S n p | = 1 . The users of the eMBB slice and the URLLC slice are uniformly distributed in a 4 × 4 km 2 area. There are three gNodeBs in this area. The users of the NPN slice are generated randomly around a particular gNodeB only. We assume that the number of users in the eMBB, URLLC, and NPN slices are 10, 10, and 5, respectively. The PRU are generated via a uniform distribution within a given interval. Specifically, the personalized rate requirement interval for eMBB users is [ 2 , 4 ] (Mbps), the personalized delay requirement interval for the URLLC users is [ 2.2 , 2.8 ] (ms), and the personalized rate and delay requirement interval for the NPN users is [ 2.5 , 5 ] (Mbps) and [ 4.5 , 6 ] (ms), respectively. We set the user satisfaction threshold of the eMBB, URLLC, and NPN slice to 0.9 , 0.9 , and 1, respectively, i.e., ( ξ m i n e , ξ m i n u , ξ m i n n p ) = ( 0.9 , 0.9 , 1 ) .
We assume that the SFC of the eMBB slice is VNF1 → VNF3 → VNF4 → VNF5, the SFC of the URLLC slice is VNF2 → VNF4 → VNF5 → VNF6, and the SFC of the NPN slice is VNF1 → VNF7 → VNF3 → VNF8. The core link transmission delay is uniformly distributed from 0.5 ms to 2 ms and the core link bandwidth is uniformly distributed from 10 Mbps to 50 Mbps. The remaining simulation parameters are summarized in Table 2.
We create and train the neural networks of the IPPO using the PyTorch 1.10.2 framework. More specifically, for each agent, the Actor network and the Critic network both consist of two fully connected hidden layers, each with 64 neurons, and the Adam optimizer is used. The activation function of the output layer is softmax, and that of the remaining layers is ReLU. The other hyperparameters of IPPO are given in Table 3.
The RF algorithm is used to determine whether the RB allocation scheme of the RAN can be successfully implemented by the VNF mapping in the CN. The dataset of the RF algorithm consists of the randomly generated sets $\mathcal{D} := \{r^s, d_{\mathrm{CN}}^s, s \in \mathcal{S}\}$, and the label is 0 or 1, found using the DFS + pruning algorithm. We picked 50,000 sets of data with the label 0 and 50,000 sets of data with the label 1 to train the RF. The most important hyperparameter of the RF is the number of decision trees, n_estimators. To find the optimal hyperparameter, the dataset is first divided into a training set, validation set, and test set in the ratio of 3:1:1. Then, the RF is constructed with different numbers of decision trees, and four-fold cross-validation is performed with the training set and the validation set. The mean cross-validation accuracy is plotted in Figure 5 for different values of n_estimators. The red circle marks the optimal point, n_estimators = 21, which is the value we used to build the RF.
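The hyperparameter search described above can be sketched with scikit-learn's cross-validation utilities; the synthetic data below merely stands in for the real $\mathcal{D}$ sets and their DFS-derived labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Four-fold cross-validation over the number of trees (a few sample values);
# features and the labeling rule are illustrative assumptions.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(400, 4))
y = (X[:, 0] < 5).astype(int)

scores = {
    n: cross_val_score(RandomForestClassifier(n_estimators=n, random_state=0),
                       X, y, cv=4).mean()
    for n in (1, 5, 21)
}
best_n = max(scores, key=scores.get)   # n_estimators with highest mean accuracy
```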
We train an RF model with the total training set, which comprises the training set and the validation set. The accuracies achieved on the total training set and the test set are 0.99994 and 0.99910, respectively.
A better way to assess the trained model is the confusion matrix. As shown in Figure 6, where the rows correspond to the predicted labels and the columns correspond to the true labels, each element gives the number of samples in the corresponding case. For both the training set and the test set, the overall predictions are accurate apart from a few misclassified instances. It can be concluded that the well-trained RF model has a high accuracy in predicting the feasibility of VNF mapping solutions in the CN.
The IPPO model used to find the optimal RB allocation scheme is generated and trained according to the simulation parameters in Table 2. The cumulative reward for each episode is shown in Figure 7, where Figure 7a shows the actual episode reward curve during the training process and Figure 7b shows the reward curve after moving average processing. It can be seen that as the number of training episodes increases, the reward keeps increasing and the performance of the agent improves significantly at around 1000 episodes, eventually converging gradually. The result shows that the training of IPPO is effective and the multiple agents learn how to choose the action so that the reward (i.e., the objective function of the model) is as large as possible.
In order to illustrate the benefits of using the RF algorithm, we compare two cases. In both cases, the outer loop of finding the optimal RB allocation scheme uses the proposed IPPO method. In one case, the inner loop uses RF to determine the feasibility of the VNF mapping, denoted as IPPO-RF, and in the other case, DFS is used in the inner loop, denoted as IPPO-DFS.
The results are shown in Figure 8, where the green bar indicates the IPPO-RF algorithm and the blue bar indicates the IPPO-DFS algorithm. The two algorithms obtain approximately the same profit, but the run time of IPPO-DFS is about 198% higher than that of IPPO-RF. Therefore, the RF algorithm is more efficient than the DFS algorithm for the sub-problem of verifying the feasibility of the VNF mapping. This is precisely the significance of introducing machine learning into our algorithm.
Next, we illustrate the case where the PRU is relatively large and/or the amount of resources is relatively limited, i.e., not all users can have their PRU met. We run the proposed IPPO-RF algorithm and plot the user satisfaction of the eMBB, URLLC, and NPN slices for five different instances of personalized rate requirements in Figure 9. As can be seen, the user satisfaction of the NPN slice is always 1, which means that the higher service priority of the NPN users is guaranteed by the proposed IPPO-RF algorithm. Due to the limited resources, the user satisfaction of the eMBB slice and the URLLC slice is strictly below 1 but higher than the user satisfaction threshold, i.e., $\xi_{\min}^e = \xi_{\min}^u = 0.9$. Whether the eMBB slice or the URLLC slice obtains a larger user satisfaction depends on which RB allocation scheme maximizes the profit while remaining implementable by a VNF mapping; such a scheme is found using the IPPO-RF algorithm. It can be concluded that the resource allocation scheme found via the proposed IPPO-RF algorithm does reflect the higher service priority and reliability of the NPN slice compared to the eMBB and URLLC slices.
Next, we demonstrate that the RB allocation scheme found using the proposed IPPO-RF scheme can indeed guarantee the security and privacy of the NPN slices. First, we introduce three performance metrics related to the privacy of a slice: (1) the privacy of a slice on the servers; (2) the privacy of a slice on the links; and (3) the average privacy of a slice. We primarily characterize the privacy of each slice by the extent to which the servers and links occupied by its VNFs are shared. The privacy of a slice on the servers depicts how heavily the slice shares the server nodes with other slices. More specifically, we count how many slices occupy the same server node occupied by the j-th VNF of Slice s and denote this number as $n_j^s$. The privacy of Slice s on the servers is defined as $\sum_{j=1}^{t_s} \frac{c_j^s}{c^s \, n_j^s}$, where $c_j^s$ denotes the computing resources required by the j-th VNF of Slice s, and $c^s = \sum_{j=1}^{t_s} c_j^s$. Similarly, we define the privacy of Slice s on the links. Finally, the average privacy of a slice is the average of its privacy on the servers and on the links.
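The privacy of a slice on the servers can be computed directly from this definition; a small sketch (hypothetical inputs):

```python
def server_privacy(c, n):
    """Privacy of Slice s on the servers: sum_j c_j / (c_total * n_j), where
    n_j >= 1 is the number of slices sharing the server hosting VNF j.
    Equals 1 exactly when no hosting server is shared with another slice."""
    total = sum(c)
    return sum(cj / (total * nj) for cj, nj in zip(c, n))
```

With no sharing (all $n_j^s = 1$) the metric is 1; if every server is shared by two slices, it drops to 0.5.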
In Figure 10, we demonstrate the privacy of the eMBB, URLLC, and NPN slices obtained via the proposed IPPO-RF algorithm for one instance of personalized user requirements. It can be seen that the privacy of the NPN slice on the servers and on the links is 1 in both cases, which means that the VNFs of the NPN slice do not share server nodes or links with other slices, thus improving its security and privacy. On the other hand, the eMBB slice and the URLLC slice share some servers and links to maximize the profit. It can be concluded that the resource allocation scheme found using the proposed IPPO-RF algorithm indeed gives the NPN slice higher security and privacy compared to the eMBB and URLLC slices.
Finally, we illustrate the performance gain of the proposed IPPO-RF algorithm over three benchmark methods.
(1) Allocation in Proportion to the Personalized Requirement of Users (PPRU): First, the personalized requirement of each user, $q_{s,u}^d$ or $q_{s,u}^v$, is multiplied by the minimum satisfaction threshold $\xi_{\min}^s$ of Slice s and denoted as $q_{s,u}$. Then, the minimum RB resources required to satisfy $q_{s,u}$ are calculated and used as the allocation for User u in Slice s.
(2) Allocation by Maximizing User Rates (MUR): First, resource pre-allocation is performed to ensure that the allocated resources meet the QoS demand of each user. After that, the remaining RBs are continuously allocated to the user with the largest rate increment, and the rate increment of this user is updated. The allocation stops once the user satisfaction of the slice meets the requirement. Taking the personalized rate requirement $q_{s,u}^v$ as an example, the rate increment for User u is calculated as $\min(r_{s,u}^{RB}, q_{s,u}^v - r_{s,u}^{now})$, where $r_{s,u}^{RB}$ denotes the rate that one RB can provide to User u in Slice s, and $r_{s,u}^{now}$ denotes the rate that User u has already acquired. Personalized delay requirements $q_{s,u}^d$ are first converted to rate requirements and then processed as above.
(3) Allocation by Maximizing User Satisfaction (MUS): First, the same resource pre-allocation is performed as for MUR. After that, the remaining RBs are continuously allocated to the user with the largest satisfaction increment, and the satisfaction increment of this user is updated. The allocation stops once the user satisfaction of the slice meets the requirement. The satisfaction increment is calculated in a manner similar to the rate increment in the MUR algorithm.
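A minimal sketch of the MUR greedy step (hypothetical inputs; the pre-allocation stage and the per-slice satisfaction stopping test are omitted for brevity):

```python
def mur_allocate(spare_rbs, r_rb, q_v, r_now):
    """Greedy MUR sketch: each spare RB goes to the user with the largest rate
    increment min(r_rb[u], q_v[u] - r_now[u]).

    r_rb: rate one RB provides to each user; q_v: rate requirements;
    r_now: rates already acquired (all hypothetical inputs)."""
    r = list(r_now)
    for _ in range(spare_rbs):
        inc = [min(r_rb[u], q_v[u] - r[u]) for u in range(len(r))]
        u = max(range(len(r)), key=inc.__getitem__)
        if inc[u] <= 0:            # every requirement already met
            break
        r[u] += inc[u]
    return r
```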
To avoid drawing conclusions from a single instance, we change the PRU at multiple moments. First, we compare the probability of successfully completing the resource allocation for the four algorithms by randomly changing the PRU and counting the success rate over 1000 experiments. The results are shown in Table 4. It is evident that the proposed IPPO-RF algorithm provides a resource allocation scheme that meets all the requirements in every experiment. In contrast, the success rates of the remaining three algorithms are 0.42, 0.54, and 0.61, respectively, i.e., they do not guarantee that the given resource allocation scheme satisfies all the constraints.
Then, we compare the profits of the four algorithms for five different instances in which all algorithms successfully complete the resource allocation. The results are shown in Figure 11. It can be observed that, compared to the other three algorithms, the proposed IPPO-RF algorithm provides resource allocation schemes with higher profit. The three benchmark methods, ranked in descending order of performance, are MUR, MUS, and PPRU. The proposed IPPO-RF algorithm outperforms the benchmark algorithms PPRU, MUR, and MUS by 84%, 46%, and 57%, respectively. It can be concluded that the proposed IPPO-RF algorithm performs better and provides a better allocation scheme.

5. Conclusions

In this paper, we study a network slicing resource allocation problem for the scenario where public and non-public slices coexist. The problem is first formulated as a nonlinear integer programming problem. Then, the IPPO-RF algorithm is proposed to find the resource allocation scheme that makes the operator profit as large as possible. Based on the simulation results, we conclude the following. First, the well-trained RF model has a high accuracy in predicting the feasibility of VNF mapping solutions; hence, incorporating machine learning, particularly the RF algorithm, accelerates the decision-making of the IPPO algorithm. In addition, the proposed IPPO-RF algorithm outperforms the three benchmark algorithms PPRU, MUR, and MUS by 84%, 46%, and 57%, respectively, demonstrating its good performance. Finally, the resource allocation scheme found via the IPPO-RF algorithm not only satisfies all constraints, but also ensures the service priority and privacy of the NPN slices. Therefore, the proposed combination of deep reinforcement learning and machine learning provides a promising approach to this problem.

Author Contributions

Conceptualization, Z.P. and X.Y.; methodology, Y.W., N.L. and Z.P.; formal analysis, Y.W. and N.L.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W., N.L., Z.P. and X.Y.; supervision, N.L., Z.P. and X.Y.; Funding acquisition, N.L., Z.P. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the Fundamental Research Funds for the Central Universities, 2242022K60001, the National Natural Science Foundation of China under Grants 62071115, and the Research Fund of National Mobile Communications Research Laboratory, Southeast University (No. 2023A03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because the data is also a part of an ongoing study.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Debbabi, F.; Jmal, R.; Fourati, L.C.; Ksentini, A. Algorithmics and Modeling Aspects of Network Slicing in 5G and Beyonds Network: Survey. IEEE Access 2020, 8, 162748–162762. [Google Scholar] [CrossRef]
  2. Zhou, J.; Zhao, W.; Chen, S. Dynamic Network Slice Scaling Assisted by Prediction in 5G Network. IEEE Access 2020, 8, 133700–133712. [Google Scholar] [CrossRef]
  3. Dogra, A.; Jha, R.K.; Jain, S. A Survey on Beyond 5G Network with the Advent of 6G: Architecture and Emerging Technologies. IEEE Access 2021, 9, 67512–67547. [Google Scholar] [CrossRef]
  4. Shu, Z.; Taleb, T. A Novel QoS Framework for Network Slicing in 5G and Beyond Networks Based on SDN and NFV. IEEE Netw. 2020, 34, 256–263. [Google Scholar] [CrossRef]
  5. Afolabi, I.; Taleb, T.; Samdanis, K.; Ksentini, A.; Flinck, H. Network Slicing and Softwarization: A Survey on Principles, Enabling Technologies, and Solutions. IEEE Commun. Surv. Tutorials 2018, 20, 2429–2453. [Google Scholar] [CrossRef]
  6. Ordonez-Lucena, J.; Chavarria, J.F.; Contreras, L.M.; Pastor, A. The use of 5G Non-Public Networks to support Industry 4.0 scenarios. In Proceedings of the 2019 IEEE Conference on Standards for Communications and Networking (CSCN), Granada, Spain, 28–30 October 2019; pp. 1–7. [Google Scholar] [CrossRef]
  7. Li, X.; Guimarães, C.; Landi, G.; Brenes, J.; Mangues-Bafalluy, J.; Baranda, J.; Corujo, D.; Cunha, V.; Fonseca, J.; Alegria, J.; et al. Multi-Domain Solutions for the Deployment of Private 5G Networks. IEEE Access 2021, 9, 106865–106884. [Google Scholar] [CrossRef]
  8. Poe, W.Y.; Ordonez-Lucena, J.; Mahmood, K. Provisioning Private 5G Networks by Means of Network Slicing: Architectures and Challenges. In Proceedings of the 2020 IEEE International Conference on Communications Workshops (ICC Workshops), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar] [CrossRef]
  9. Yan, M.; Feng, G.; Zhou, J.; Qin, S. Smart Multi-RAT Access Based on Multiagent Reinforcement Learning. IEEE Trans. Veh. Technol. 2018, 67, 4539–4551. [Google Scholar] [CrossRef]
  10. Filali, A.; Mlika, Z.; Cherkaoui, S.; Kobbane, A. Dynamic SDN-Based Radio Access Network Slicing with Deep Reinforcement Learning for URLLC and eMBB Services. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2174–2187. [Google Scholar] [CrossRef]
  11. Yan, M.; Feng, G.; Zhou, J.; Sun, Y.; Liang, Y.C. Intelligent Resource Scheduling for 5G Radio Access Network Slicing. IEEE Trans. Veh. Technol. 2019, 68, 7691–7703. [Google Scholar] [CrossRef]
  12. Wang, X.; Zhang, T. Reinforcement Learning Based Resource Allocation for Network Slicing in 5G C-RAN. In Proceedings of the 2019 Computing, Communications and IoT Applications (ComComAp), Shenzhen, China, 26–28 October 2019; pp. 106–111. [Google Scholar] [CrossRef]
  13. Li, R.; Zhao, Z.; Sun, Q.; Chih-Lin, I.; Yang, C.; Chen, X.; Zhao, M.; Zhang, H. Deep Reinforcement Learning for Resource Management in Network Slicing. IEEE Access 2018, 6, 74429–74441. [Google Scholar] [CrossRef]
  14. Mechtri, M.; Ghribi, C.; Zeghlache, D. A Scalable Algorithm for the Placement of Service Function Chains. IEEE Trans. Netw. Serv. Manag. 2016, 13, 533–546. [Google Scholar] [CrossRef]
  15. Promwongsa, N.; Abu-Lebdeh, M.; Kianpisheh, S.; Belqasmi, F.; Glitho, R.H.; Elbiaze, H.; Crespi, N.; Alfandi, O. Ensuring Reliability and Low Cost When Using a Parallel VNF Processing Approach to Embed Delay-Constrained Slices. IEEE Trans. Netw. Serv. Manag. 2020, 17, 2226–2241. [Google Scholar] [CrossRef]
  16. Sun, J.; Zhu, G.; Sun, G.; Liao, D.; Li, Y.; Sangaiah, A.K.; Ramachandran, M.; Chang, V. A Reliability-Aware Approach for Resource Efficient Virtual Network Function Deployment. IEEE Access 2018, 6, 18238–18250. [Google Scholar] [CrossRef]
  17. Mijumbi, R.; Hasija, S.; Davy, S.; Davy, A.; Jennings, B.; Boutaba, R. Topology-Aware Prediction of Virtual Network Function Resource Requirements. IEEE Trans. Netw. Serv. Manag. 2017, 14, 106–120. [Google Scholar] [CrossRef]
  18. Korrai, P.; Lagunas, E.; Sharma, S.K.; Chatzinotas, S.; Bandi, A.; Ottersten, B. A RAN Resource Slicing Mechanism for Multiplexing of eMBB and URLLC Services in OFDMA Based 5G Wireless Networks. IEEE Access 2020, 8, 45674–45688. [Google Scholar] [CrossRef]
  19. Gharehgoli, A.; Nouruzi, A.; Mokari, N.; Azmi, P.; Javan, M.R.; Jorswieck, E.A. AI-based Resource Allocation in End-to-End Network Slicing under Demand and CSI Uncertainties. IEEE Trans. Netw. Serv. Manag. 2023, 20, 3630–3651. [Google Scholar] [CrossRef]
  20. Li, J.; Shi, W.; Yang, P.; Ye, Q.; Shen, X.S.; Li, X.; Rao, J. A Hierarchical Soft RAN Slicing Framework for Differentiated Service Provisioning. IEEE Wirel. Commun. 2020, 27, 90–97. [Google Scholar] [CrossRef]
  21. Li, T.; Zhu, X.; Liu, X. An End-to-End Network Slicing Algorithm Based on Deep Q-Learning for 5G Network. IEEE Access 2020, 8, 122229–122240. [Google Scholar] [CrossRef]
  22. Sun, G.; Gebrekidan, Z.T.; Boateng, G.O.; Ayepah-Mensah, D.; Jiang, W. Dynamic Reservation and Deep Reinforcement Learning Based Autonomous Resource Slicing for Virtualized Radio Access Networks. IEEE Access 2019, 7, 45758–45772. [Google Scholar] [CrossRef]
  23. Liang, L.; Ye, H.; Yu, G.; Li, G.Y. Deep-learning-based wireless resource allocation with application to vehicular networks. Proc. IEEE 2019, 108, 341–356. [Google Scholar] [CrossRef]
Figure 1. Illustration of the E2E network slicing structure.
Figure 1. Illustration of the E2E network slicing structure.
Applsci 13 12505 g001
Figure 2. An example of the date processing rate in the SFC.
Figure 2. An example of the date processing rate in the SFC.
Applsci 13 12505 g002
Figure 3. DFS + pruning algorithm.
Figure 3. DFS + pruning algorithm.
Applsci 13 12505 g003
Figure 4. Multi-agent interaction with the MDP environment.
Figure 4. Multi-agent interaction with the MDP environment.
Applsci 13 12505 g004
Figure 5. Cross-validation accuracy graph for RF.
Figure 5. Cross-validation accuracy graph for RF.
Applsci 13 12505 g005
Figure 6. Confusion matrix.
Figure 6. Confusion matrix.
Applsci 13 12505 g006
Figure 7. Training rewards for IPPO. (a) Actual reward curve during training. (b) Smoothed reward curve with moving average processing.
Figure 7. Training rewards for IPPO. (a) Actual reward curve during training. (b) Smoothed reward curve with moving average processing.
Applsci 13 12505 g007
Figure 8. Comparison of the IPPO-RF and IPPO-DFS.
Figure 8. Comparison of the IPPO-RF and IPPO-DFS.
Applsci 13 12505 g008
Figure 9. User satisfaction of each slice.
Figure 9. User satisfaction of each slice.
Applsci 13 12505 g009
Figure 10. Security of each slice.
Figure 11. Performance comparison of different algorithms.
Table 1. Summary of the related works.

| Ref. | Objective Function | Slicing Domain | Optimization Algorithm | Non-Public Network |
|---|---|---|---|---|
| [9] | Throughput maximization | RAN | Nash Q-learning & MCTS | |
| [10] | Sum-rate maximization | RAN | Deep Q-learning | |
| [11] | Utilization maximization | RAN | LSTM & A3C | |
| [12] | Utility maximization | C-RAN | Multi-agent Q-learning | |
| [13] | Spectral efficiency maximization | E2E | Deep Q-learning | |
| [14] | Revenue maximization | CN | Feature decomposition & heuristic | |
| [15] | Cost minimization | CN | Tabu search | |
| [16] | Expenditure minimization | CN | Heuristic algorithm | |
| [17] | Failure rate minimization | CN | Graph neural network | |
| Our work | Profit maximization | E2E | Deep reinforcement learning & machine learning | ✓ |
Table 2. Simulation parameters.

| Parameter | Value |
|---|---|
| Total number of users, U | 25 |
| Number of gNodeBs, B | 3 |
| Number of slices, S | 3 |
| Total number of RBs, K | 80 |
| Bandwidth of an RB, W | 180 kHz |
| Downlink transmission power, P_d | 30 dBm |
| Power of AWGN, σ² | −114 dBm |
| Pathloss from gNodeB to user | 128.1 + 37.6 log10(D [km]) dB |
| Channel | Rayleigh fading [19] |
| Packet arrival rate per user in the corresponding slice, λ_e, λ_u, λ_np | 100, 10, 50 packets/s |
| Packet length per user in the corresponding slice, ω_e, ω_u, ω_np | 5, 2, 9 kbits |
| Number of server nodes, V | 8 |
| Number of VNF types, F | 8 |
| Server computing resources, O | {10, 9, 5, 2, 5, 3, 12, 5} × 10² |
| Number of server VMs, M | {3, 2, 2, 1, 1, 1, 2, 2} |
| Data rate expansion ratio, β | {0.9, 0.7, 1.2, 1.5, 0.8, 1, 0.8, 1.1} |
| Correlation coefficient, α | 10⁵ |
| Unit price of service rate, ϱ | 20 |
| Unit price of RB and computing resources, σ₁, σ₂ | 13, 0.1 |
| Minimum rate of slices, r_min^e, r_min^u, r_min^np | 2, 1, 2.5 Mbps |
| Maximum delay of slices, d_max^e, d_max^u, d_max^np | 10, 3, 6 ms |
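As a rough illustration of how the radio parameters in Table 2 combine, the sketch below estimates a single user's achievable downlink rate from the pathloss model, transmit power, noise power, and RB bandwidth via the Shannon capacity. This is our simplification, not the paper's exact rate model: it ignores Rayleigh fading and inter-cell interference, and the function names are illustrative.

```python
import math

# Parameters taken from Table 2 (simplified: no fading, no interference)
W = 180e3          # bandwidth of one RB [Hz]
P_D_DBM = 30       # downlink transmission power [dBm]
NOISE_DBM = -114   # AWGN power [dBm]

def pathloss_dB(d_km):
    """Pathloss from gNodeB to user: 128.1 + 37.6*log10(D[km]) dB."""
    return 128.1 + 37.6 * math.log10(d_km)

def rb_rate_bps(d_km, n_rbs=1):
    """Shannon-capacity estimate of the downlink rate over n_rbs RBs."""
    snr_dB = P_D_DBM - pathloss_dB(d_km) - NOISE_DBM
    snr = 10 ** (snr_dB / 10)                 # dB -> linear
    return n_rbs * W * math.log2(1 + snr)     # bits per second
```

Under these assumptions a user 100 m from the gNodeB gets roughly 3.2 Mbps per RB, which is consistent with the per-slice minimum rates of 1–2.5 Mbps in Table 2 being attainable with a small number of RBs.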
Table 3. Hyperparameters for IPPO.

| Hyperparameter | Value |
|---|---|
| Learning rate of actor | 0.0002 |
| Learning rate of critic | 0.0001 |
| Number of episodes | 5000 |
| Discount factor | 0.99 |
| Clipping scope, ε | 0.2 |
| Loss function | Mean square error |
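The discount factor and clipping scope ε in Table 3 enter the standard PPO objective that IPPO applies per agent. The sketch below shows, for a single sample, the generic clipped surrogate and the discounted return; it is a textbook formulation under the Table 3 values, not the authors' implementation.

```python
# Hyperparameter values from Table 3
GAMMA = 0.99     # discount factor
EPS_CLIP = 0.2   # scope of the clip, epsilon

def clipped_surrogate(ratio, advantage, eps=EPS_CLIP):
    """PPO clipped surrogate for one (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s); the min() keeps the
    pessimistic bound, so large policy updates are not rewarded.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return min(unclipped, clipped)

def discounted_return(rewards, gamma=GAMMA):
    """Discounted return G = sum_t gamma^t * r_t over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with advantage 1.0 a probability ratio of 1.5 is clipped to 1 + ε = 1.2, which is what bounds each IPPO policy update.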
Table 4. Comparison of success rates of different algorithms.

| Algorithm | Success Rate |
|---|---|
| IPPO-RF | 1.00 |
| PPRU | 0.42 |
| MUR | 0.54 |
| MUS | 0.61 |

Wang, Y.; Liu, N.; Pan, Z.; You, X. AI-Based Resource Allocation in E2E Network Slicing with Both Public and Non-Public Slices. Appl. Sci. 2023, 13, 12505. https://doi.org/10.3390/app132212505