
Time-Optimal Trajectory Planning for the Manipulator Based on Improved Non-Dominated Sorting Genetic Algorithm II

School of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6757; https://doi.org/10.3390/app13116757
Submission received: 10 April 2023 / Revised: 23 May 2023 / Accepted: 30 May 2023 / Published: 1 June 2023

Abstract
To address the low efficiency and lengthy running time associated with trajectory planning for 6-degree-of-freedom manipulators, this paper introduces a method that generates a time-optimal trajectory for a manipulator while adhering to its kinematic limitations. The proposed method comprises several stages. Firstly, the kinematics of the manipulator are analyzed. Secondly, the manipulator’s joint-space path points are interpolated via the quintic B-spline curve. Subsequently, the non-dominated sorting genetic algorithm II (NSGA-II) is improved by applying reinforcement learning to optimize its crossover and mutation probabilities, and the variable neighborhood search (VNS) algorithm is integrated to enhance its local search capability. Finally, the joint increments and running time of the manipulator are optimized using the improved NSGA-II, and the time-optimal trajectory is determined by simulation in MATLAB. Compared with other conventional optimization methods, the proposed approach reduces the total running time by 19.26%, effectively improving the working efficiency of the manipulator.

1. Introduction

Recent years have seen the rise of manipulators as critical components in numerous industries, as advancements in industrial technology have led to their growing importance. Consequently, the demand for high-performance manipulators has also increased significantly. Research on manipulators primarily includes aspects such as structural design [1], trajectory planning, and trajectory tracking [2,3]. Among these, trajectory planning is crucial in the design process of manipulators.
Many experts and scholars have conducted extensive research on various optimization problems related to manipulators, such as time-optimal [4], energy-optimal [5], impact-optimal [6], and multi-objective optimization [7], and have proposed a variety of solutions. In recent years, research and discussion have increasingly focused on developing efficient trajectory planning methods for manipulators, with a particular emphasis on minimizing the time required to complete a task.
When planning a time-optimal trajectory for a manipulator, the goal is to generate a path that satisfies the manipulator’s kinematic constraints while minimizing the time required to complete the task. One approach, proposed in paper [8], uses multiquadric radial basis functions to generate smooth motion trajectories for the manipulator. This method minimizes two objective functions, one based on the total running time and the other on the square of the jerk, to obtain a time-optimal trajectory. Experimental results show that this method can generate highly smooth trajectories within a relatively short running time.
Paper [9] proposed a smooth and efficient trajectory planning method by combining an adaptive elitist genetic algorithm with singularity avoidance mechanism and quintic polynomials. The algorithm includes an elite population and an adaptive adjustment mechanism, which improves the recognition ability of the optimal solution. Furthermore, the singularity avoidance mechanism is introduced to prevent singular points in the generated trajectory. The combination of the adaptive elitist genetic algorithm with quintic polynomials generates smooth and efficient motion trajectories.
In paper [10], an enhanced teaching–learning-based optimization algorithm is proposed, which incorporates variable neighborhood search to keep the algorithm out of local optima. The algorithm is further combined with the quintic B-spline function to generate time-optimal motion trajectories.
In paper [11], the particle swarm optimization algorithm was first improved by introducing population exchange operations and a tail elimination principle to enhance its search ability. Fuzzy reward–penalty theory was then introduced to plan the manipulator trajectory accurately while effectively reducing its computational complexity, improving optimization efficiency by minimizing redundant decoupling operations.
An improved adaptive cuckoo search algorithm was presented in paper [12] based on the continuous quintic B-spline curve algorithm for motion trajectory planning of manipulators, which can effectively generate time-optimal joint space smooth trajectories.
Although the papers mentioned above have made positive contributions to time-optimal trajectory planning for manipulators, they did not address the issue of the motion path of manipulators. Specifically, they only achieved time-optimal trajectory planning on a fixed path without considering whether that path was the optimal solution.
In contrast to the previous approaches, paper [13] introduced a method that not only plans the time-optimal trajectory for a manipulator but also searches for the optimal path, considering both aspects simultaneously. Given the start and final points, the intermediate path points and the running time are treated as variables, and the time-optimal trajectory is planned by means of an improved genetic algorithm. Although this method achieved shorter running times compared to other methods, it only utilized a single-objective genetic algorithm that treats running time as the objective function, without considering whether the path itself is the shortest.
This paper presents an improved method for time-optimal trajectory planning of the manipulator. Firstly, the quintic non-uniform rational B-spline (NURBS) curve is used to fit the manipulator trajectory passing through the path points. Then, to address the shortcomings of the traditional fast non-dominated sorting genetic algorithm II (NSGA-II), such as slow convergence and easy entrapment in local optima, the traditional NSGA-II is improved by using a reinforcement-learning algorithm to optimize the crossover and mutation probabilities and by introducing the variable neighborhood search (VNS) algorithm to enhance its local search capability. The improved algorithm and other multi-objective optimization algorithms are tested on benchmark functions to verify the reliability of the improvement. Finally, the proposed approach uses the improved NSGA-II to plan the time-optimal trajectory of the manipulator, taking the path and the running time into account simultaneously. The variables are the path points and their running times; the first objective function is the total joint increment of the six joints of the manipulator along the whole trajectory, the second objective function is the total time of the current path, and a penalty function is introduced to limit the manipulator’s angular velocity, angular acceleration, and angular jerk. Through the iterative improvement of NSGA-II, the proposed method generates an optimized trajectory that incorporates both path planning and time optimization. This study conducts simulation verification in MATLAB using the PUMA560 and demonstrates that the proposed method improves the trajectory optimization of the manipulator compared with other trajectory planning methods.
The remaining sections of this paper are organized as follows. Section 2 introduces the commonly used PUMA560 manipulator and establishes its model. Section 3 discusses the NURBS interpolation and establishes the interpolation matrix. Section 4 presents the improved NSGA-II and compares it with other algorithms. Section 5 performs trajectory planning for the manipulator and compares it with other methods. The final section presents the conclusion of the paper.

2. The Manipulator Modelling

The PUMA560 is a six-jointed robot in which three joints determine the position of the end effector and the remaining three determine the wrist orientation. The axis of the first joint is vertical, while the axes of the second and third joints are horizontal and parallel, separated by a distance a2. The axes of joints 1 and 2 intersect perpendicularly, while the axes of joints 3 and 4 are skew, offset by a distance a3. The manipulator’s schematic diagram is presented in Figure 1, Figure 2 shows the corresponding link coordinate system, and Table 1 lists the corresponding modified D-H parameters [14].
When solving for the manipulator with the joint angles as variables, setting up a coordinate system for each joint is crucial. Constructing these coordinate systems requires the manipulator parameters listed in Table 1; the range of rotation of each joint is used mainly to eliminate irrelevant solutions when the manipulator admits multiple solutions [15]. Using the coordinate system shown in Figure 2 and the data in Table 1, the transformation matrix of each link can be written as in Equation (1):
$$ {}^{i-1}_{\;\;i}T = \begin{bmatrix} c\theta_i & -s\theta_i & 0 & a_{i-1} \\ s\theta_i\, c\alpha_{i-1} & c\theta_i\, c\alpha_{i-1} & -s\alpha_{i-1} & -d_i\, s\alpha_{i-1} \\ s\theta_i\, s\alpha_{i-1} & c\theta_i\, s\alpha_{i-1} & c\alpha_{i-1} & d_i\, c\alpha_{i-1} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{1} $$
In Equation (1), $c\theta$ denotes $\cos\theta$ and $s\theta$ denotes $\sin\theta$. Substituting the PUMA560 data into Equation (1), the forward kinematics equation of the PUMA560 can be obtained:
$$ {}^{0}_{6}T = {}^{0}_{1}T(\theta_1)\; {}^{1}_{2}T(\theta_2)\; {}^{2}_{3}T(\theta_3)\; {}^{3}_{4}T(\theta_4)\; {}^{4}_{5}T(\theta_5)\; {}^{5}_{6}T(\theta_6) \tag{2} $$
Simplifying Equation (2) yields the forward kinematics equation for PUMA560:
$$ {}^{0}_{6}T = \begin{bmatrix} a_x & b_x & c_x & d_x \\ a_y & b_y & c_y & d_y \\ a_z & b_z & c_z & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{3} $$
The variables in the equation are defined as Equation (4):
$$\begin{aligned}
a_x &= c_1\big[c_{23}(c_4 c_5 c_6 - s_4 s_6) - s_{23} s_5 c_6\big] + s_1 (s_4 c_5 c_6 + c_4 s_6) \\
a_y &= s_1\big[c_{23}(c_4 c_5 c_6 - s_4 s_6) - s_{23} s_5 c_6\big] - c_1 (s_4 c_5 c_6 + c_4 s_6) \\
a_z &= -s_{23}(c_4 c_5 c_6 - s_4 s_6) - c_{23} s_5 c_6 \\
b_x &= c_1\big[{-c_{23}}(c_4 c_5 s_6 + s_4 c_6) + s_{23} s_5 s_6\big] + s_1 (c_4 c_6 - s_4 c_5 s_6) \\
b_y &= s_1\big[{-c_{23}}(c_4 c_5 s_6 + s_4 c_6) + s_{23} s_5 s_6\big] - c_1 (c_4 c_6 - s_4 c_5 s_6) \\
b_z &= s_{23}(c_4 c_5 s_6 + s_4 c_6) + c_{23} s_5 s_6 \\
c_x &= -c_1 (c_{23} c_4 s_5 + s_{23} c_5) - s_1 s_4 s_5 \\
c_y &= -s_1 (c_{23} c_4 s_5 + s_{23} c_5) + c_1 s_4 s_5 \\
c_z &= s_{23} c_4 s_5 - c_{23} c_5 \\
d_x &= c_1 (a_2 c_2 + a_3 c_{23} - d_4 s_{23}) - d_2 s_1 \\
d_y &= s_1 (a_2 c_2 + a_3 c_{23} - d_4 s_{23}) + d_2 c_1 \\
d_z &= -a_3 s_{23} - a_2 s_2 - d_4 c_{23}
\end{aligned} \tag{4}$$
Equation (4) describes the position and orientation of coordinate system {6} relative to the base coordinate system {0}, where $c_{23} = \cos(\theta_2 + \theta_3)$ and $s_{23} = \sin(\theta_2 + \theta_3)$.
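To make Equations (1) and (2) concrete, the link transform and its chain product can be evaluated numerically. The following is a minimal Python sketch (not the authors’ code; the paper works in MATLAB), using the Table 1 parameters with lengths converted to metres and treating the joint-angle vector q as the D-H variables $\theta_i$:

```python
import numpy as np

def dh_modified(theta, alpha_prev, a_prev, d):
    """Modified D-H link transform of Equation (1)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha_prev), np.sin(alpha_prev)
    return np.array([
        [ct,       -st,       0.0,   a_prev],
        [st * ca,   ct * ca,  -sa,  -d * sa],
        [st * sa,   ct * sa,   ca,   d * ca],
        [0.0,       0.0,      0.0,   1.0],
    ])

# PUMA560 modified D-H constants from Table 1 (lengths in metres).
ALPHA_PREV = np.deg2rad([0.0, -90.0, 0.0, -90.0, 90.0, -90.0])
A_PREV = [0.0, 0.0, 0.43180, 0.02032, 0.0, 0.0]
D = [0.0, 0.14909, 0.0, 0.43307, 0.0, 0.0]

def forward_kinematics(q):
    """Chain product of Equation (2): base-to-flange transform 0T6
    for joint angles q given in radians."""
    T = np.eye(4)
    for i in range(6):
        T = T @ dh_modified(q[i], ALPHA_PREV[i], A_PREV[i], D[i])
    return T

# Example: pose at the start configuration later used in Equation (29).
print(forward_kinematics(np.deg2rad([10, -10, -30, -25, 20, 0])))
```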

3. The Quintic NURBS Interpolation

3.1. The Fundamental Formulas of NURBS

Non-uniform rational B-splines (NURBS) are a mathematical curve representation that describes a curve with a small number of control points and weights. NURBS curves have good local support: changing one control point only affects the curve near that point rather than the entire curve [16]. Using NURBS functions with a non-uniform distribution of knot vectors can effectively prevent abrupt changes. NURBS curves are represented by piecewise rational polynomial vector functions, as in Equation (5):
$$ P(u) = \frac{\sum_{i=0}^{n} \omega_i\, d_i\, N_{i,k}(u)}{\sum_{i=0}^{n} \omega_i\, N_{i,k}(u)} \tag{5} $$
where $d_i$ are the control points, $\omega_i$ are the weight factors of the control points, and $N_{i,k}(u)$ is the B-spline basis function of degree $k$ [17], which can be expressed by the recursion in Equations (6) and (7):
$$ N_{i,0}(u) = \begin{cases} 1, & u_i \le u \le u_{i+1} \\ 0, & \text{otherwise} \end{cases} \tag{6} $$

$$ N_{i,k}(u) = \frac{u - u_i}{u_{i+k} - u_i}\, N_{i,k-1}(u) + \frac{u_{i+k+1} - u}{u_{i+k+1} - u_{i+1}}\, N_{i+1,k-1}(u) \tag{7} $$
To obtain a trajectory passing through $m$ path points, the NURBS curve requires the control points $d_i$ and the knot vector $U = [u_0, u_1, \ldots, u_{n+k}, u_{n+k+1}]$, where $n = m + k - 1$. The end knots are clamped, with $u_0 = u_1 = \cdots = u_k = 0$ and $u_n = u_{n+1} = \cdots = u_{n+k+1} = 1$, and the remaining knots are determined from the time intervals $h_j$ between the path points using the chord-length parameterization method [18]:

$$ u_i = \frac{\sum_{j=0}^{i-k-1} h_j}{\sum_{j=0}^{m-1} h_j} \quad (i = k+1, k+2, \ldots, n) \tag{8} $$
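As an illustration of Equations (6)–(8), a minimal Python sketch of the chord-length knot vector and the recursive basis evaluation follows. The exact index bookkeeping of the paper’s knot vector may differ slightly from this common clamped construction:

```python
import numpy as np

def clamped_knots(h, k=5):
    """Clamped knot vector in the spirit of Equation (8): k+1 zeros,
    interior knots at the normalized cumulative time intervals h_j
    (chord-length parameterization), then k+1 ones."""
    interior = np.cumsum(h)[:-1] / np.sum(h)   # one knot per interior point
    return np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])

def bspline_basis(i, k, u, U):
    """Cox-de Boor recursion of Equations (6) and (7), with the usual
    0/0 := 0 convention at repeated knots."""
    if k == 0:
        return 1.0 if U[i] <= u < U[i + 1] else 0.0
    left = right = 0.0
    if U[i + k] > U[i]:
        left = (u - U[i]) / (U[i + k] - U[i]) * bspline_basis(i, k - 1, u, U)
    if U[i + k + 1] > U[i + 1]:
        right = ((U[i + k + 1] - u) / (U[i + k + 1] - U[i + 1])
                 * bspline_basis(i + 1, k - 1, u, U))
    return left + right
```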

3.2. Matrix Expression of Quintic NURBS Curve

Based on the derivative formula for $k$-th degree NURBS curves, the angular velocity, angular acceleration, and angular jerk of the manipulator at parameter $u$ can be obtained from Equation (9):
$$ P^{(k)}(u) = \frac{A^{(k)}(u) - \sum_{i=1}^{k} C_k^i\, \omega^{(i)}(u)\, P^{(k-i)}(u)}{\omega(u)} \tag{9} $$

where $A(u) = \sum_{i=0}^{n} N_{i,k}(u)\, d_i\, \omega_i$ and $\omega(u) = \sum_{i=0}^{n} N_{i,k}(u)\, \omega_i$.
Substituting the path points $Q_m$ into Equation (5) yields Equation (10):

$$ Q_m = \sum_{i=0}^{n+2} N_{i,k}(u)\, d_i \quad (m = 2, 3, \ldots, n-1) \tag{10} $$
To solve the NURBS curve with n + 1 unknowns, four endpoint constraints need to be added to make the system of equations have a unique solution [19]. Therefore, four constraint equations are introduced as Equation (11):
$$ \begin{cases} d_0 = Q_0, & d_{n+2} = Q_n \\ d_0 = d_1, & d_{n+1} = d_{n+2} \end{cases} \tag{11} $$
This constraint equation constrains the angular velocity and angular acceleration at the start and final positions of the manipulator. Substituting it into Equation (5) yields a matrix equation for solving all the control points.
$$ \begin{bmatrix} M_0 \\ M_1 \\ M_2 \\ \vdots \\ M_{n-2} \\ M_{n-1} \\ M_n \end{bmatrix} \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ \vdots \\ d_{n-2} \\ d_{n-1} \\ d_n \end{bmatrix} = \begin{bmatrix} Q_0 \\ Q_1 \\ Q_2 \\ \vdots \\ Q_{n-2} \\ Q_{n-1} \\ Q_n \end{bmatrix} \tag{12} $$

where $M_i = \big[N_{i,5}(u),\ N_{i+1,5}(u),\ N_{i+2,5}(u),\ N_{i+3,5}(u),\ N_{i+4,5}(u)\big]$ for $i = 2, 3, \ldots, n-2$, with $u$ the parameter value corresponding to $u_i$; $M_0 = [1, 0, 0, 0, 0, 0]$, $M_1 = [1, -1, 0, 0, 0, 0]$, $M_{n-1} = [0, 0, 0, 0, 1, -1]$, $M_n = [0, 0, 0, 0, 0, 1]$, and $Q_1 = Q_{n-1} = 0$. The remaining $Q_i$ are the known path-point values.
The displacement equation of the manipulator can be obtained from Equation (5), and the angular velocity, angular acceleration, and angular jerk equations of each joint of the manipulator can be obtained from Equation (9).
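The control-point system of Equation (12) can then be assembled and solved per joint. The sketch below reuses clamped_knots() and bspline_basis() from the previous sketch; realizing the end constraints of Equation (11) through repeated control points at both ends is an assumed but standard way to pin the end velocity and acceleration to zero:

```python
import numpy as np
# uses clamped_knots() and bspline_basis() from the previous sketch

def control_points(Q, h, k=5):
    """Solve a system in the spirit of Equation (12) for one joint.
    Q: m path points for this joint; h: the m-1 time intervals.
    The first and last rows pin d_0 and d_n to the end path points,
    the repeated-control-point rows flatten the curve ends, and the
    interior rows interpolate the path points Q_1 .. Q_{m-2}."""
    m = len(Q)
    U = clamped_knots(h, k)
    n = len(U) - k - 2                   # index of the last control point
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[0, 0] = 1.0;  b[0] = Q[0]                      # d_0 = Q_0
    A[1, 0], A[1, 1] = 1.0, -1.0                     # d_0 = d_1
    A[2, 1], A[2, 2] = 1.0, -1.0                     # d_1 = d_2
    A[n - 2, n - 2], A[n - 2, n - 1] = 1.0, -1.0     # d_{n-2} = d_{n-1}
    A[n - 1, n - 1], A[n - 1, n] = 1.0, -1.0         # d_{n-1} = d_n
    A[n, n] = 1.0;  b[n] = Q[-1]                     # d_n = Q_{m-1}
    u_params = np.cumsum(h)[:-1] / np.sum(h)         # interior parameters
    for r, u in enumerate(u_params, start=3):        # interpolation rows
        A[r] = [bspline_basis(i, k, u, U) for i in range(n + 1)]
        b[r] = Q[r - 2]
    return np.linalg.solve(A, b)
```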

4. The Improved NSGA-II

NSGA-II, proposed by Deb et al. in 2002 [31], is an improved version of NSGA, which was developed by Srinivas and Deb [20] from the genetic algorithm based on the non-dominated sorting concept of Goldberg. To distribute the population into different non-dominated levels, NSGA adopts a relatively slow sorting method, and to maintain the diversity of the population, it utilizes the sharing-function method [21]. NSGA-II, in contrast, uses a fast non-dominated sorting method with a comparatively low worst-case computational complexity of $O(MN^2)$. It has been shown in practice that NSGA-II improves on NSGA in both optimization quality and computation time and is an excellent multi-objective optimization algorithm [22,23,24].
Some experts and scholars have also improved NSGA-II. In paper [25], the K-means algorithm was used to divide each generation of parent populations into two categories, which were selected for genetic evolution through tournament selection, and the differential evolution (DE) algorithm was then integrated to effectively improve the algorithm’s search capability. In paper [26], an adaptive NSGA-II (AONSGA-II) was proposed in which the crossover and mutation probabilities change with the iteration number; combined with the 2-opt optimization algorithm, this effectively improves the algorithm’s performance.
The genetic evolutionary behavior of NSGA-II depends mainly on the crossover and mutation operations. Because the crossover and mutation probabilities are set randomly, these crucial parameters cannot be adapted to the current population state during evolution. This leads to drawbacks: in large-scale multi-objective task scheduling problems, for example, the algorithm risks getting stuck in local optima and losing computational efficiency, so the speed and quality of the solution fail to meet requirements. In previous algorithms, the crossover and mutation probabilities are often determined through extensive experimentation or prior experience, which is not conducive to preserving excellent individuals in the population. During the evolution iterations, to maintain population diversity and retain the individuals with higher fitness, the selection strategy should keep such individuals in the population and avoid disrupting them through crossover and mutation, while individuals with relatively lower fitness should participate in crossover and mutation with higher probability so that the population evolves in a better direction.
Reinforcement learning focuses on the autonomous interaction process between the agent and the environment, which does not involve external interference. Reinforcement learning provides a new solution for the self-learning of evolutionary parameters.

4.1. Reinforcement-Learning Algorithm

Reinforcement learning is a trial-and-error learning method in which an agent interacts with the environment and receives rewards that guide its behavior towards maximizing the total reward obtained over a sequence of actions. It differs from supervised learning in that the reinforcement signal is a scalar evaluation, provided by the environment, of how good the chosen action was, rather than an instruction on how to produce the correct action. Since the external information provided by the environment is limited, the reinforcement-learning system (RLS) learns from its own experience by interacting with the environment and adjusting its actions based on the rewards received. In this way, the RLS acquires knowledge about the action-evaluation environment and adapts its strategies accordingly.

4.1.1. Design Q Table

SARSA and Q-learning are the two main reinforcement-learning algorithms used to train the value function Q [27]. The Q-table is initialized as a matrix whose rows and columns correspond to the states and actions, respectively, with all entries set to zero, as shown in Equation (13); the feedback on actions is then updated according to the environment and state.
$$ Q(state, action) = \begin{array}{c|cccc} & a_1 & a_2 & \cdots & a_n \\ \hline S_1 & 0 & 0 & \cdots & 0 \\ S_2 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ S_n & 0 & 0 & \cdots & 0 \end{array} \tag{13} $$

$$ Q(state, action) = (1 - \alpha)\, Q(state, action) + \alpha \Big( r + \gamma \max_{action'} Q(state', action') \Big) \tag{14} $$

$$ Q(state, action) = (1 - \alpha)\, Q(state, action) + \alpha \Big( r + \gamma\, Q(state', action') \Big) \tag{15} $$
Q-learning is a model-free method that does not require previous state values and can be executed in a stochastic dynamic environment. It updates the Q-table based on the maximum expected reward using Equation (14). SARSA, on the other hand, is an on-policy variant that trains the action value function using Equation (15). Both algorithms have their own advantages: the experimental results presented by Sutton and Barto [28] showed that SARSA converges faster, while Q-learning achieves better final performance. Furthermore, paper [29] has shown that combining SARSA and Q-learning can improve the convergence of a single-objective genetic algorithm. In this paper, we adopt the SARSA algorithm in the initial phase and switch to the Q-learning algorithm in the subsequent phase, as shown in Equation (16):
$$ \begin{cases} \text{SARSA}, & gen \le gen_{\max}/2 \\ \text{Q-learning}, & gen > gen_{\max}/2 \end{cases} \tag{16} $$
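A compact sketch of this hybrid update rule (Equations (14)–(16)); the learning and discount rates shown are the values used later in Section 4.4:

```python
import numpy as np

ALPHA, GAMMA = 0.06, 0.85   # learning and discount rates (Section 4.4)

def update_q(Q, s, a, r, s_next, a_next, gen, gen_max):
    """Hybrid rule of Equation (16): SARSA (Equation (15)) during the
    first half of the run, Q-learning (Equation (14)) afterwards.
    Q is the table of Equation (13); s and a index state and action."""
    if gen <= gen_max / 2:
        target = r + GAMMA * Q[s_next, a_next]   # SARSA: on-policy target
    else:
        target = r + GAMMA * np.max(Q[s_next])   # Q-learning: greedy target
    Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * target
```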

4.1.2. Design States

The objective of using the reinforcement-learning algorithm is to find the optimal solution for the NSGA-II, while considering the diversity and balance of the optimization process. The design of the environment state for the algorithm should consider the fitness of the population and other factors, such as the crowding distance. In the algorithm, the construction of the environment state mainly considers the following aspects:
$$ f(x_i^{gen}) = w_1 f_1(x^{gen}) + w_2 f_2(x^{gen}) + \cdots + w_n f_n(x^{gen}) \tag{17} $$

$$ g = \frac{\sum_{i=1}^{N} f(x_i^{gen})}{\sum_{i=1}^{N} f(x_i^{1})} \tag{18} $$

$$ P[N^{gen}]_{dis} = \sum_{i=2}^{N-1} \sum_{k=1}^{M} \frac{P[i+1].f_k(x^{gen}) - P[i-1].f_k(x^{gen})}{f_k^{\max} - f_k^{\min}} \tag{19} $$

$$ d = \frac{1}{P[N^{gen}]_{dis}} \tag{20} $$

$$ balance = \sum_{j=1}^{F} \min_{k=1,\ldots,M} \big( f_k - f_{j,k} \big) \tag{21} $$

$$ h = \frac{\sum_{i=1}^{N} balance(i)}{N} \tag{22} $$

$$ S = c_1 g + c_2 d + c_3 h \tag{23} $$
Among them, Equation (17) obtains the fitness value of an individual by weighting the objective functions, where $gen$ is the iteration number, $w_i$ is a weight parameter whose value is set according to the requirements of each objective function, $f(x_i^{gen})$ is the linear weighted sum of the objective functions of individual $i$ in the $gen$-th population, $P[N^{gen}]_{dis}$ is the crowding distance of the $gen$-th generation’s population, $balance$ is the balance degree of an individual, $N$ is the population size, $M$ is the number of objectives, and $F$ is the number of non-dominated layers. Equation (18) normalizes the average fitness of the population by that of the first generation to reflect the convergence of the population. Equation (19) gives the total crowding distance of the population in each generation. Equation (20) converts the crowding distance into a minimization quantity: a smaller value corresponds to a larger crowding distance, reflecting the diversity of the population. Equation (21) gives the balance degree of an individual; a smaller balance degree indicates that the individual is distributed more uniformly across the objectives and achieves a better multi-objective balance. Equation (22) gives the balance degree of the population, and Equation (23) obtains the total state value of the population by weighting. Since the average fitness and diversity of the population contribute most to improving the quality of the whole population and obtaining excellent individuals, the three weights are set to 0.4, 0.4, and 0.2.
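The following sketch shows how the state value of Equation (23) might be computed from a population’s objective values. The weights W and the exact form of the balance term are assumptions, since Equation (21) leaves some freedom of interpretation:

```python
import numpy as np

W = np.array([0.5, 0.5])       # objective weights w_i (assumed equal, M = 2)
C = np.array([0.4, 0.4, 0.2])  # state weights c_1, c_2, c_3 from the text

def state_value(F_gen, F_first, crowd):
    """Environment state S of Equation (23). F_gen and F_first are the
    (N, M) objective matrices of the current and first generations;
    crowd holds the finite crowding distances of the current population."""
    g = (F_gen @ W).sum() / (F_first @ W).sum()      # Equations (17)-(18)
    d = 1.0 / crowd.sum()                            # Equations (19)-(20)
    span = F_gen.max(0) - F_gen.min(0) + 1e-12
    norm = (F_gen - F_gen.min(0)) / span             # scale objectives to [0, 1]
    balance = (norm - norm.min(1, keepdims=True)).sum(1)  # Equation (21), one reading
    h = balance.mean()                               # Equation (22)
    return float(C @ np.array([g, d, h]))            # Equation (23)
```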

4.1.3. Designing Actions

Actions are the responses made by the agent to the current environmental state. For each generation of the population, the agent takes different actions based on the environmental state, namely setting $p_c$ and $p_m$. For $p_c$, the interval between adjacent actions is set to 0.05 over the range 0.4 to 0.9; for example, when the bin 0.4 to 0.45 is selected, a random number is chosen within that bin. The same method is applied to $p_m$, which has a range of 0.01 to 0.21 and an interval of 0.02 [29].
The action set for crossover probability is shown in Table 2, and the action set for mutation probability is shown in Table 3.
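Decoding an action index into concrete $p_c$ and $p_m$ values then amounts to sampling inside the selected bin, as in this sketch of Tables 2 and 3:

```python
import numpy as np

rng = np.random.default_rng()

def decode_action(a_pc, a_pm):
    """Map action indices (0..9) from Tables 2 and 3 to concrete
    probabilities: p_c bins are 0.05 wide starting at 0.40, p_m bins
    are 0.02 wide starting at 0.01; a value is sampled inside the bin."""
    pc = rng.uniform(0.40 + 0.05 * a_pc, 0.40 + 0.05 * (a_pc + 1))
    pm = rng.uniform(0.01 + 0.02 * a_pm, 0.01 + 0.02 * (a_pm + 1))
    return pc, pm

print(decode_action(0, 9))   # e.g. p_c in (0.40, 0.45), p_m in (0.19, 0.21)
```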

4.1.4. Designing Reward Functions

The agent does not act arbitrarily; it continuously tries to determine which actions yield higher rewards in order to steer the algorithm in better directions. The crossover probability $p_c$ is adjusted by comparing the overall states of successive generations using Equation (24), while the mutation probability $p_m$ is adjusted by comparing the crowding-distance measures of successive generations using Equation (25). Since the objective functions in this paper are minimized, the reward is positive if the population of the $gen$-th generation is better than that of the $(gen-1)$-th generation, and negative otherwise.
$$ R_c = S_{gen-1} - S_{gen} \tag{24} $$

$$ R_m = d_{gen-1} - d_{gen} \tag{25} $$

4.1.5. Movement Selection Strategy

At the beginning of the algorithm, the agent has no learning experience to draw on, and all entries of the Q-table are zero, so it can only explore and learn through trial and error. As it explores the unknown environment, the agent accumulates experience and uses this knowledge to guide its action selection. The $\varepsilon$-greedy rule, which balances exploration and exploitation, is shown in Equation (26):
$$ \pi(state, action) = \begin{cases} \max Q(state, action), & \varepsilon \ge k_{0\text{-}1} \\ action(Rand), & \varepsilon < k_{0\text{-}1} \end{cases} \tag{26} $$
In the formula, $\varepsilon$ is the greedy rate, and $k_{0\text{-}1}$ is a random number between 0 and 1. When the agent selects an action, it either chooses the action that maximizes the Q value (the greedy policy) or explores by choosing a random action.
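A minimal sketch of the $\varepsilon$-greedy rule of Equation (26):

```python
import numpy as np

rng = np.random.default_rng()

def select_action(Q, s, eps=0.6):
    """Epsilon-greedy policy of Equation (26): exploit the best-known
    action when the random draw falls at or below the greedy rate eps,
    otherwise explore with a uniformly random action."""
    if rng.random() <= eps:
        return int(np.argmax(Q[s]))            # greedy choice
    return int(rng.integers(Q.shape[1]))       # random exploration
```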

4.2. VNS Algorithm

The variable neighborhood search (VNS) is a metaheuristic algorithm proposed by Mladenović and Hansen in 1997 [30]. The VNS algorithm expands the search space to obtain a local optimum by changing the neighborhood structure.

4.2.1. VNS Algorithm Concept

The basic idea of the VNS algorithm is to use multiple different neighborhood structures to systematically search the solution space, obtain a local optimal solution first, and then modify the neighborhood structure to explore a broader search space and discover another optimal solution. The algorithm first uses the smallest neighborhood structure for the search. If the obtained result cannot be further improved, the algorithm switches to a slightly larger neighborhood structure for the search. If the obtained result can still be improved, the algorithm returns to the smallest neighborhood structure for further search.

4.2.2. Neighborhood Design

Since the optimization objectives of this paper are the joint increments and the total time of the manipulator, the time-optimal trajectory can be obtained by minimizing the joint increments while reducing the total time as much as possible. To achieve this goal, five neighborhoods are designed as follows (a sketch of these moves and the basic VNS loop follows the list):
  • Randomly select a displacement variable and subtract a certain value from it.
  • Randomly select two time variables and exchange them.
  • Randomly select a time variable and subtract a certain value from it.
  • Randomly select a displacement variable and multiply it by a random number.
  • Randomly select a time variable and multiply it by a random number.
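The sketch below illustrates the five moves and the basic VNS loop; the step sizes (0.5 and 0.1) and the split of the solution vector into displacement and time variables are assumptions for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng()

def shake(x, k, n_disp):
    """Apply neighborhood move k (0..4) from the list above to solution x,
    whose first n_disp entries are displacement variables and whose
    remaining entries are time variables."""
    y = x.copy()
    t_idx = np.arange(n_disp, len(x))
    if k == 0:                              # 1: shrink a random displacement
        y[rng.integers(n_disp)] -= 0.5
    elif k == 1:                            # 2: swap two random time variables
        i, j = rng.choice(t_idx, size=2, replace=False)
        y[i], y[j] = y[j], y[i]
    elif k == 2:                            # 3: shrink a random time variable
        y[rng.choice(t_idx)] -= 0.1
    elif k == 3:                            # 4: rescale a random displacement
        y[rng.integers(n_disp)] *= rng.random()
    else:                                   # 5: rescale a random time variable
        y[rng.choice(t_idx)] *= rng.random()
    return y

def vns(x, cost, n_disp, iters=100):
    """Basic VNS loop (Section 4.2.1): restart from the smallest
    neighborhood after each improvement, otherwise widen it."""
    for _ in range(iters):
        k = 0
        while k < 5:
            y = shake(x, k, n_disp)
            if cost(y) < cost(x):
                x, k = y, 0                 # improved: back to neighborhood 1
            else:
                k += 1                      # no gain: try a larger neighborhood
    return x
```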

4.3. Flow Chart of the Improved NSGA-II

Combining reinforcement learning with NSGA-II requires considering rewards, states, actions, and the selection policy. The state of NSGA-II is viewed as the environment state, and each iteration updates the state from $S_{gen}$ to $S_{gen+1}$. The learning module consists of a reinforcement-learning agent and a reward function $R$. The overall execution involves four steps. Firstly, the agent acquires the environmental state $S_{gen}$ at the $gen$-th generation and selects an action $a_i$ according to the action selection policy. The agent adjusts $p_c$ and $p_m$ by executing the action, and NSGA-II uses the adjusted $p_c$ and $p_m$ for its genetic operations. After the evolution operation is completed, the environmental state is updated from $S_{gen}$ to $S_{gen+1}$. Finally, the reward function is used to calculate the reward and update the Q-table entry for the corresponding state and action. After $gen$ iterations, the action selections for $p_c$ and $p_m$ are optimized based on past learning experience. The flowchart is shown in Figure 3.

4.4. Comparative Verification

This paper selects four multi-objective test functions, ZDT1, ZDT2, ZDT3, and ZDT6 [31], to assess the effectiveness of the improved NSGA-II; they are shown in Table 4. These functions cover convex, concave, and discontinuous fronts, which makes them effective for evaluating the overall performance of most multi-objective evolutionary algorithms. The proposed algorithm is compared with other multi-objective optimization algorithms: the multi-objective particle swarm optimization algorithm (MOPSO), the multi-objective evolutionary algorithm based on decomposition (MOEA/D), the Strength Pareto Evolutionary Algorithm 2 (SPEA2), KNSGA-II [25], and AONSGA-II [26].
The standard NSGA-II is configured with a population size of $N = 100$, $gen = 300$ iterations, crossover probability $p_c = 0.8$, and mutation probability $p_m = 0.1$. The improved NSGA-II is configured with a population size of $N = 100$, $gen = 300$ iterations, learning rate $\alpha = 0.06$, discount rate $\gamma = 0.85$, and greedy rate $\varepsilon = 0.6$.
To avoid making the article too lengthy, only the Pareto optimal frontiers of the traditional NSGA-II and the improved NSGA-II are shown in Figure 4 for each test function. Other optimization algorithms were only compared in terms of the mean inverted generational distance (IGD) and hypervolume (HV) values, as shown in Table 5 and Table 6.
The IGD and HV mean values obtained from running each function ten times are shown in Table 5 and Table 6. The bold portion represents the algorithms with the best results in terms of IGD and HV. To ensure fairness in the comparison, the same population size and number of iterations were used for the other multi-objective algorithms, and the remaining parameters were set to commonly used values.
According to the above results, although the MOEA/D algorithm has a lower average IGD value on the ZDT6 function, it does not demonstrate greater advantages on other test functions. KNSGA-II has higher average HV values, but its IGD results are not very good. Through comparison, it can be found that the proposed improved NSGA-II has better comprehensive performance and better search capability than other optimization algorithms, providing a foundation for the subsequent trajectory planning and enabling the search for better trajectories.
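For reference, the IGD metric reported in Table 5 can be computed as follows (a standard definition, not code from the paper):

```python
import numpy as np

def igd(front, reference):
    """Inverted generational distance: the mean Euclidean distance from
    each point of the reference (true) Pareto front to its nearest
    obtained solution; lower is better. Both inputs are (n, M) arrays."""
    d = np.linalg.norm(reference[:, None, :] - front[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```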

5. Simulation Verification

5.1. Model Building

The proposed method’s feasibility was demonstrated by implementing it on a PUMA560 manipulator model in MATLAB, utilizing the data presented in Table 1. The model of the PUMA560 manipulator is shown in Figure 5:
Considering the working performance and limits of each joint of the manipulator, the kinematic constraints are given in the Table 7:
To fulfill the kinematic constraints of the manipulator, penalty functions were introduced on each objective function to ensure that the motion trajectory meets the kinematic constraints. The fitness function is designed as shown in Equation (27):
$$ \begin{cases} f(1) = \Big( \sum_{i=1}^{6} \sum_{j=1}^{u} \big| P_{i,j} - P_{i,j+1} \big| \Big) \times E_a + E_b \\[4pt] f(2) = \Big( \sum_{i=1}^{9} t_i \Big) E_a + E_c \end{cases} \tag{27} $$
In Equation (27), $u$ is the number of control points; $f(1)$ is the total joint increment of the six joints of the manipulator along the current motion trajectory; $P_{i,j}$ is the value of the $i$-th joint at the $j$-th control point of the quintic NURBS curve; $f(2)$ is the total running time of the manipulator along the current trajectory; and $E_a$, $E_b$, and $E_c$ are penalty terms for the fitness functions. When any joint exceeds its kinematic constraints, the penalty function is activated:
$$ E_a = \sum_{i=1}^{6} \left( \frac{\min P_i}{p_{i,\min}} + \frac{\max P_i}{p_{i,\max}} + \frac{\max \dot{P}_i}{V_{i,\max}} + \frac{\max \ddot{P}_i}{a_{i,\max}} + \frac{\max P_i^{(3)}}{J_{i,\max}} \right) \tag{28} $$
The values of $p_{\min}$, $p_{\max}$, $V_{\max}$, $a_{\max}$, and $J_{\max}$ are taken from Table 1 and Table 7. When a constraint is violated, $E_b$ and $E_c$ add fixed penalties of 1000° and 5 s, respectively, to the corresponding fitness value; otherwise $E_a = 1$ and $E_b = E_c = 0$.
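A sketch of the penalized fitness evaluation of Equations (27) and (28) for one candidate trajectory; sampling the derivative magnitudes from the NURBS curve (Equation (9)) is assumed to have been done beforehand:

```python
import numpy as np

def fitness(P, t, vel, acc, jerk, lim):
    """Objectives of Equation (27) with the penalty of Equation (28).
    P: (6, u) joint values at the control points; t: segment times;
    vel/acc/jerk: (6, u) sampled derivative magnitudes from Equation (9);
    lim: dict with per-joint arrays p_min, p_max, V_max, a_max, J_max."""
    violated = (
        (P.min(1) < lim["p_min"]).any() or (P.max(1) > lim["p_max"]).any()
        or (vel.max(1) > lim["V_max"]).any()
        or (acc.max(1) > lim["a_max"]).any()
        or (jerk.max(1) > lim["J_max"]).any()
    )
    if violated:   # Equation (28), plus the fixed thresholds E_b and E_c
        Ea = (np.abs(P.min(1) / lim["p_min"]) + np.abs(P.max(1) / lim["p_max"])
              + vel.max(1) / lim["V_max"] + acc.max(1) / lim["a_max"]
              + jerk.max(1) / lim["J_max"]).sum()
        Eb, Ec = 1000.0, 5.0
    else:
        Ea, Eb, Ec = 1.0, 0.0, 0.0
    f1 = np.abs(np.diff(P, axis=1)).sum() * Ea + Eb    # total joint increment
    f2 = t.sum() * Ea + Ec                             # total running time
    return f1, f2
```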

5.2. Experimental Simulation

According to the parameters of PUMA560 in Table 1, the start and final points of the manipulator are designed as shown in Equation (29):
$$ \begin{aligned} q_{start} &= [\,10, -10, -30, -25, 20, 0\,]\ (°) \\ q_{final} &= [\,-20, 25, 90, 20, -60, 120\,]\ (°) \end{aligned} \tag{29} $$
The population size is $N = 100$, the number of iterations $gen = 300$, the learning rate $\alpha = 0.06$, the discount rate $\gamma = 0.85$, and the greedy rate $\varepsilon = 0.6$. The improved algorithm was used to solve for the time-optimal trajectory of the PUMA560 manipulator; the path points and times are shown in Table 8. Based on the data in Table 8, the angular displacement, angular velocity, angular acceleration, and angular jerk of the manipulator are plotted in Figure 6.
As shown in Table 8 and Figure 6, the optimal time for the manipulator is 5.899 s, and the resulting angular displacement, angular velocity, angular acceleration, and angular jerk curves are smooth and continuous and all within the constraint range.

5.3. Comparative Experiment

Comparative experiments verify that the method proposed in this paper achieves a shorter manipulator running time than other methods. All comparative experiments use quintic NURBS curves to fit the path points of the manipulator; the difference is that a single-objective genetic algorithm is used to find the optimal time on both a uniform path and a random path. Specifically, a uniform path is one whose points increase or decrease uniformly between the start and final points, while a random path consists of arbitrary points that satisfy the constraints in Table 7. The experimental results are shown in Table 9.
According to the data in Table 9, using the PUMA560 model generated from Figure 5, three trajectory planning images were plotted, as shown in Figure 7. The angular displacement, angular velocity, angular acceleration, and angular jerk curves for the random path are plotted in Figure 8. Similarly, the curves for the uniform path are plotted in Figure 9.
According to the data in Table 9, the new time-optimal trajectory planning method proposed in this paper reduces the total running time by 46.23% compared with the traditional random-path (fixed-path) method and by 19.26% compared with the uniform-path method, effectively improving the efficiency of the manipulator. The four plots in Figure 6 show that the obtained curves satisfy the kinematic constraints in Table 7 and are smooth and continuous, demonstrating the effectiveness of the generated trajectory. Comparing the three trajectory planning methods in Figure 7, the proposed method passes through the start and final points, encounters no singular points, and follows a running trajectory similar to that of the uniform-path method (which has the shorter trajectory of the two baselines), demonstrating the effectiveness and rationality of the proposed method. In conclusion, the proposed time-optimal trajectory planning method offers a clear improvement over conventional methods.
To verify the search capability of the improved NSGA-II, trajectory planning for the manipulator was performed using NSGA-II, MOPSO, MOEA/D, SPEA2, KNSGA-II [25] and AONSGA-II [26]. The resulting running times for the manipulator optimization using different algorithms are shown in Table 10. Additionally, trajectory planning curve graphs were plotted for each algorithm, as shown in Figure 10, Figure 11, Figure 12 and Figure 13.
Based on the data in Table 10 and the trajectory curve figures in Figure 10, Figure 11, Figure 12 and Figure 13, all algorithms satisfy the kinematic constraints in Table 7, validating the constraining effect of the penalty function on the results. By comparing the results, it can be observed that the traditional NSGA-II achieves shorter time durations compared to other optimization algorithms, demonstrating that NSGA-II is more suitable for solving the problem presented in this paper. KNSGA-II and AONSGA-II exhibit better search capabilities than the traditional NSGA-II, resulting in shorter time durations. However, when comparing with the improved NSGA-II proposed in this paper, the total runtime is optimized to 5.899 s, which is 38.4% and 54.4% shorter than KNSGA-II (9.591 s) and AONSGA-II (12.939 s), respectively. This further confirms that the improved NSGA-II possesses superior search capabilities.

6. Conclusions

The main work of this paper is to use an improved NSGA-II to simultaneously optimize the path points and the total time of the manipulator to obtain a time-optimal trajectory; however, the method cannot guarantee that the obtained trajectory is the shortest path of the manipulator.
Firstly, a smooth and continuous motion trajectory of the manipulator is obtained using quintic NURBS curve interpolation. Then, the reinforcement-learning and VNS algorithms are introduced to enhance the search capability of NSGA-II. Subsequently, the improved NSGA-II is used to optimize the joint increments and running time of the manipulator. Finally, the correctness of the results is verified on the PUMA560 manipulator in MATLAB and compared with other optimization methods. The results show that the proposed method achieves a shorter running time.
The main contributions of this paper are the improvement of NSGA-II and a new trajectory planning method for the manipulator that seeks a shorter total running time while minimizing the joint increments. Based on comparative experiments, the improved NSGA-II has better overall performance than other optimization algorithms. Moreover, the new time-optimal trajectory planning method effectively reduces the total runtime and improves the working efficiency of the manipulator compared with traditional methods. While this paper proposes an effective trajectory planning method, some limitations remain. Specifically, the method does not ensure that the global optimal path is found and may not guarantee the optimal running time of the trajectory, both of which depend heavily on the search capability of the optimization algorithm. For future work, it is therefore crucial to improve the algorithm’s search capability and design more suitable fitness functions to ensure that the path obtained is the shortest and the time obtained is optimal.

Author Contributions

Conceptualization, J.D.; methodology, J.H. and J.D.; software, J.H. and Z.C.; project administration, Z.C.; data curation, J.H.; resources, J.D. and Z.C.; visualization, J.H. and J.D.; writing—original draft, J.H. and J.D.; writing—review and editing, J.H. and J.D.; funding acquisition, J.D. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 51475317) and key R&D projects of Shanxi province (Grant No. 202102150401009).

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Acknowledgments

The authors would like to express their gratitude to Shanxi Pingyang Industry Machinery Co., Ltd. for their financial support of this research. In particular, we would like to thank Yunfei Yan for their contribution to this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Al-Khulaidi, R.; Akmeliawati, R.; Grainger, S.; Lu, T.F. Structural Optimisation and Design of a Cable-Driven Hyper-Redundant Manipulator for Confined Semi-Structured Environments. Sensors 2022, 22, 8632. [Google Scholar] [CrossRef]
  2. Liu, L.Z.; Zhang, L.Y.; Wang, Y.M.; Hou, Y.L. A novel robust fixed-time fault-tolerant tracking control of uncertain robot manipulators. IET Control Theory Appl. 2021, 15, 195–208. [Google Scholar] [CrossRef]
  3. Ali, K.; Ullah, S.; Mehmood, A.; Mostafa, H.; Marey, M.; Iqbal, J. Adaptive FIT-SMC Approach for an Anthropomorphic Manipulator With Robust Exact Differentiator and Neural Network-Based Friction Compensation. IEEE Access 2022, 10, 3378–3389. [Google Scholar] [CrossRef]
  4. Petrone, V.; Ferrentino, E.; Chiacchio, P. Time-Optimal Trajectory Planning With Interaction With the Environment. IEEE Robot. Autom. Lett. 2022, 7, 10399–10405. [Google Scholar] [CrossRef]
  5. Luo, L.P.; Yuan, C.; Yan, R.J.; Yuan, Q.; Wu, J.; Shin, K.S.; Han, C.S. Trajectory planning for energy minimization of industry robotic manipulators using the Lagrange interpolation method. Int. J. Precis. Eng. Manuf. 2015, 16, 911–917. [Google Scholar] [CrossRef]
  6. Shrivastava, A.; Dalla, V.K. Jerk Optimized Motion Planning of Redundant Space Robot Based on Grey-Wolf Optimization Approach. Arab. J. Sci. Eng. 2023, 48, 2687–2699. [Google Scholar] [CrossRef]
  7. Zhang, X.F.; Shi, G.L. Multi-objective optimal trajectory planning for manipulators in the presence of obstacles. Robotica 2022, 40, 888–906. [Google Scholar] [CrossRef]
  8. Nadir, B.; Mohammed, O.; Minh-Tuan, N.; Abderrezak, S. Optimal trajectory generation method to find a smooth robot joint trajectory based on multiquadric radial basis functions. Int. J. Adv. Manuf. Technol. 2022, 120, 297–312. [Google Scholar] [CrossRef]
  9. Liu, Y.; Guo, C.; Weng, Y.P. Online Time-Optimal Trajectory Planning for Robotic Manipulators Using Adaptive Elite Genetic Algorithm With Singularity Avoidance. IEEE Access 2019, 7, 146301–146308. [Google Scholar] [CrossRef]
  10. Gao, X.S.; Mu, Y.; Gao, Y.Z. Optimal trajectory planning for robotic manipulators using improved teaching-learning-based optimization algorithm. Ind. Robot-Int. J. Robot. Res. Appl. 2016, 43, 308–316. [Google Scholar] [CrossRef]
  11. Lv, X.Y.; Yu, Z.X.; Liu, M.Y.; Zhang, G.Y.; Zhang, L. Direct Trajectory Planning Method Based on IEPSO and Fuzzy Rewards and Punishment Theory for Multi-Degree-of Freedom Manipulators. IEEE Access 2019, 7, 20452–20461. [Google Scholar] [CrossRef]
  12. Zhang, L.H.; Wang, Y.; Zhao, X.Y.; Zhao, P.; He, L.G. Time-optimal trajectory planning of serial manipulator based on adaptive cuckoo search algorithm. J. Mech. Sci. Technol. 2021, 35, 3171–3181. [Google Scholar] [CrossRef]
  13. Yu, X.L.; Dong, M.S.; Yin, W.M. Time-optimal trajectory planning of manipulator with simultaneously searching the optimal path. Comput. Commun. 2022, 181, 446–453. [Google Scholar] [CrossRef]
  14. Cai, Z.X.; Xie, B. Fundamentals of Robotics; Tsinghua University Press: Beijing, China, 2000; pp. 52–54. [Google Scholar]
  15. Xiong, Y.L.; Li, W.L.; Chen, W.B. Robotics: Modeling, Control, and Vision; Huazhong University of Science and Technology Press: Wuhan, China, 2020; pp. 100–102. [Google Scholar]
  16. Wang, Z.S.; Li, Y.B.A.; Sun, P.; Luo, Y.Q.; Chen, B.; Zhu, W.T. A multi-objective approach for the trajectory planning of a 7-DOF serial-parallel hybrid humanoid arm. Mech. Mach. Theory 2021, 165, 104423. [Google Scholar] [CrossRef]
  17. Chen, C.S.; Chen, S.K. Synchronization of tool tip trajectory and attitude based on the surface characteristics of workpiece for 6-DOF robot manipulator. Robot. Comput. Integr. Manuf. 2019, 59, 13–27. [Google Scholar] [CrossRef]
  18. De Boor, C.; De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978; Volume 27. [Google Scholar]
  19. Chen, D.; Li, S.Q.; Wang, J.F.; Feng, Y.; Liu, Y. A multi-objective trajectory planning method based on the improved immune clonal selection algorithm. Robot. Comput. Integr. Manuf. 2019, 59, 431–442. [Google Scholar] [CrossRef]
  20. Srinivas, N.; Kalyanmoy, D. Muiltiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evol. Comput. 1994, 2, 221–248. [Google Scholar] [CrossRef]
  21. Wu, X.G.; Wang, L.; Chen, B.; Feng, Z.B.; Qin, Y.W.; Liu, Q.; Liu, Y. Multi-objective optimization of shield construction parameters based on random forests and NSGA-II. Adv. Eng. Inform. 2022, 54, 101751. [Google Scholar] [CrossRef]
  22. Wang, Y.J.; Wang, G.G.; Tian, F.M.; Gong, D.W.; Pedrycz, W. Solving energy-efficient fuzzy hybrid flow-shop scheduling problem at a variable machine speed using an extended NSGA-II. Eng. Appl. Artif. Intell. 2023, 121, 105977. [Google Scholar] [CrossRef]
  23. Lu, Z.M.; Gao, Y.; Xu, C.B.; Li, Y.T. Configuration optimization of an off-grid multi-energy microgrid based on modified NSGA-II and order relation-TODIM considering uncertainties of renewable energy and load. J. Clean. Prod. 2023, 383, 135312. [Google Scholar] [CrossRef]
  24. Ahmed, F.; Zhu, S.M.; Yu, G.Y.; Luo, E.C. A potent numerical model coupled with multi-objective NSGA-II algorithm for the optimal design of Stirling engine. Energy 2022, 247, 123468. [Google Scholar] [CrossRef]
  25. Zhang, Y.A.; Du, Y.F.; Mao, E.R.; Meng, Q.F.; Zhu, Z.X. Multi-objective Optimization of High-horsepower Tractor Gear Box Based on Improved NSGA-II. Trans. Chin. Soc. Agric. Mach. 2022, 53, 310–319. [Google Scholar]
  26. Xu, Y.; Guo, Q.; Tan, A.; Xu, L.; Tu, Y.; Liu, S. Multi-objective Route Planning of Museum Guide based on an Improved NSGA-II Algorithm. J. Phys. Conf. Ser. 2021, 1828, 012051. [Google Scholar] [CrossRef]
  27. Hsieh, Y.Z.; Su, M.C. A Q-learning-based swarm optimization algorithm for economic dispatch problem. Neural Comput. Appl. 2016, 27, 2333–2350. [Google Scholar] [CrossRef]
  28. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  29. Chen, R.H.; Yang, B.; Li, S.; Wang, S.L. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778. [Google Scholar] [CrossRef]
  30. Victor, G.H.; Tiago, P.F.; Luciano, F.; Guilherme, K. Application of the VNS heuristic for feature selection in credit scoring problems. Mach. Learn. Appl. 2022, 9, 100349. [Google Scholar]
  31. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
Figure 1. PUMA560 schematic diagram.
Figure 2. Coordinate system of PUMA560.
Figure 3. Flow chart of the improved NSGA-II.
Figure 4. Comparison of the Pareto optimal frontiers on the test functions: (a) ZDT1; (b) ZDT2; (c) ZDT3; (d) ZDT6.
Figure 5. The PUMA560 model.
Figure 6. Trajectory planning curves: (a) angular displacement; (b) angular velocity; (c) angular acceleration; (d) angular jerk.
Figure 7. Comparison of the three planned trajectories.
Figure 8. Random-path trajectory planning curves: (a) angular displacement; (b) angular velocity; (c) angular acceleration; (d) angular jerk.
Figure 9. Uniform-path trajectory planning curves: (a) angular displacement; (b) angular velocity; (c) angular acceleration; (d) angular jerk.
Figure 10. Angular displacement curves of different algorithms: (a) NSGA-II; (b) MOPSO; (c) MOEA/D; (d) SPEA2; (e) KNSGA-II; (f) AONSGA-II.
Figure 11. Angular velocity curves of different algorithms: (a) NSGA-II; (b) MOPSO; (c) MOEA/D; (d) SPEA2; (e) KNSGA-II; (f) AONSGA-II.
Figure 12. Angular acceleration curves of different algorithms: (a) NSGA-II; (b) MOPSO; (c) MOEA/D; (d) SPEA2; (e) KNSGA-II; (f) AONSGA-II.
Figure 13. Angular jerk curves of different algorithms: (a) NSGA-II; (b) MOPSO; (c) MOEA/D; (d) SPEA2; (e) KNSGA-II; (f) AONSGA-II.
Table 1. Modified PUMA560 D-H parameters.

| Joint | θi (°) | αi−1 (°) | ai−1 (mm) | di (mm) | Variable Range (°) |
|---|---|---|---|---|---|
| 1 | 90 | 0 | 0 | 0 | −160~160 |
| 2 | 0 | −90 | 0 | 149.09 | −225~45 |
| 3 | −90 | 0 | 431.80 | 0 | −45~225 |
| 4 | 0 | −90 | 20.32 | 433.07 | −110~170 |
| 5 | 0 | 90 | 0 | 0 | −100~100 |
| 6 | 0 | −90 | 0 | 0 | −266~266 |
Table 2. pc action value table.

| Actions | pc Ranges | Actions | pc Ranges |
|---|---|---|---|
| a1 | (0.40, 0.45) | a6 | (0.65, 0.70) |
| a2 | (0.45, 0.50) | a7 | (0.70, 0.75) |
| a3 | (0.50, 0.55) | a8 | (0.75, 0.80) |
| a4 | (0.55, 0.60) | a9 | (0.80, 0.85) |
| a5 | (0.60, 0.65) | a10 | (0.85, 0.90) |
Table 3. pm action value table.

| Actions | pm Ranges | Actions | pm Ranges |
|---|---|---|---|
| a1 | (0.01, 0.03) | a6 | (0.11, 0.13) |
| a2 | (0.03, 0.05) | a7 | (0.13, 0.15) |
| a3 | (0.05, 0.07) | a8 | (0.15, 0.17) |
| a4 | (0.07, 0.09) | a9 | (0.17, 0.19) |
| a5 | (0.09, 0.11) | a10 | (0.19, 0.21) |
Table 4. Representations of the four test functions.

| Test Function | Representation | Variable Range |
|---|---|---|
| ZDT1 | $f_1(x) = x_1$; $f_2(x) = h\big(1 - \sqrt{f_1/h}\big)$; $h(x) = 1 + 9\sum_{i=2}^{n} x_i/(n-1)$; $n = 30$ | $0 \le x_i \le 1\ (i = 1, 2, \ldots, n)$ |
| ZDT2 | $f_1(x) = x_1$; $f_2(x) = h\big(1 - (f_1/h)^2\big)$; $h(x) = 1 + 9\sum_{i=2}^{n} x_i/(n-1)$; $n = 30$ | $0 \le x_i \le 1\ (i = 1, 2, \ldots, n)$ |
| ZDT3 | $f_1(x) = x_1$; $f_2(x) = h\big(1 - \sqrt{f_1/h} - (f_1/h)\sin(10\pi f_1)\big)$; $h(x) = 1 + 9\sum_{i=2}^{n} x_i/(n-1)$; $n = 30$ | $0 \le x_i \le 1\ (i = 1, 2, \ldots, n)$ |
| ZDT6 | $f_1(x) = 1 - \exp(-4x_1)\sin^6(6\pi x_1)$; $f_2(x) = h\big(1 - (f_1/h)^2\big)$; $h(x) = 1 + 9\big(\sum_{i=2}^{n} x_i/(n-1)\big)^{0.25}$; $n = 10$ | $0 \le x_i \le 1\ (i = 1, 2, \ldots, n)$ |
Table 5. Comparison of IGD mean values (best result per test function in bold).

| Algorithm | ZDT1 | ZDT2 | ZDT3 | ZDT6 |
|---|---|---|---|---|
| Traditional NSGA-II | 0.1533 | 0.2182 | 0.1653 | 0.0649 |
| MOPSO | 0.0188 | 0.5849 | 0.0438 | 0.1326 |
| MOEA/D | 0.0077 | 0.0649 | 0.0563 | **0.0033** |
| SPEA2 | 0.0893 | 0.1069 | 0.0982 | 0.0231 |
| KNSGA-II | 0.0056 | 0.0796 | 0.0264 | 0.0470 |
| AONSGA-II | 0.1047 | 0.1606 | 0.1192 | 0.0482 |
| Improved NSGA-II | **0.0055** | **0.0055** | **0.0066** | 0.0044 |
Table 6. Comparison of HV mean values (best result per test function in bold).

| Algorithm | ZDT1 | ZDT2 | ZDT3 | ZDT6 |
|---|---|---|---|---|
| Traditional NSGA-II | 0.5318 | 0.2209 | 0.5274 | 0.3330 |
| MOPSO | 0.6992 | 0.1534 | 0.5930 | 0.3062 |
| MOEA/D | 0.7126 | 0.4083 | 0.5597 | 0.3885 |
| SPEA2 | 0.6027 | 0.3009 | 0.5564 | 0.3590 |
| KNSGA-II | 0.7175 | **0.5189** | **0.6375** | **0.4400** |
| AONSGA-II | 0.5908 | 0.2471 | 0.5744 | 0.3460 |
| Improved NSGA-II | **0.7181** | 0.4427 | 0.5986 | 0.3875 |
Table 7. PUMA560 kinematic constraints.

| Joint | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Angular velocity V (°/s) | 100 | 95 | 100 | 150 | 130 | 110 |
| Angular acceleration A (°/s²) | 45 | 40 | 75 | 70 | 90 | 80 |
| Angular jerk J (°/s³) | 60 | 60 | 55 | 70 | 75 | 70 |
Table 8. Time-optimal path and time (joint values in °).

| Path Point | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6 | ti (s) |
|---|---|---|---|---|---|---|---|
| 1 | 10.000 | −10.000 | −30.000 | −25.000 | 20.000 | 0.000 | 0.000 |
| 2 | 6.675 | −6.139 | −16.259 | −19.999 | 11.178 | 13.371 | 1.325 |
| 3 | 3.389 | −2.235 | −3.317 | −15.010 | 2.339 | 26.820 | 1.802 |
| 4 | 0.024 | 1.507 | 9.819 | −9.995 | −6.719 | 39.981 | 2.271 |
| 5 | −3.298 | 5.546 | 23.202 | −5.010 | −15.416 | 53.244 | 2.720 |
| 6 | −6.481 | 9.328 | 36.681 | 0.334 | −23.942 | 66.927 | 3.171 |
| 7 | −9.894 | 13.365 | 49.564 | 5.021 | −33.380 | 80.028 | 3.645 |
| 8 | −13.229 | 17.044 | 63.336 | 9.922 | −42.085 | 93.406 | 4.127 |
| 9 | −16.659 | 21.081 | 76.673 | 15.034 | −51.122 | 106.819 | 4.598 |
| 10 | −20.000 | 25.000 | 90.000 | 20.000 | −60.000 | 120.000 | 5.899 |
Table 9. Comparison of experimental results.

| Path Point | Random Path ti (s) | Uniform Path ti (s) | Method in This Paper ti (s) |
|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.000 |
| 2 | 1.755 | 1.766 | 1.325 |
| 3 | 2.534 | 2.316 | 1.802 |
| 4 | 3.709 | 2.825 | 2.271 |
| 5 | 5.112 | 3.441 | 2.720 |
| 6 | 6.396 | 4.093 | 3.171 |
| 7 | 7.336 | 4.656 | 3.645 |
| 8 | 8.396 | 5.224 | 4.127 |
| 9 | 9.121 | 5.811 | 4.598 |
| 10 | 10.970 | 7.306 | 5.899 |
Table 10. Comparison of different algorithms.

| ti (s) | NSGA-II | MOPSO | MOEA/D | SPEA2 | KNSGA-II | AONSGA-II |
|---|---|---|---|---|---|---|
| t1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| t2 | 3.077 | 2.748 | 4.927 | 3.814 | 2.672 | 1.902 |
| t3 | 4.810 | 6.543 | 8.132 | 6.206 | 3.257 | 2.864 |
| t4 | 6.345 | 8.778 | 10.656 | 11.206 | 3.961 | 3.960 |
| t5 | 8.642 | 11.825 | 12.241 | 16.026 | 4.623 | 5.212 |
| t6 | 11.315 | 14.781 | 14.615 | 20.195 | 5.003 | 6.312 |
| t7 | 13.978 | 19.323 | 16.819 | 22.908 | 5.707 | 7.876 |
| t8 | 15.571 | 22.165 | 18.732 | 26.877 | 6.548 | 8.582 |
| t9 | 16.761 | 26.079 | 21.035 | 29.912 | 7.145 | 10.257 |
| t10 | 19.600 | 28.842 | 23.712 | 34.254 | 9.591 | 12.939 |