Research on deep-sea mining vehicles (DSMVs) is being actively conducted for deep-sea resource development. Improving the obstacle-avoidance performance of DSMVs would effectively enhance their mobility and safety during operation. In this paper, the Modified Twin Delayed Deep Deterministic Policy Gradient (MTD3) algorithm is applied to train an obstacle-avoidance controller. A Markov decision process model, including the state, action, and reward functions, is designed, together with a suitable reinforcement-learning training environment. The obstacle-avoidance ability of the controller is verified by simulation tests in which the DSMV avoids one to five randomly generated obstacles in each episode.
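The specific modifications that distinguish MTD3 are not described in this excerpt. For orientation only, the sketch below shows the ingredients of standard TD3 that any such variant builds on: a clipped double-Q critic target, target-policy smoothing with clipped Gaussian noise, delayed actor updates, and soft target-network updates. It is a framework-free NumPy sketch; the hyperparameter values are commonly used TD3 defaults assumed here, and the toy actor and critics are hypothetical stand-ins, not the paper's networks.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=None):
    """Standard TD3 critic target (not the paper's MTD3 modification).

    Target-policy smoothing: clipped Gaussian noise is added to the target
    actor's action. Clipped double-Q: the smaller of the two target critic
    estimates is used, which curbs overestimation bias.
    """
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * np.minimum(q1, q2)


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor and targets update every `policy_delay` critic steps."""
    return step % policy_delay == 0


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters toward the online network."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


# Toy stand-ins (hypothetical): a zero policy and two constant critics.
actor = lambda s: np.zeros((s.shape[0], 2))
critic_a = lambda s, a: np.full(s.shape[0], 2.0)
critic_b = lambda s, a: np.full(s.shape[0], 3.0)

states = np.zeros((4, 6))
rewards = np.ones(4)
dones = np.array([0.0, 0.0, 1.0, 0.0])
y = td3_target(rewards, states, dones, actor, critic_a, critic_b,
               rng=np.random.default_rng(0))
print(y)
```

Because the two toy critics disagree (2.0 versus 3.0), the target uses the smaller value, giving 1 + 0.99 × 2 for non-terminal transitions and just the reward for the terminal one. In a real controller, the state would encode quantities such as the relative positions of obstacles and the goal, and the action would encode track-speed commands. Those encodings are design choices of the paper's MDP that this sketch does not attempt to reproduce.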


The ocean holds the largest untapped mineral resources on Earth. In general, deep-sea mineral deposits on the seafloor fall into three main types (Leng et al., 2021): polymetallic nodules, seafloor massive sulfides (SMS), and cobalt-rich crusts. If these resources can be exploited effectively and economically, the shortage of mineral resources will be alleviated to a great extent. Deep-sea mining can be defined as the use of hydrodynamic or mechanical methods to lift mineral ores from the seabed to the ocean surface, after which ships transport the ores to land-based processing plants (Ma et al., 2019; Ma et al., 2022). As the key equipment for future deep-sea mining, the deep-sea mining vehicle (DSMV) carries the mining devices and is intended to collect as much ore as possible. However, the complexity and unpredictability of the seabed topography pose great challenges to DSMV motion. It is therefore necessary to develop a reliable collision-avoidance controller for the DSMV.

Traditional approaches to DSMV obstacle avoidance rely on model-based planning (Dai and Liu, 2013; Liang et al., 2018). For example, Li and Zou (2012) proposed a fuzzy PID approach that controls the speeds of the left and right tracks, and used it to simulate unilateral obstacle crossing while the vehicle drives in a straight line. Their numerical simulation results preliminarily demonstrated the feasibility of the method. Beyond fuzzy logic, Wu et al. (2021) proposed a model predictive control (MPC) method that handles dead zones and obstacles during trajectory tracking; their obstacle-avoidance strategy re-plans the path using a tri-circular-arc trajectory with equal curvature. However, these model-based methods are sensitive to model mismatch, nonlinearity, and external disturbances in dynamic environments.
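To illustrate how a tracked vehicle is steered by differential left and right track speeds, the following sketch closes a heading loop with an ordinary PID controller. This is deliberately simpler than the fuzzy PID of Li and Zou (2012), which additionally adapts its gains through fuzzy rules. The gains, track width, speed limit, and the no-slip kinematic model are all hypothetical assumptions. Ignoring track slip is a significant simplification on soft seabed sediment, and it is one source of the model mismatch noted above.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Plain PID (no fuzzy gain scheduling); gains are illustrative assumptions."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # Skip the derivative on the first call to avoid a derivative kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def wrap_angle(a: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def track_speeds(v_cmd, omega_cmd, track_width, v_max):
    """Convert forward speed and yaw-rate commands into saturated left/right track speeds."""
    half = 0.5 * omega_cmd * track_width
    clamp = lambda s: max(-v_max, min(v_max, s))
    return clamp(v_cmd - half), clamp(v_cmd + half)


def simulate(target_heading, steps=400, dt=0.05, v_cmd=0.5,
             track_width=2.0, v_max=1.0):
    """Drive a no-slip differential-track model toward a target heading (hypothetical parameters)."""
    pid = PID(kp=2.0, ki=0.1, kd=0.2)
    x = y = theta = 0.0
    for _ in range(steps):
        error = wrap_angle(target_heading - theta)
        omega_cmd = pid.step(error, dt)
        v_left, v_right = track_speeds(v_cmd, omega_cmd, track_width, v_max)
        # No-slip kinematics: body speed and yaw rate follow from the track speeds.
        v = 0.5 * (v_left + v_right)
        omega = (v_right - v_left) / track_width
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta = wrap_angle(theta + omega * dt)
    return x, y, theta


x_end, y_end, heading = simulate(target_heading=0.5)
print(f"x={x_end:.2f} m, y={y_end:.2f} m, heading={heading:.3f} rad")
```

The saturation in `track_speeds` models the limited track speed. When a large turn is commanded, the faster track saturates and the achievable yaw rate drops, which is one reason purely model-based tuning can degrade in practice.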
