An Ultra Fast Semantic Segmentation Model for AMR’s Path Planning

Abstract: Computer vision plays a significant role in mobile robot navigation because of the abundance of information that can be extracted from digital images. On the basis of captured images, mobile robots determine their location and proceed to the desired destination. Because real environments are complex, obstacle avoidance still requires an elaborate sensor system with high computational demands. This research provides a real-time solution to the problem of extracting corridor scenes from a single image, using an ultra-fast semantic segmentation model that reduces the number of training parameters and the cost of computation. The model achieves a mean Intersection over Union (mIoU) of 89% and an accuracy of 95%. To demonstrate the viability of the proposed method, the simulation results are compared with those of contemporary techniques. Finally, the authors use the segmented image to construct the frontal view of the mobile robot in order to determine the free areas available for path planning.


INTRODUCTION
Mobile robots navigate their surroundings safely by identifying obstacles and moving objects in real time [1]. The sensor system that supports the obstacle-detection and navigation algorithm consists of a laser scanner, sensors, and a camera [2]. The use of lidar or digital cameras to provide information about a changing environment has become widespread in recent years [3,4]. Despite the high cost and the large number of computational steps involved [5,6], the computations remain feasible. Lidar provides depth information in all directions and thus yields a close approximation of the surrounding world. The camera, in turn, provides inexpensive scene data for object detection [7][8][9][10][11][12][13][14]. With the widespread availability of inexpensive, high-precision single cameras, the earlier disadvantages of camera-based sensing have largely been eliminated. Various forms of real-time image segmentation and automatic navigation based on computer-vision environment recognition have been implemented successfully.
Vision-based navigation has gained popularity as a sensing method for autonomous indoor mobile robots because it provides detailed information about the environment [15][16][17][18] that may not be obtainable from combinations of other types of sensors. Semantic segmentation is a computer vision technique that partitions an image into regions or segments according to their semantic meaning. These partitions help the robot understand its surroundings and thereby improve motion planning. Semantic segmentation via deep learning (DL) is now a crucial task in computer vision, with applications including scene understanding, robotic perception, and image compression [19][20][21][22][23][24][25][26]. As imaging technology advances, semantic segmentation will define semantic classes such as buildings, transportation infrastructure, trees, and low vegetation ever more precisely [10][11][12][13][14]. Minae et al. [27] examined the interrelationships, advantages, and difficulties of these DL-based segmentation models. By contrasting datasets, Li et al. [28] presented the essential semantic segmentation methods for the fundamental datasets used in various structures. To remedy the lack of standard datasets for evaluating object segmentation, the 2D semantic labeling competition is recommended in [29].
In [30], Fusic et al. presented a DL algorithm for scene terrain categorization based on visual sensors. Even though the method achieves a high processing speed, its precision remains quite low when environmental factors change. Shelhamer et al. [31] converted modern classifier networks (AlexNet, VGG, and GoogLeNet) into fully convolutional networks (FCNs) to improve segmentation performance and accuracy. Wang et al. used VGG-FCN models such as [30] or U-Net [32] to address the image segmentation problem; these models yield accurate results but are unsuitable for infrastructure deployment because of their high computational cost. Rusli et al. [33] navigated a mobile robot using the Canny edge algorithm and the Hough line transform; in that work, boundary markers were used to detect road markings and obstructions.
Multiple binary masks produced by segmentation can be combined to partition the input image into distinct classes. In addition, multi-class semantic segmentation has achieved remarkable results for mobile robot path planning in environments with numerous obstacles and complex topologies [30][31][32][33]. By analyzing image datasets, the classification algorithm proposed in [30] distinguishes between terrain and obstacles. However, mobile robot navigation based on scene terrain classification had not been demonstrated. In general, semantic segmentation and path planning are performed on high-power computing devices, which limits their application to large robots or to pure computer simulation. In this paper, to conserve memory and guarantee processing speed, we segment available and unavailable regions using binary semantics, which makes the architecture of the segmentation model more streamlined and faster, and therefore suitable for small computer boards, with quality assured. A smaller computing unit also helps to reduce the overall dimensions of the robot, or to leave more payload capacity for other tasks assigned to the robot.
Journal of Robotics and Control (JRC), ISSN: 2715-5072, p. 425. Hoai-Linh Tran, An Ultra Fast Semantic Segmentation Model for AMR's Path Planning.
Therefore, the authors devise a segmentation model that uses MobileNetV2 as the encoder and an FCN as the decoder to solve the semantic segmentation problem [34,35] and achieve efficient scene comprehension for autonomous driving. This combination allows the required tasks to be performed on embedded computers alone while still achieving real-time performance. For the encoder, the authors employ a pretrained model from precursor networks. In the decoder block, the authors then generate fractional predictions using multiscale fusion. Authentic images of an indoor environment are captured with a camera. When evaluated on published image data against published segmentation methods, the proposed semantic segmentation model obtains an overall accuracy of 95%. The model also maintains its efficacy on the new image dataset from Ducktown. The authors then apply perspective correction to the segmented image to construct a frontal view of the general area, which reveals the areas available for real-time movement. On the basis of the segmentation model's output, the authors can determine the areas that serve as input to the autonomous mobile robot navigation system.

A. Multi-class Semantic Segmentation Based on MobileNetV2 Network
The preceding architectural design has inspired numerous variations. With models such as LeNet and AlexNet [36], convolutional neural networks (CNNs) have been shown to produce state-of-the-art results for image classification problems. Successive enhancements, such as VGGNet, GoogLeNet, and ResNet [36,37], have increased efficiency and efficacy. Eventually, convolutional neural networks were developed into fully convolutional networks. The authors build a network based on the fully convolutional network (FCN) [38][39][40][41][42][43] to accomplish instantaneous pixel-by-pixel labeling while maintaining reliable segmentation results. VGG is preferred to AlexNet because the latter, although better known, makes less accurate predictions. Semantic segmentation is a computer vision and image processing task whose objective is to assign each pixel in an image to a class corresponding to distinct semantic content. The authors implement a segmentation network for instant pixel-specific labeling that is primarily based on the concept of the FCN model. The FCN network model for semantic segmentation is constructed as depicted in Fig. 1.
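To make the idea of pixel-by-pixel labeling concrete, the following minimal sketch assigns each pixel the class with the highest score. The score layout (one 2-D grid of scores per class) and the function name `label_pixels` are illustrative assumptions and are not taken from the authors' implementation.

```python
from typing import List

ScoreMap = List[List[float]]  # one H x W grid of scores for a single class


def label_pixels(class_scores: List[ScoreMap]) -> List[List[int]]:
    """Return an H x W label map: each pixel gets the index of its best-scoring class."""
    height = len(class_scores[0])
    width = len(class_scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the score of every class at this pixel and keep the arg-max.
            pixel_scores = [scores[y][x] for scores in class_scores]
            row.append(max(range(len(pixel_scores)), key=pixel_scores.__getitem__))
        labels.append(row)
    return labels


# Two classes (0 = obstacle, 1 = free floor) on a tiny 2 x 3 image.
obstacle = [[0.9, 0.2, 0.4], [0.1, 0.3, 0.8]]
floor = [[0.1, 0.8, 0.6], [0.9, 0.7, 0.2]]
label_map = label_pixels([obstacle, floor])
print(label_map)  # [[0, 1, 1], [1, 1, 0]]
```

In a real FCN the per-class score maps would come from the final convolutional layer; only the arg-max step is illustrated here.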

B. MobileNetV2
As shown in Fig. 2, MobileNetV2 first expands the input channels using a 1 × 1 pointwise convolution. It then applies a depthwise convolution for feature extraction, followed by a linear 1 × 1 convolution that combines the output features while shrinking the size of the network. After this size reduction, ReLU6 is replaced with a linear function so that the output channel size matches the input.
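The parameter savings of this design can be checked with simple arithmetic. The sketch below counts the convolution weights of the three stages of one block (expansion, depthwise, linear projection) and compares a depthwise separable convolution with a dense convolution. The expansion factor of 6 and the 3 × 3 kernel are assumptions taken from common MobileNetV2 defaults, not values stated in this paper; biases and normalization parameters are ignored, and all function names are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a dense k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


def inverted_residual_params(c_in: int, c_out: int, expansion: int = 6, k: int = 3) -> dict:
    """Per-stage weights of one block: expand (1x1), depthwise (k x k), project (1x1)."""
    hidden = c_in * expansion  # assumed expansion factor, not given in the text
    return {
        "expand_1x1": c_in * hidden,
        "depthwise_kxk": k * k * hidden,
        "project_1x1": hidden * c_out,
    }


block = inverted_residual_params(32, 32)
print(block, sum(block.values()))
ratio = standard_conv_params(32, 32) / depthwise_separable_params(32, 32)
print(f"dense / depthwise-separable weight ratio: {ratio:.2f}")
```

For 32 input and 32 output channels, the dense 3 × 3 convolution needs 9216 weights against 1312 for the depthwise separable version, roughly a sevenfold reduction.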
As shown in Fig. 3, in addition to depthwise separable convolutions, MobileNetV2 uses linear bottlenecks and inverted residual blocks (shortcut links between bottlenecks) [34]. In a conventional residual architecture, the input and output of a block typically have more channels than the intermediate layers; the residual block of MobileNetV2 inverts this design. To reduce the number of model parameters, the authors employ depthwise separable convolutions and an inverted residual block between the layers. This allows the MobileNet model to be simplified while maintaining its functionality. Finally, the network parameters are optimized using the Adam optimizer [47][48][49][50][51] with a learning rate of 0.001 for 100 epochs in order to minimize the balanced cross-entropy [52][53][54] described by Equation (1). Gaussian blur [55,56] and Gaussian noise [57,58] are applied to the data set as preprocessing before the raw images are fed into the proposed segmentation model (see Fig. 4). Using these approaches to generate more general datasets improves the quality of the segmentation model, but at the expense of image quality.
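As an illustration of the loss, the sketch below implements one common form of balanced cross-entropy for binary masks, in which the positive class is weighted by the fraction of negative pixels and vice versa. This particular weighting, the clipping constant `eps`, and the function name are assumptions; the authors' Equation (1) may differ in detail.

```python
import math
from typing import Sequence


def balanced_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float],
                           eps: float = 1e-7) -> float:
    """Class-balanced binary cross-entropy averaged over all pixels.

    beta is the fraction of negative pixels, so the rarer class is weighted more.
    """
    n = len(y_true)
    beta = 1.0 - sum(y_true) / n
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += beta * y * math.log(p) + (1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return -total / n


# One free-space pixel among four; an uninformative prediction of 0.5 everywhere.
loss_uniform = balanced_cross_entropy([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
loss_good = balanced_cross_entropy([1, 0, 0, 0], [0.9, 0.1, 0.1, 0.1])
print(loss_uniform, loss_good)
```

A lower value indicates a better match between prediction and ground truth, which is why the optimizer is run to reduce this quantity.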

B. Mobile Robot
The authors assess the efficacy of the transformation and verify the important role of semantic segmentation in building the frontal perspective of the floor. From this view, the path planning for the mobile robot can then be created. The experimental results support the effectiveness of the collision-free zone detection method. Once the forward perspective is standardized, an all-encompassing plan for navigation and obstacle avoidance can be developed. The experiments depicted in Fig. 6 were conducted to evaluate the proposed semantic segmentation for use in four-wheel mobile robot navigation.
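A perspective correction of this kind is commonly expressed as a planar homography estimated from four point correspondences. The following sketch, written in plain Python, solves for the 3 × 3 matrix and maps image points to the corrected view. The corner coordinates are made-up values for illustration, and the approach is a generic one, not necessarily the transformation used by the authors.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3 x 3 matrix (last entry fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h: List[List[float]], p: Point) -> Point:
    """Apply the homography to one point, including the projective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical floor trapezoid in a 640 x 480 image, mapped to a 400 x 600 rectangle.
src = [(200, 300), (440, 300), (640, 480), (0, 480)]
dst = [(0, 0), (400, 0), (400, 600), (0, 600)]
H = homography(src, dst)
print(warp_point(H, (320, 480)))  # bottom-centre of the image in the corrected view
```

Applying `warp_point` to every free-space pixel of the segmented image, or the inverse mapping to every output pixel, would yield the corrected view of the floor.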

A. Practical Face Recognition
The dataset used for training the semantic segmentation model consists of 1200 images from the library of the Ducktown data set simulation software [59,60]. The simulation contains a full range of objects, such as houses, cars, people, and roads. Some images representing the environment around the moving robot are shown in Fig. 7. Once the semantic segmentation model has been trained successfully, the robot vision system segments each acquired image into two distinct regions, the area that allows movement (yellow) and the obstacle area (purple), as shown in Fig. 9.
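The quality of such a two-region output is summarized in this paper by pixel accuracy and mean Intersection over Union (mIoU). The sketch below computes both metrics for small label maps using their standard definitions; the example masks are invented for illustration and the function names are hypothetical.

```python
from typing import List

LabelMap = List[List[int]]


def pixel_accuracy(pred: LabelMap, truth: LabelMap) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += (p == t)
    return correct / total


def mean_iou(pred: LabelMap, truth: LabelMap, num_classes: int = 2) -> float:
    """Average of per-class intersection over union; absent classes are skipped."""
    ious = []
    for cls in range(num_classes):
        inter = union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += (p == cls and t == cls)
                union += (p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# 1 = free area, 0 = obstacle; one obstacle pixel is wrongly predicted as free.
truth = [[1, 1, 0], [1, 0, 0]]
pred = [[1, 1, 1], [1, 0, 0]]
acc = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth)
print(f"accuracy={acc:.3f}, mIoU={miou:.3f}")
```

mIoU penalizes errors on small regions more strongly than pixel accuracy does, which is consistent with the reported mIoU being lower than the reported accuracy.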
By scaling from the size of the environment images to the actual size of the robot's environment, the authors can apply the control model. A PID controller integrated with tracking control then follows the navigation plan generated from the free area identified by the semantic segmentation model.
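One simple way to turn a corrected free-space mask into metric waypoints is sketched below. It assumes a uniform scale `metres_per_pixel` obtained from calibration and takes the mean free column of each row as the lateral target; both choices are illustrative assumptions and do not describe the authors' planner.

```python
from typing import List, Tuple


def free_space_centreline(mask: List[List[int]],
                          metres_per_pixel: float) -> List[Tuple[float, float]]:
    """Return (forward, lateral) waypoints in metres, from the nearest row outwards.

    mask[row][col] is 1 for free floor and 0 for obstacle; the last row is the
    one closest to the robot, and lateral = 0 is the image centre column.
    """
    height, width = len(mask), len(mask[0])
    waypoints = []
    for row_idx in range(height - 1, -1, -1):
        free_cols = [c for c, value in enumerate(mask[row_idx]) if value == 1]
        if not free_cols:
            break  # no free cell in this row: the path is blocked ahead
        centre_col = sum(free_cols) / len(free_cols)
        forward = (height - 1 - row_idx) * metres_per_pixel
        lateral = (centre_col - (width - 1) / 2) * metres_per_pixel
        waypoints.append((forward, lateral))
    return waypoints


mask = [
    [0, 0, 0, 0, 0],  # far row: fully blocked
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],  # near row
]
path = free_space_centreline(mask, metres_per_pixel=0.1)
print(path)
```

The resulting waypoints can then serve as the reference path for the tracking controller.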
In the steering control scheme, two wheels of the mobile robot are driven simultaneously, while the remaining two wheels balance the robot, as shown in Fig. 10. Here, $v_R(t)$ is the linear velocity of the right wheel; $v_L(t)$ is the linear velocity of the left wheel; $\omega_R(t)$ is the angular velocity of the right wheel; $\omega_L(t)$ is the angular velocity of the left wheel; $r$ is the radius of each wheel; $D$ is the distance between the two driven wheels; $R$ is the instantaneous radius of curvature of the robot trajectory, relative to the mobile robot's center axis; $ICC$ is the instantaneous center of the orbital curve; $R - D/2$ is the radius of curvature of the orbit described by the left wheel; and $R + D/2$ is the radius of curvature of the orbit described by the right wheel.
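These quantities are linked by the standard differential-drive relations: each wheel's linear velocity is its angular velocity times the wheel radius, the forward velocity is the mean of the two wheel velocities, and the yaw rate is their difference divided by the wheel separation. The sketch below assumes this standard model rather than reproducing the authors' equations; the function name, parameter names, and numerical values are hypothetical.

```python
import math
from typing import Tuple


def body_motion(omega_right: float, omega_left: float,
                wheel_radius: float, wheel_separation: float) -> Tuple[float, float, float]:
    """Return (forward velocity, yaw rate, turning radius) of a differential-drive robot.

    Angular velocities are in rad/s and lengths in metres. The turning radius is
    measured from the centre of the wheel axis and is infinite when driving straight.
    """
    v_right = wheel_radius * omega_right
    v_left = wheel_radius * omega_left
    forward = (v_right + v_left) / 2.0
    yaw_rate = (v_right - v_left) / wheel_separation
    radius = math.inf if yaw_rate == 0 else forward / yaw_rate
    return forward, yaw_rate, radius


v, w, big_r = body_motion(omega_right=12.0, omega_left=8.0,
                          wheel_radius=0.05, wheel_separation=0.2)
print(f"v={v:.3f} m/s, yaw rate={w:.3f} rad/s, turning radius={big_r:.3f} m")
```

With these values the left wheel travels on a circle of radius 0.5 − 0.1 = 0.4 m and the right wheel on one of radius 0.6 m, matching the inner and outer radii described in the text.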
Fig. 11 shows the PID closed-loop control structure for the self-propelled robot. When a reference input is specified, the PID controller produces a control signal that ensures the robot follows the path in the navigation strategy. The kinematic model then calculates the velocity values for the wheels. In the next stage, the deflection angle is fed back and compared with the set value, and the PID controller adjusts the deviation so that it approaches zero as the time t goes to infinity.
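The closed loop can be illustrated with a discrete PID controller acting on the heading error. In the sketch below the robot is reduced to a pure integrator (the heading changes at the commanded turn rate), and the gains, time step, and initial error are arbitrary illustrative values; none of these are taken from the paper.

```python
from typing import List, Optional


class PID:
    """Textbook discrete PID controller with a rectangular-rule integral."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative action on the first call, to avoid an initial kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_heading(controller: PID, initial_error: float,
                     steps: int = 1000, dt: float = 0.02) -> List[float]:
    """Simplified plant: the heading error shrinks at the commanded turn rate."""
    error = initial_error
    history = [error]
    for _ in range(steps):
        turn_rate = controller.update(error, dt)
        error -= turn_rate * dt
        history.append(error)
    return history


history = simulate_heading(PID(kp=2.0, ki=0.5, kd=0.05), initial_error=0.5)
print(f"final heading error: {history[-1]:.5f} rad")
```

In a full system the commanded turn rate would be converted to left and right wheel speeds by the kinematic model before being sent to the motors.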

B. Control Management Procedure
Fig. 12 compares the PID controller with the P controller in terms of the motion of the vehicle. The self-propelled vehicle starts from the origin of the xOy frame and follows the moving object. In Fig. 12, the change in deflection angle under the PID controller is considerably smaller than under the P controller. With a deflection of less than 0.12 rad, the mobile robot tracks the path completely and smoothly.

Fig. 4. Pre-processing steps (a) to (c) applied to real images

Fig. 11. PID control diagram for mobile robot tracking the path

Fig. 12. Comparison between the PID controller and the P controller in terms of the steering angle of the mobile robot

V. CONCLUSIONS
This paper proposes an ultra-fast semantic segmentation model that reduces the number of training parameters and the cost of computation. The model achieves a mean Intersection over Union (mIoU) of 89% and an accuracy of 95%. Based on the segmentation results, the path planning of the mobile robot is constructed successfully. The proposed model outperforms state-of-the-art methods, which require larger datasets for training, while using fewer resources for training.