Online Digital Image Stabilization for an Unmanned Aerial Vehicle (UAV)

The Unmanned Aerial Vehicle (UAV) video system uses a portable camera mounted on the robot to monitor scene activities. In general, UAVs carry very little stabilization equipment, so obtaining good, stable images from a UAV in real time is still a challenge. This paper presents a novel framework for digital image stabilization in online, real-time UAV applications. It aims to solve the problem of unwanted vibration and motion when recording video with a UAV. The proposed method uses dense optical flow to select features representing the displacement between two consecutive frames. K-means clustering is used to find the cluster of the motion vector field with the most members, and the centroid of this largest cluster is chosen to estimate a rigid-transform motion model that handles rotation and translation. The trajectory is then compensated using a Kalman filter. The experimental results show that the proposed method is suitable for online video stabilization and achieves an average computation-time performance of 47.5 frames per second (fps).

Keywords—K-means, Kalman filter, image stabilization, optical flow, UAV.


INTRODUCTION
Recently, mobile robots have been increasingly needed in everyday human life, for example in humanitarian assistance [1], the service industry [2], environmental monitoring [3], and security [4]. Unmanned Aerial Vehicles (UAVs) are often used for surveillance [5], target tracking [6], navigation [7], and localization [8] due to their flexibility to move in any area.
The aerial image sequence is obtained from a portable camera mounted on a UAV to monitor activity [5]. Since, in general, UAVs carry very little stabilization equipment, they can be seen as unstable platforms subject to intense, unexpected vibrations. As such, images taken from a UAV are of lower quality than those captured by a stationary camera, and target or landmark recognition often fails due to loss of focus and blurriness caused by vibration. The challenge of this work is therefore to find an effective solution for stabilizing the image sequence in online applications.
Several approaches have been proposed to solve the problem of video image stabilization. There are two types of image stabilization techniques for UAVs: physical stabilizers and digital stabilizers. Physical stabilizers, which include Optical Image Stabilization (OIS) [9], [10] and Electronic Image Stabilization (EIS) [11], [12], are equipped with several sensors (e.g., gyro sensors, Hall sensors, and CCD image sensors) and actuators (e.g., stepping motors, piezoactuators, and servo motors), so they cost more than digital stabilizers and also add to the weight of the UAV. The OIS method manipulates lens movement using motion sensors [13] and an active optical system [14], which causes several typical magnetic-circuit phenomena such as saturation, hysteresis, and lack of linearity; these phenomena make the system difficult to control. The EIS method uses motion sensors and mechanical devices to compensate for camera movement so that image blurriness and vibration are reduced. However, the stabilization rate becomes poor at high vibration frequencies [15] and needs to be improved by digital image stabilization (DIS).
A DIS method generally consists of camera motion estimation and compensation, offering feedforward control and system compactness [16], [17]. Conventional DIS techniques usually use feature points [18], such as sparse optical flow [19], [20], and select useful point features with Shi-Tomasi corner detection [21]. Such methods select multiple features in the previous frame, track the corresponding features in the current frame, and then calculate the optical flow in the region of interest (ROI). Motion estimates can be computed with a Kalman filter [22], but this involves only a few points that may not represent the global movement between two consecutive images. Another challenge in DIS is online computation, subject to a constraint on the number of frames per second (fps). In online applications, accumulative global motion (AGM) [23], [24] tends to fail for long-running video due to accumulated motion-estimation error.
Several DIS approaches have been proposed to stabilize image sequences taken by UAVs [20], [25], but these process the entire series of images before stabilizing each one. As such, they cannot be used for online applications that require direct processing. The methods in [18], [22], [26], and [27] perform image stabilization online and in real time, but not for a camera on a UAV, so they are not sufficient for handling images from a UAV, which exhibits more unexpected motion and vibration.
To solve the problem of online and real-time image stabilization on a UAV, a new framework is proposed in this paper. The digital system uses dense optical flow to determine the motion vector field between two consecutive frames, then uses K-means clustering to classify the motion vectors for motion estimation. A Kalman filter compensates the trajectory.
The remainder of the paper is organized as follows. Section II introduces the overview system of the proposed method. Section III presents the results and discussion. Finally, conclusions are drawn in Section IV.

A. System Overview
The framework of the proposed aerial image stabilization (AIS) is shown in Fig. 1. In the first step, the proposed method extracts the motion vector field from two consecutive frames, which is used as the feature set for estimating the motion-model parameters. The proposed AIS adopts the K-means algorithm [28] as unsupervised learning to find the global motion of the frame based on the centroid of the cluster with the largest membership. The optimal number of clusters is found using the gap statistic to avoid outliers in the clustering [29]. The motion estimate is then determined from the centroid of the selected cluster. Fig. 2(a) illustrates the distribution of the motion vector field in the image. The red color represents a large region that does not contain any objects and has a zero motion vector. The blue color represents dynamic objects, i.e., pixels that have a higher displacement than other regions. The green color represents static objects, i.e., pixels that have a lower displacement than the blue region. In this case, the green color represents camera movement, because the displacement of static objects is caused by the movement of the camera. Fig. 2(b) illustrates the clustering of motion vectors with values greater than zero. The cluster with the largest membership (green pixels) is selected, and its centroid is used to estimate the image motion. Fig. 2(c) shows the smoothed trajectory obtained from the motion compensation of each frame.
The proposed motion compensation also aims to avoid large gaps in motion estimation caused by unwanted UAV movements. If the estimated motion of the previous frame and the current frame are too different, the image transformation will shift too far from the original image plane.

B. Motion Vectors Distribution
Farneback optical flow [30] is used to estimate the dense motion vector field in each frame. This optical flow method is based on polynomial expansion, which approximates the local neighborhood of each grid point in the image with a quadratic polynomial. The transformation between two consecutive frames is derived from the expansion coefficients, and the displacement of each point is estimated as a two-dimensional vector.
For every two consecutive frames, the previous and current frames are defined as f(t-1) and f(t), respectively, at time t. Each image is then reduced to 50% of its original size and converted to greyscale. Let \tilde{f}(t-1) and \tilde{f}(t) denote the preprocessed versions of f(t-1) and f(t), respectively.
Given the pixel position J and its neighbourhood M(J), the signal in the local coordinate system \mathbf{x} at \tilde{f}(t-1) can be approximated by the quadratic polynomial expansion

\tilde{f}_{t-1}(\mathbf{x}) \approx \mathbf{x}^T A_{t-1} \mathbf{x} + \mathbf{b}_{t-1}^T \mathbf{x} + c_{t-1},   (1)

where A_{t-1} is a symmetric matrix, \mathbf{b}_{t-1} is a vector, and c_{t-1} is a scalar. Assuming the image brightness is constant, a new signal translated by a global displacement \mathbf{d} is constructed at \tilde{f}(t) by

\tilde{f}_t(\mathbf{x}) = \tilde{f}_{t-1}(\mathbf{x} - \mathbf{d}) \approx \mathbf{x}^T A_{t-1} \mathbf{x} + (\mathbf{b}_{t-1} - 2A_{t-1}\mathbf{d})^T \mathbf{x} + \mathbf{d}^T A_{t-1} \mathbf{d} - \mathbf{b}_{t-1}^T \mathbf{d} + c_{t-1}.   (2)

Fitting \tilde{f}(t) with its own expansion,

\tilde{f}_t(\mathbf{x}) \approx \mathbf{x}^T A_t \mathbf{x} + \mathbf{b}_t^T \mathbf{x} + c_t,   (3)

and equating the coefficients in (2) and (3) gives \mathbf{b}_t = \mathbf{b}_{t-1} - 2A_{t-1}\mathbf{d}, so the displacement vector can be calculated by

\mathbf{d} = -\frac{1}{2} A_{t-1}^{-1} (\mathbf{b}_t - \mathbf{b}_{t-1}).   (4)

The initial estimate of the displacement vector is set to (0, 0), so the ROI of the two frames starts from the same position.

C. Motion Vectors Clustering
In the next step, the motion vector matrix is calculated for every two consecutive frames of the image sequence.
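Assuming the flow field is stored as an H×W×2 array (one (dx, dy) vector per pixel), building the displacement matrix that feeds the clustering step might be sketched as follows; the magnitude threshold `eps` is an assumed parameter for discarding the zero-motion region.

```python
import numpy as np


def displacement_matrix(flow, eps=1e-3):
    """Flatten an H x W x 2 flow field into an n x 2 matrix of motion
    vectors, keeping only vectors with magnitude above eps (non-zero motion)."""
    vecs = flow.reshape(-1, 2)
    mag = np.linalg.norm(vecs, axis=1)
    return vecs[mag > eps]
```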
The initial number of clusters is estimated by N = sqrt(n/2), where n is the total number of displacement vectors in the matrix in (5). With this initialization, the K-means algorithm can terminate with several empty clusters. Therefore, cluster validity based on statistical hypothesis testing (the gap statistic) is used to obtain the optimal number of clusters N.
The gap statistic compares the within-cluster dispersion of the dataset with the expected dispersion of points drawn from a null-model reference distribution at each value of N. For cluster r (C_r) with n_r points, the sum of the pairwise distances between all points i and i' in the cluster is

D_r = \sum_{i, i' \in C_r} \| x_i - x_{i'} \|^2,   (6)

where x_i denotes the data points of C_r. The within-cluster dispersion, the pooled sum of squares over all clusters, can then be computed by

W_N = \sum_{r=1}^{N} \frac{1}{2 n_r} D_r.   (7)

The gap statistic is the difference between log(W_N) and its expectation,

Gap_n(N) = E_n[\log(W_N)] - \log(W_N),   (8)

where E_n is the expectation over reference datasets of size n generated uniformly (maximum entropy) over the same bounding box as the original dataset. The optimal number of clusters is selected as the N with the highest associated gap,

N^* = \arg\max_N Gap_n(N).   (9)

After obtaining the optimal number of clusters in (9), the cluster members are found using the K-means algorithm, and the centroid of the cluster with the largest membership is selected, as shown in Algorithm 1.
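A minimal NumPy sketch of (6)-(9) and of the largest-cluster selection is given below. The plain K-means loop, the number of reference datasets, and the random seeds are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain K-means; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for r in range(k):
            if np.any(labels == r):          # skip empty clusters
                C[r] = X[labels == r].mean(axis=0)
    return labels, C


def within_dispersion(X, labels, k):
    """W_N in (7): sum over clusters of pairwise distances D_r / (2 n_r)."""
    W = 0.0
    for r in range(k):
        P = X[labels == r]
        if len(P) > 1:
            D = ((P[:, None] - P[None]) ** 2).sum(-1).sum()  # D_r in (6)
            W += D / (2 * len(P))
    return W


def gap_statistic(X, k, n_ref=10, seed=0):
    """Gap(N) in (8): expected log-dispersion under a uniform reference
    distribution minus the observed log-dispersion."""
    rng = np.random.default_rng(seed)
    labels, _ = kmeans(X, k)
    logW = np.log(within_dispersion(X, labels, k))
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = []
    for _ in range(n_ref):
        R = rng.uniform(lo, hi, size=X.shape)
        rl, _ = kmeans(R, k, seed=int(rng.integers(1 << 30)))
        ref.append(np.log(within_dispersion(R, rl, k)))
    return np.mean(ref) - logW


def dominant_centroid(vectors, k):
    """Centroid of the cluster with the most members (the global motion)."""
    labels, C = kmeans(vectors, k)
    largest = np.bincount(labels, minlength=k).argmax()
    return C[largest]
```

In the stabilization pipeline, `dominant_centroid` is applied to the non-zero motion vectors of each frame pair, with k chosen by scanning `gap_statistic` over candidate cluster counts.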

D. Motion Compensation
Unwanted UAV movement can cause motion estimation to fail because the scenes in \tilde{f}(t-1) and \tilde{f}(t) are very different, producing a massive displacement in the trajectory. Therefore, to avoid a rough image transformation, a Kalman filter [5] is used to smooth the trajectory and generate a new transformation for each frame.
The Kalman filter consists of two steps: prediction and update. The prediction step estimates the trajectory at time t from the previous state, and the update step corrects this estimate with the measured trajectory, yielding the smoothed trajectory in (16). The trajectory differential variables and the scale factor are then computed from the difference between the smoothed and the accumulated trajectories. Finally, a new image \hat{f}(t) is obtained by transforming f(t) using the new trajectory in (16).
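The paper's full state model in (10)-(16) is not reproduced here; as an illustrative sketch, a scalar Kalman filter with a constant-position model can smooth each trajectory component (x, y, angle) online, one measurement per frame. The process and measurement noise values `q` and `r` are assumed tuning parameters.

```python
import numpy as np


def kalman_smooth(z, q=4e-3, r=0.25):
    """Online scalar Kalman filtering of one accumulated-trajectory
    component. q: process noise variance, r: measurement noise variance
    (both assumed). Returns the smoothed trajectory."""
    x_est, p = 0.0, 1.0
    out = np.empty(len(z), dtype=float)
    for t, zt in enumerate(z):
        # prediction step (constant-position model: state carries over)
        p = p + q
        # update step: blend prediction with the measured trajectory
        k = p / (p + r)
        x_est = x_est + k * (zt - x_est)
        p = (1 - k) * p
        out[t] = x_est
    return out
```

Per frame, the correction transform is then built from the difference between the smoothed and raw trajectory components and applied to f(t), which keeps the filter strictly causal and therefore usable online.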

III. RESULTS AND DISCUSSION
The performance of the proposed method was evaluated using three types of aerial video sequences obtained from the UCF aerial action dataset (https://www.crcv.ucf.edu) and an online video from the UAV. The "Park" video has fewer objects than the others, and its displacement is smaller than that of the "Street" video. The "Street" video has the largest displacement and larger objects than the other videos. The "Car" video has the smallest displacement but contains the most vibration. For online shooting, we use the UAV shown in Fig. 3 and a wireless camera with a resolution of 640×480. The UAV sends images to the ground station (PC), where the system runs the stabilization algorithm.
Experiments were carried out using a 2.30 GHz CPU with 8 GB RAM. The experimental results on computation-time performance are summarized in Table I. The proposed method achieves an average computation time of 47.5 fps. Table II summarizes the comparison of average computation time with the existing methods in [18], [22], [26], and [27]. These methods were chosen because they perform frame-by-frame digital image stabilization for real-time applications and pay attention to the processing time.

IV. CONCLUSIONS
A new method for stabilizing the sequence of images captured by a UAV has been presented in this paper. The main contribution of the proposed method is to perform online image stabilization on a UAV with fast computation time and to find an accurate estimate of image motion based on the most frequent motion vector value between two consecutive images. This precise motion estimation can handle the rough image transformations caused by UAV motion and vibration. Comparing the results on various image sequences, the proposed method is a potential solution for online image stabilization using a UAV. We believe that a vision-based system on a UAV can achieve better results with online image stabilization due to the improved image quality.