Multi Cost Function Fuzzy Stereo Matching Algorithm for Object Detection and Robot Motion Control

Abstract: Stereo matching algorithms work with multiple images of a scene, taken from two viewpoints, to generate depth information. Authors usually use a single matching function to measure similarity between corresponding regions in the images. In the present research, the authors consider a combination of multiple data costs for disparity generation. Disparity maps generated from stereo images tend to have noisy sections. The presented work concerns a methodology to refine such disparity maps so that they can be further processed to detect obstacle regions. A novel entropy-based selective refinement (ESR) technique is proposed to refine the initial disparity map. Information from both the left and right disparity maps is used for this refinement. For every disparity map, block-wise entropy is calculated. The average entropy values at corresponding positions in the disparity maps are compared. If the difference between these entropy values exceeds a threshold, the corresponding disparity value is replaced with the mean disparity of the block with lower entropy. The results of this refinement were compared with similar methods and observed to be better. Furthermore, the v-disparity values are used to highlight the road surface in the disparity map. The regions belonging to the sky are removed through HSV-based segmentation. The remaining regions, which are the regions of interest (ROIs), are refined through a u-disparity area-based technique. Based on this, the closest obstacles are detected through k-means segmentation. The segmented regions are further refined through a u-disparity image information-based technique and used as masks to highlight obstacle regions in the disparity maps. This information is used in conjunction with a Kalman filter based path planning algorithm to guide a mobile robot from a source location to a destination location while avoiding any obstacle detected in its path.
A stereo camera setup was built and the performance of the algorithm on local real-life images, captured through the cameras, was observed. The proposed methodologies were evaluated using real-life outdoor images from the KITTI dataset and images with radiometric variations from the Middlebury stereo dataset.


I. INTRODUCTION
Computer vision applications make extensive use of stereo matching in situations that require 3-D reconstruction, view synthesis and autonomous navigation. To obtain this 3-D information, the similarities between the image pairs are considered [1].
The main goal of stereo matching algorithms is to extract 3-D information from a scene by considering the similarities between image pairs acquired from different viewing positions. Based on the density of the results, these algorithms can be split into dense stereo matching and sparse stereo matching. Dense stereo matching algorithms aim to determine the correspondence of every pixel in the images, whereas sparse stereo matching algorithms consider only the features within the scene and not all the pixels. This topic was exhaustively reviewed in [2,3]. Apart from this classification, stereo algorithms can also be categorized, based on the complexity of calculation, into local and global methods. Local stereo matching approaches use small patches of pixels, or windows, of the reference image. The disparity of a pixel is determined by comparing the information in the window region with patches in the target image. Global approaches obtain results in a more complicated manner, in which all the disparities are calculated concurrently by applying energy minimization methods. Generally, local stereo algorithms are faster than their global counterparts and require less memory. Unfortunately, they usually produce results of lower accuracy than their global counterparts. Numerous local algorithms based on adaptive weights [4,5] can achieve results similar to those obtained using global stereo matching methods. One of the major drawbacks of these local algorithms is their high computational complexity, which grows quadratically with the size of the window used to aggregate the matching costs.
Journal of Robotics and Control (JRC) ISSN: 2715-5072 357 Akhil Appu Shetty, Multi Cost Function Fuzzy Stereo Matching Algorithm for Object Detection and Robot Motion Control
Obtaining dense results (disparity maps) from stereo matching algorithms usually involves the following four steps: (1) matching cost computation, (2) cost aggregation, (3) disparity computation/optimization, and (4) disparity refinement. As previously mentioned, these algorithms can be categorized into two broad clusters, namely local stereo matching algorithms and global stereo matching algorithms [6]. In local algorithms, the matching costs are aggregated over support windows chosen on the basis of factors such as colour values or pixel intensities. In selecting these support windows, it is implicitly assumed that the values within the region are smooth. The best disparity is chosen through a winner-takes-all (WTA) approach. Unlike the local approaches, the global techniques make the smoothness assumption explicit. These algorithms use complex optimization functions that decide the disparity of the present pixel based on the calculated disparities of other pixels [7].
Local algorithms generally compute disparities faster but tend to generate less reliable matches in textureless areas and where there are sudden discontinuities in depth.
Comparatively, global algorithms generate much more accurate results but, due to their computational complexity, are more expensive. The methodology presented in this work uses local algorithms.
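As an illustration of the first three steps of a local method (matching cost, window aggregation, WTA selection), the following is a minimal sketch. It assumes rectified grayscale images and uses the absolute intensity difference with a fixed square window purely for brevity; this is not the segment-based, multi-cost scheme proposed in this work, and the names `box_sum` and `wta_disparity` are hypothetical.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window, zero-padded at the borders."""
    p = np.pad(a, r, mode="constant")
    # Integral image with a leading row and column of zeros.
    c = np.pad(p, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    k = 2 * r + 1
    return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]

def wta_disparity(left, right, max_disp, radius=2):
    """Left-referenced disparity map from a SAD cost volume and WTA selection."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    # Positions that cannot be matched at disparity d keep an infinite cost.
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Step 1: per-pixel matching cost, left pixel x against right pixel x - d.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Step 2: aggregate the cost over a fixed support window.
        cost[d, :, d:] = box_sum(diff, radius)
    # Step 3: winner-takes-all, i.e. the disparity with the lowest aggregated cost.
    return cost.argmin(axis=0)
```

The fixed window is the simplest possible support region; it is exactly the choice that tends to fail at depth discontinuities, which motivates the variable-shaped supports discussed later in the paper.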
The generation of disparity maps from stereo image pairs presents us with viable results. Even though the results seem to be accurate, it is always advisable to apply a post-processing technique to further enhance them. A majority of authors have proposed pixel-based refining techniques to further reduce the errors of their methods. In our case, since we are working with a super pixel-based aggregation technique for disparity calculation, the disparity values within a patch will be similar [8][9][10]. In such a situation, where disparity maps are generated in terms of patches rather than pixel-wise, it would be inappropriate to adopt a pixel-based approach as the majority of refinement techniques do. The operations on the disparity maps need to be performed in terms of patches as well. Rather than taking only a single cost function to evaluate the level of similarity, the authors have used a combination of normalized mutual information (NMI), normalized cross-correlation (NCC), and absolute differences of gradients on the image pairs. Along with this, the disparity selection is made through a fuzzy logic-based approach. The main contribution of this research work is a refinement technique for the fuzzy-generated disparity map, for which an entropy-based refinement approach is utilized. The obtained disparity map is further refined through a v-disparity and u-disparity based technique to detect the objects present in the reference images. These objects are then categorized into three groups based on their distance from the camera. This information is provided to a path planning algorithm and implemented on a robot. The authors adopted a feature-based matching algorithm over deep learning methodologies to avoid the higher processing power that the latter require [11][12][13]. In applications which work on limited resources, this requirement would prove to be quite a limitation.
In the rest of the paper, Section II surveys the literature on similar work and Section III presents the methodology followed and the algorithms developed to carry out this research work. The results and discussions are also presented in the respective sections. The conclusion is stated in Section IV.

II. RELATED WORK
Stereo vision is considered a complex and complete form of perception due to its inherent ability to calculate the distance (depth) of objects in an image, as compared to monocular vision. Multiple experiments on autonomous ground vehicles with this methodology have presented notable results. Most of the algorithms require a few assumptions about the approximate free space on the ground to achieve considerable performance in long-range and complicated road conditions.
The disparity maps generated from stereo images present feasible results with which to advance to the subsequent stages of the system. While the results seem sufficient to proceed, it is always advisable to include a post-processing stage to further enhance them. A majority of authors have proposed pixel-by-pixel refining techniques to further reduce the errors of their methods. In the presented research, however, a super pixel-based [14][15] aggregation technique for disparity calculation is considered. Hence, the disparity values within a patch will be similar. In such a situation, where disparity maps are generated in terms of patches rather than pixel-wise, it would be unfitting to adopt a pixel-based approach as the majority of refinement techniques do; it becomes necessary to perform operations on the disparity maps in terms of patches too. The Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) stereo dataset was used for the development of the refinement technique, as it presents a variety of real-life outdoor situations which are well suited to this application. With the disparity maps refined and enhanced, the next step is the isolation of the obstacles. For this, it becomes mandatory to segment and highlight the regions of interest in the images. This has been accomplished by extracting certain information from the disparity maps themselves. This is an extremely critical and fundamental task in autonomous vehicles, as it is how they perceive the environment.
In a dynamic environment, the detection of obstacles is a major task of computer vision systems. The sensor measurements of the objective world are generated by the environmental measurement model. Due to reflections of light from multiple sources, noise affects the visual sensors. Hence, obstacle detection and isolation in close proximity of the robot is given paramount importance [16]. In [17], a technique for the estimation of depth data from a rectified image pair is presented. A convolutional neural network (CNN) trained on small patches of the input images was able to learn the similarity between them. This result was used for the initialization of the cost function. The datasets from both KITTI and Middlebury were used for evaluation. During training, the ground truth disparity values of the image patches were made available to the network.
An algorithm based on random walk with restart for dense stereo disparity map generation is proposed in [18]. Pixels were grouped into super pixels, which were then used for matching cost aggregation by combining image gradient matching and the census transform. Unlike other stereo algorithms, such as belief propagation, graph cut or semi-global matching, the algorithm estimates the best possible disparity value for each pixel in the stage where the matching cost is updated [19].
A modified version of the Census Transform (CT) is presented in [20], which the authors claim to be comparatively more robust to radiometric variations in real-life outdoor road scenarios. The modification was made to the cost aggregation method, in that it also considers the edge information present in the image pairs. They also observed that their proposed cost function generated better results than the conventional cost functions when applied to the KITTI stereo benchmark [21]. The authors in [22] propose an algorithm that uses cues related to the scene data, which include the identification of arrangements of surface normals and edges and the detection of connected and coplanar sections in images, such as walls. Seed growing algorithms were also considered for determining the scene flow in a stereo setup [23]. The parameters used in the algorithm include a growing threshold, the temporal consistency enforcement parameter and an optical flow parameter. The correlations are based on the MNCC-based statistic with a window size of 5x5 pixels.
In [24], a comparatively simple and fast algorithm was presented which is claimed to increase the number of valid disparity values obtained through local methods while reducing the number of pixels with erroneous values. In this technique, the initial disparity map was generated from a local stereo matching method and then repeatedly improved through depth interpolation and image warping based on the estimated depth. On the KITTI stereo dataset, it was observed that, on average, the depth density rose after a single epoch for a minor increase in computation. This technique also initially uses a small number of correspondence seeds. The proposed algorithm is capable of handling complicated issues, such as repetitive patterns and complex scenes, far better than other algorithms in the same category. Moreover, the proposed algorithm is also claimed to recover from erroneous initial seeds [25]. A novel approach to the stereo matching problem which involves a binary cost estimation and aggregation method was proposed in [26]. The construction of the cost volume is based on bitwise operations on a set of binary strings. The results demonstrate that the algorithm is robust under radiometric variations when compared to previous methods.
An approach for on-road distance estimation and vehicle detection, given disparity maps generated from a stereo matching algorithm, is proposed in [27]. The authors utilized an available stereo matching method to estimate a dense disparity map. The dense disparity map is segmented into two different spaces: the obstacle space and the free space. The KITTI datasets are used for the evaluation of the proposed algorithms.
In [28], vehicle detection in images using a classification system is presented. This system comprises a machine learning classifier and an expert system that preselects relevant sections in an image by making use of the designed features. The authors validated the classification algorithms on real traffic videos and used them for vehicle detection. A real-time in-vehicle detection method is presented in [29]; it builds on the idea that vehicles exist in rectangular sub-areas, identified by combining multiple morphological vehicle features and classifying the detected features. The authors in [30] argue that observing the driving location is an essential and critical task for self-driving and Advanced Driver Assistance System (ADAS) technology. The proposed method is compared to basic methods that use the KITTI stereo datasets. The results revealed that the algorithms are robust to measurement errors in long video sequences. The authors used a stereo-vision based obstacle detection technique for intelligent ground vehicles and estimated the motion of obstacles in consecutive frames.
A methodology for vision-based local navigation inspired by a model of human navigation is proposed in [31], where the heading of the robot and the potential field are derived by calculating its distance from the goal and the angular width of the obstacles. This field controls the steering of the robot and its angular acceleration. This method of directly controlling the steering is preferred for local navigation. A wheeled robot guided by a ceiling-mounted camera using an image-based visual servoing algorithm was described in [32]. Visual servoing uses visual information to control the movement of the robot. Image-based visual servoing and position-based visual servoing are the main classifications of this method. In image-based servoing, the image error is used to control the robot. Simultaneous localization and mapping (SLAM) constructs a map of the environment while simultaneously tracking the location of the robot. This method finds application in navigation, odometry and robotic mapping for augmented environments [33,34]. Prominent methods such as the Extended Kalman Filter (EKF), particle filter, Graph-SLAM, and covariance intersection can be used to achieve this. EKF-SLAM uses feature extraction algorithms to extract features such as lines and corners to build the map of the environment, which can also be used to identify the robot location [35]. The extracted features and the pose of the robot are positioned in a global coordinate system. A secondary map, which contains the locations of the start and destination points of each segment, is built using the extracted features.
The most basic issue in robot motion planning in a dynamic and complex environment is to find, in an optimal manner, a path from the source position to the destination without colliding with obstacles. A comparative compilation of various such approaches to motion calculation in dynamic environments, along with simulations of a few of the models, is presented in [36-39]. As opposed to most of the existing monocular systems, our approach concentrates on a stereo camera setup to overcome the issues of shadow interference, illumination variation and object occlusion. A sparse set of features is selected for every frame and then projected onto the 2D ground plane. A kernel-based approach to clustering is adopted for this research [40][41][42][43][44].
For the robot to reach its destination without collision, a recurrent fuzzy neural network (RFNN) approach is considered. An RFNN combines the learning ability of a neural network with the inference capability of a fuzzy logic system. The side feedback loops in an RFNN help improve the robustness and stability of the system. The weights of the network are trained with the help of an extended Kalman filter [45][46][47]. By enforcing spatial dependencies among disparity estimates and scene characteristics, the authors were able to improve the estimated disparity values. To attest the accuracy and viability of the approach, the results were evaluated on the Middlebury benchmark stereo datasets and the evaluation dataset version 3.0, where the presented approach was compared with the state of the art. It was observed that the proposed algorithm generated disparity values with higher accuracy than both non-learning and learning-based algorithms [48][49][50].

III. PROPOSED ALGORITHMS AND RESULTS
The block diagram of the proposed methodology is presented in Fig. 1. As a first step, the left and right images are subjected to segmentation. Each resulting segment contains pixels of similar intensities. Following this, the similarity matching functions 'Data cost 1' and 'Data cost 2' are used to match similar regions in the two images. The results generated through these similarity metrics are presented to the fuzzy disparity generator, which provides the appropriate disparity values based on them. This generates two disparity maps, corresponding to the left and right images. After this step, the proposed entropy-based refinement technique is applied to the generated disparity maps and a single refined disparity map is produced. Further, the obstacles and road segments are detected through u- and v-disparity methods. It has been observed from the literature review section that using multiple data costs provides a good solution to the problem at hand. Hence, we adopted a similar approach in our work and used the cost functions advocated in the literature, namely Normalized Cross Correlation (NCC) and Normalized Mutual Information (NMI), which are denoted by (1) to (4).
Given a pair of stereo images, many traditional stereo matching methods use a single data cost metric to determine the disparity map, such as absolute intensity differences, squared intensity differences, Sum of Absolute Differences (SAD), sum of squared differences and NCC. In real-world scenarios, a small variation in radiometric conditions is observed between the two stereo images under consideration.
In such conditions, the performance of traditional stereo matching algorithms tends to deteriorate. Over the years, it has been observed from the literature [51][52][53][54][55] that using multiple data costs provides a better solution to the problem of computing accurate disparities. Inspired by these findings, the authors propose a stereo matching metric comprising multiple data costs to overcome the abovementioned problem. The research work in the field of stereo matching in [56] compares the NMI and NCC metrics, and claims that MI is suitable for this field. Both MI and NCC are capable of handling radiometric differences, but their performance deteriorates depending on the window size. SAD is another prominent metric that can be used in stereo matching, but it shows large errors under radiometric variations. Even so, SAD can generate better disparity maps when applied to the gradients of the images. In this work, NMI, NCC, and SAD of image gradients, as given by equations (5) to (7), are considered, together with a novel metric comprising a linear combination of these three data costs.
Here, the two arguments of each data cost are the left image and right image sections, respectively.
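To make the three data costs concrete, the sketch below evaluates them on a pair of equally sized image sections. The exact forms are assumptions, not the paper's own definitions: NCC is written in its zero-mean form, NMI as (H(A) + H(B)) / H(A, B) estimated from a joint histogram with a hypothetical bin count, and the gradient cost as the sum of absolute differences of the horizontal and vertical gradients.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant patch has no variance; return 0 as a neutral score in that case.
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def _entropy(p):
    """Shannon entropy (base 2) of a probability vector, ignoring empty bins."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def nmi(a, b, bins=16):
    """Normalized mutual information, (H(A) + H(B)) / H(A, B)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    hx = _entropy(pxy.sum(axis=1))
    hy = _entropy(pxy.sum(axis=0))
    hxy = _entropy(pxy.ravel())
    # Two constant patches give zero joint entropy; 1.0 is a convention chosen here.
    return (hx + hy) / hxy if hxy > 0 else 1.0

def grad_sad(a, b):
    """Sum of absolute differences between the image gradients of two patches."""
    gay, gax = np.gradient(a.astype(np.float64))
    gby, gbx = np.gradient(b.astype(np.float64))
    return float(np.abs(gax - gbx).sum() + np.abs(gay - gby).sum())
```

NCC and NMI are similarity scores (higher is better) while the gradient SAD is a dissimilarity (lower is better), so any linear combination has to bring them to a common orientation and scale first.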
In cost aggregation, it has been observed that a fixed window size is not a good choice, as fixed windows can include regions with depth discontinuities. The presence of uneven depths within the window generates erroneous disparities. An alternative is to use a window whose size is not fixed. Such a window should be able to change its shape depending on the properties of the pixels under observation. One such method is based on segmentation. Here, the reference image is initially segmented and matched with the target image.
In this work, the authors have segmented the input images into patches [57] consisting of pixels belonging to the same object (region). To overcome the problem of windows containing pixels of different depths, these segmented regions are treated as windows and stereo matching is performed using these variable-sized windows. Fig. 2 shows the segmentation result for one of the images in the dataset. As can be observed, the pixels in each segment belong to the same region and would hence have similar depth. The disparity location with the optimal cost is found after calculating the cost values for each disparity. A WTA scheme is used to select the disparity: from all possible disparity locations, the one with the best cost is selected as the true disparity value. To overcome the possibility of ambiguity, an "intelligence" based disparity selection method is used. Instead of applying WTA directly to the results of the data costs, the data costs are fed into a fuzzy logic module, which generates a value from the combination of the two data costs, and WTA is applied to the output of this module. Fig. 3 represents the work conducted in this paper using a Fuzzy Inference System (FIS). A rule-based "if-then" approach is formulated to decide the output [58]. The two inputs to the FIS are Cost function 1 and Cost function 2, defined by equations (8) and (9), respectively. The data cost metrics used for this research are NCC, NMI, and SAD of gradients, as described in [59].
To decide the best combination of data costs, the positions of the three data costs considered were interchanged in equations (1) and (2) and the results were observed. Equations (10), (11), and (12) define the membership functions for Cost function 1, Cost function 2, and the Output, respectively. The rules of the FIS are given in Table I. The membership functions of input 1 and input 2 are shown in Fig. 4 and Fig. 5, respectively, while the membership function of the output is shown in Fig. 6.
Both the left and right disparity maps are subjected to the fuzzy-based approach, as shown in Fig. 3. The resulting left and right disparity maps are used for left-right consistency checking to obtain the final refined disparity map. For all considered disparities, the data costs for every segment in the reference image are calculated using the cost functions and fed into the FIS. This process is repeated for both the left and right images to obtain the corresponding disparity maps. The results are further improved by subjecting these maps to a left-right consistency check [60]. A point is said to be consistent if the condition in equation (13) is met. The selection of the disparity value is almost always done through a WTA-based method [61]. While calculating the cost, a cost function may produce multiple maxima or minima, which leads to ambiguity. A WTA approach in the presence of such ambiguities might lead to erroneous results. An "intelligence" based disparity selection approach is proposed to tackle this problem.
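The left-right consistency check mentioned above can be sketched as follows. The condition used here is the common form of the test, namely that a left pixel (x, y) with disparity d must land on a right pixel (x - d, y) whose own disparity agrees within a tolerance; this form and the `tol` parameter are assumptions, and the function name is hypothetical.

```python
import numpy as np

def lr_consistent(disp_left, disp_right, tol=1):
    """Boolean mask of left-map pixels that pass the left-right check."""
    d_l = np.rint(disp_left).astype(np.int64)
    d_r = np.rint(disp_right).astype(np.int64)
    h, w = d_l.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    # Column of the corresponding pixel in the right image.
    x_r = cols - d_l
    inside = (x_r >= 0) & (x_r < w)
    # Clip only to keep the indexing valid; out-of-range pixels are rejected below.
    x_r_clipped = np.clip(x_r, 0, w - 1)
    diff = np.abs(d_l - d_r[rows, x_r_clipped])
    return inside & (diff <= tol)
```

Pixels that fail the test are typically occluded or mismatched, and are the natural candidates for a later refinement stage.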

A. Entropy based Selective Refinement of Disparity Maps
To reduce the computation time, it is assumed that pixels which are close together and similar belong to the same object. These patches of similar and close pixels are considered as a patch of super pixels. Hence, these similar pixels would have the same disparity. This eliminates the need to calculate the disparity of every single pixel in the stereo pair. Instead, the disparity of the whole patch is calculated at once and the same disparity is assigned to all the pixels that belong to it. Though this drastically reduces the computation time, values that have been calculated erroneously appear as patches in the disparity map as well. The traditional methods of refinement [62] can be used only if the disparity map was generated through pixel-wise computation, as the errors would then also be present in the form of uneven pixels. In the proposed methodology, however, whole patches of erroneous values can be present, for which the traditional methods of refining single pixel values are not suitable. For this purpose, a new entropy-based selective refinement (ESR) methodology for refining the generated disparity map has been proposed. The block-wise representation of the proposed methodology for the refinement of the disparity maps is shown in Fig. 7.
An entropy-based disparity refinement technique is proposed to improve the quality of the generated disparity maps. The entropy of a block is computed through Shannon's entropy [63], which is represented in equation (14).

$$H = -\sum_{i=0}^{n} p(i)\,\log p(i) \tag{14}$$

The disparity maps are obtained through the methodology described for the generation of disparity maps. A window around each pixel is considered and the entropy of this block is calculated. The same procedure is carried out for the corresponding position in the other disparity map. If the difference between the entropies of the two blocks being considered exceeds a set threshold, the center pixel of the block of the left disparity map is replaced. To choose the replacement value, the block with the lower entropy (left or right) is identified. The mean of the disparity block with lower entropy is selected as the replacement. The replacement criterion can be represented as given in (15).
where $\bar{d}_L$ and $\bar{E}_L$ are the mean and the mean entropy of the disparities of the left disparity map block, while $\bar{d}_R$ and $\bar{E}_R$ are the mean and the mean entropy of the disparities of the right disparity map block. $T$ is the entropy difference threshold value, which was selected through experiments. The stereo matching algorithm generates two disparity maps (one with the left image as reference and another with the right image as reference), as shown in Fig. 8. [...] II. Fig. 9 and Fig. 10 present the entropy images of the left disparity map and the right disparity map obtained from the proposed disparity generation method. Fig. 11 depicts the effect of the window size and threshold value on the average error and the percentage of bad pixels of the disparity maps.
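The refinement rule described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the "corresponding position" is taken to be the same (x, y) window in both maps, the entropy uses a base-2 logarithm over the distinct disparity values in the block, and `radius` and `threshold` are hypothetical parameters.

```python
import numpy as np

def block_entropy(block):
    """Shannon entropy (base 2) of the disparity values inside a block."""
    _, counts = np.unique(block, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def esr_refine(disp_left, disp_right, radius=2, threshold=0.5):
    """Replace left-map pixels where the left and right block entropies disagree."""
    h, w = disp_left.shape
    out = disp_left.astype(np.float64).copy()
    for y in range(h):
        for x in range(w):
            # Window around the pixel, truncated at the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            block_l = disp_left[y0:y1, x0:x1]
            block_r = disp_right[y0:y1, x0:x1]
            e_l, e_r = block_entropy(block_l), block_entropy(block_r)
            if abs(e_l - e_r) > threshold:
                # Use the mean disparity of whichever block has the lower entropy.
                out[y, x] = block_l.mean() if e_l < e_r else block_r.mean()
    return out
```

The double loop keeps the logic readable; a practical version would vectorize the entropy computation, since the cost grows with both image size and window area.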

B. Segmentation of Obstacle Regions and Path Planning
In [25], the authors proposed a strategy to detect obstacles in planetary images under unfavorable imaging conditions. They used a combination of 2D grayscale information and 3D point statistical information from an image pair to extract a region-based map of obstacles. They used mean shift segmentation and seed points extracted from the reprojection of the 3D points in the left image. The work in [74] proposes a technique for stereo vision-based obstacle detection that searches the depth map along the vertical direction for pixels possessing the same disparity. The authors also used a monocular camera model along with stereo vision to detect all the possible obstacles present, and claim that their method performs obstacle detection reliably under various traffic conditions.
The methodology used in this research work for the segmentation of obstacle regions in the obtained disparity maps is presented in Fig. 12. Sample left and right disparity maps (Fig. 13(a) and Fig. 13(b)), along with the refined disparity map obtained from the previous stage (Fig. 13(c)), are presented in Fig. 13.
For the selection of obstacle regions, the locations that belong to the road and sky regions are detected first. For this purpose, the histogram of every row in the image (also known as the v-disparity map) is calculated as mentioned in (16) and (17). In the next step, the sky regions in the modified disparity maps are removed. The original RGB image was converted to HSV and the appropriate 'hue' value was used to eliminate the sky regions, as presented in Fig. 17. The equations related to the conversion from RGB to HSV [27] are as mentioned in (18). Let R, G, and B indicate the red, green and blue bands of the images. These values first need to be normalized as described in (18). The hue, H, can be calculated as described by (19).
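The row-wise histogram (v-disparity) introduced at the start of this step can be sketched as follows. The sketch assumes an integer-valued disparity map and a known maximum disparity; the function name is hypothetical. Each row of the output counts how often each disparity value occurs in the matching image row, so a planar road, whose disparity changes steadily from row to row, shows up as a slanted line.

```python
import numpy as np

def v_disparity(disp, max_disp):
    """Row-wise disparity histogram: output[row, d] counts pixels with disparity d."""
    h, _ = disp.shape
    vmap = np.zeros((h, max_disp + 1), dtype=np.int64)
    for row in range(h):
        # Clamp to the valid range so every value falls into a histogram bin.
        d = np.clip(disp[row].astype(np.int64), 0, max_disp)
        vmap[row] = np.bincount(d, minlength=max_disp + 1)
    return vmap
```

The column-wise counterpart (u-disparity) is obtained in the same way by histogramming each column instead of each row.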
The saturation, S, can be calculated as described by (20).
The value, V, is given by (21).
As previously indicated, the correspondence operation on a stereo image pair results in an image called a disparity map. In this image, the disparity of a pixel is indicated by its brightness. The depth of an object from the camera is inversely proportional to the disparity value of the corresponding pixel. Since the obstacles which are within close proximity are excluded from the classification process, it is difficult to set an exact limit on the disparity for detecting obstacles. The disparities of the obstacles vary with their distances. Therefore, grouping the obstacles into different clusters based on their disparity values is of vital importance. The categorization of image pixels into different clusters by dividing the data into a set of groups is known as image segmentation [75]. k-means clustering is a popular unsupervised learning method for this process, in which the data are partitioned into k groups, i.e. segregated into k disjoint clusters. The k-means algorithm achieves this objective in two separate phases. In the first phase, the centroid of each cluster is calculated. This is followed by assigning each point to the cluster it is closest to [76]. The process is repeated for all the data points with all the k centroids. The Euclidean distance is the most popular distance metric for calculating the distance from a data point to the nearest centroid. This is represented by (22).
$$d = \lVert p(x, y) - c_k \rVert \quad (22)$$
where $p(x, y)$ is the pixel that needs to be grouped into one of the clusters, $c_k$ is the center of the cluster and $d$ is the calculated distance.
After the points are initially assigned to the clusters, the second phase recalculates the centroid of each cluster. The new centroid is calculated as described by (23).
$$c_k = \frac{1}{|c_k|} \sum_{y \in c_k} \sum_{x \in c_k} p(x, y) \quad (23)$$
where k denotes the clusters and $|c_k|$ is the number of pixels assigned to cluster k. These steps are iterated until the tolerance or error value is reached.
The Euclidean distance between each point and every centre is then recalculated using the new centroids, which further adjusts the assignment of data points to clusters and refines the clusters iteratively. Each cluster is defined by its data points and its centroid. The centroid is moved to the position that has the least sum of distances to every point in the cluster. Thus, the k-means algorithm iteratively minimizes the sum of distances between each point and its centroid over all clusters [77]. After this process is completed, each cluster consists of pixels with similar disparity values.
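The two phases described above can be sketched in Python for the one-dimensional case of disparity values. This is a minimal illustration and not the authors' implementation: the function name `kmeans_disparity`, the evenly spaced initial centroids, the treatment of zero disparity as "no data", and the default tolerance are all assumptions.

```python
import numpy as np

def kmeans_disparity(disparity, k=6, tol=1e-3, max_iter=100):
    """Cluster the non-zero disparities of a map into k groups.

    Returns (labels, centroids). labels has the shape of `disparity`
    and is -1 where the disparity is 0 (treated here as "no data").
    Assumes the map contains at least one non-zero disparity.
    """
    valid = disparity > 0
    values = disparity[valid].astype(float)
    # Assumption: initial centroids are spread evenly over the value range.
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(max_iter):
        # Assignment phase: nearest centroid by (1-D) Euclidean distance.
        assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Update phase: each centroid becomes the mean of its members.
        new = centroids.copy()
        for j in range(k):
            members = values[assign == j]
            if members.size:  # an empty cluster keeps its old centroid
                new[j] = members.mean()
        shift = np.abs(new - centroids).max()
        centroids = new
        if shift < tol:  # stop once the centroids no longer move
            break
    labels = np.full(disparity.shape, -1, dtype=int)
    labels[valid] = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return labels, centroids
```

Because depth is inversely proportional to disparity, the cluster with the largest centroid holds the pixels closest to the camera.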
After the completion of the clustering stage, every cluster contains pixels with similar disparity values, so the disparity map is segmented into separate regions. The next step is to group similar disparities together depending on the intensity values of the refined and modified disparity map. For this purpose, the k-means segmentation algorithm was used, and the accuracy of detection was studied for varying numbers of clusters. A sample segment is presented in Fig. 18; it represents the group of pixels closest to the camera. It is evident that the segmented regions contain noise induced during the stereo matching and segmentation processes. These results can be refined further to reduce the noisy speckles. To this end, the u-disparity image is used, which represents every entity in the disparity map as a horizontal line. Similar to the v-disparity map discussed previously, the u-disparity image is generated by calculating the histogram of the pixels in each column, as given in (24) and (25), and is represented as an image, as shown in Fig. 19. Because the u-disparity image represents each object in the disparity map as a horizontal line, applying a restriction criterion on the length of the lines in the u-disparity image yields a refined version of the input, as shown in Fig. 20. These operations are performed on all the segments obtained from the k-means segmentation to detect the obstacles present in the disparity map [78][79]. Based on this, the obstacles in the images are highlighted and segmented; for ease of identification, the closest obstacles are marked with red bounding boxes, those slightly farther away with green bounding boxes, and the farthest with blue bounding boxes. The final set of obstacles detected in this manner is shown in Fig. 22.
Experiments were conducted on the threshold value at which the cluster centers are set, which in turn decides how many pixels are clustered in a group, and the value was set accordingly, as shown in Fig. 23. It was observed that the highest obstacle detection rate is obtained with 6 and 10 clusters, but for the latter the rate of false detections was also higher. For this reason, the number of clusters was set to 6.
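The u-disparity refinement described above can be sketched as follows. The code builds the column-wise histogram (and, by transposition, the row-wise v-disparity histogram), keeps only horizontal runs that are long enough, and maps the surviving bins back to the segment. The function names and the parameters `min_count` and `min_length` are hypothetical; the actual restriction criterion used in the original work is not specified here. Disparities are assumed to be non-negative integers no larger than `max_disp`.

```python
import numpy as np

def u_disparity(disparity, max_disp):
    """Column-wise histogram: out[d, u] counts pixels in column u with disparity d."""
    out = np.zeros((max_disp + 1, disparity.shape[1]), dtype=int)
    for u in range(disparity.shape[1]):
        out[:, u] = np.bincount(disparity[:, u], minlength=max_disp + 1)
    return out

def v_disparity(disparity, max_disp):
    """Row-wise histogram: out[v, d] counts pixels in row v with disparity d."""
    return u_disparity(disparity.T, max_disp).T

def keep_long_lines(u_disp, min_count=2, min_length=3):
    """Keep horizontal runs of at least min_length columns whose bins
    each hold at least min_count pixels (both thresholds are assumptions)."""
    strong = u_disp >= min_count
    keep = np.zeros_like(strong)
    width = strong.shape[1]
    for d in range(1, strong.shape[0]):  # row d = 0 is treated as "no data"
        start = None
        for u in range(width + 1):  # one extra step flushes a run at the edge
            on = u < width and strong[d, u]
            if on and start is None:
                start = u
            elif not on and start is not None:
                if u - start >= min_length:
                    keep[d, start:u] = True
                start = None
    return keep

def refine_segment(disparity, keep):
    """Zero every pixel whose (disparity, column) bin was not kept."""
    cols = np.broadcast_to(np.arange(disparity.shape[1]), disparity.shape)
    return np.where(keep[disparity, cols], disparity, 0)
```

Isolated speckles produce short or weak lines in the u-disparity image and are removed, while an object spanning several columns at a consistent disparity survives.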
The proposed methodology was applied to the KITTI dataset, and its comparison with similar methods is presented in Tables III and IV. With the obstacles clearly detected, the proposed methodologies were verified on real-life images. A Kalman filter [80], [81] based path planning algorithm was implemented to verify the proposed algorithms and their ability to detect obstacles in real-life environments. The flow chart of the path planning algorithm is presented in Fig. 24. The state estimate at time $t + \Delta t$ is obtained from the data at the previous time instance t and can be calculated as shown in (26) [82].
The variable $\hat{x}_t$ describes the estimated state at time t. It takes the form given in (27).
where $p_t$ is the vector describing the position and $v_t$ is the vector related to velocity. Here $\hat{x}_{t+\Delta t}$ is obtained from the representation of the previous state of the object at time t. The matrix $F_{\Delta t}$ is called the prediction matrix [83]. It uses the previous position and velocity to estimate the next position and velocity, as shown in (28).
The product of $F_{\Delta t}$ and $\hat{x}'_{t-\Delta t}$ is as described in (29).
In other words, the prediction matrix assumes that the velocity does not change in the interval from t to $t + \Delta t$; this follows from the assumption of linear system dynamics [84]. Variations in the magnitude and direction of motion are taken into account through the control vector $\vec{u}_t$ and the control matrix $B_t$. In the presented research work, the variation in velocity is treated as the control input. This control input is modelled as a random process that contributes to the noise in velocity [85]. For this reason, the control inputs are not used in the prediction; $\vec{u}_t$ and $B_t$ are treated as 0 in the prediction equations. The matrix $P_t$ is the covariance matrix and is presented in (30).
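As an illustration of the prediction step under the constant-velocity assumption, the following Python sketch propagates a state made of a position vector followed by a velocity vector, with no control input. The function name `predict`, the state layout, and the isotropic process noise `q` are assumptions made for this sketch and are not taken from the original work.

```python
import numpy as np

def predict(x_prev, P_prev, dt, q=0.01):
    """Constant-velocity Kalman prediction with no control input.

    x_prev: state [positions..., velocities...] of even length.
    P_prev: state covariance matrix.
    q: process-noise variance added to every state (illustrative value).
    """
    n = x_prev.size // 2
    F = np.eye(2 * n)
    F[:n, n:] = dt * np.eye(n)      # position += dt * velocity
    Q = q * np.eye(2 * n)           # assumption: isotropic process noise
    x_pred = F @ x_prev             # velocity is carried over unchanged
    P_pred = F @ P_prev @ F.T + Q   # uncertainty grows during dt
    return x_pred, P_pred
```

For example, a state at the origin moving at (1, 2) units per second is predicted at (0.5, 1) after half a second, and the position variance grows because the velocity uncertainty leaks into the position.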
The second term, $Q_t$, in the equation for updating the covariance is the covariance matrix of the process noise [86], which in this research work keeps track of the variations that take place during the time interval $\Delta t$.
The matrix $H_t$ converts from the state space to the measurement space. $H_t$ is taken to be the identity matrix, since the measurement space and the state space can be assumed to be identical. The noise in the sensor readings is represented by the matrix $R_t$. This is a diagonal matrix consisting of the variances of the normal distributions for the respective dimensions [87]. The matrix $R_t$ models the uncertainty in the readings obtained by the sensor. The expected sensor readings for the covariance matrix $P_t$ are given by $H_t P_t H_t^T$.
A finer approximation of the current state and of its covariance can be obtained by using the gain function together with these two variables, as described in (32).
The vector $z_t$ is the reading of the sensor at time instance t. The vector $\hat{x}'_t$ represents an updated version of the estimate $\hat{x}_t$ once the sensor readings are considered [88]. A similar concept applies to $P'_t$. The value $H_t \hat{x}_{t|t-\Delta t}$ represents the expected sensor readings given the estimated state $\hat{x}_{t|t-\Delta t}$. Subtracting this component from the actual sensor readings $z_t$ gives the difference between the observed and the predicted values. The gain function $K'$ is then applied to this difference, also known as the error, and the result is added to $\hat{x}_{t|t-\Delta t}$ to generate $\hat{x}'_t$, which is the revised version of the estimate $\hat{x}_{t|t-\Delta t}$. Lastly, when the KF is applied again, the updated estimates $\hat{x}'_t$ and $P'_t$, as opposed to $\hat{x}_{t|t-\Delta t}$ and $P_{t|t-\Delta t}$, are taken as the values for the initial state equations [89]-[91].
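The update step can be sketched in Python using the standard Kalman filter equations: the gain weighs the predicted covariance against the sensor noise, and the weighted error corrects the prediction. The function name `update` and its argument names are assumptions; the measurement matrix defaults to the identity, in line with the assumption that the measurement and state spaces coincide.

```python
import numpy as np

def update(x_pred, P_pred, z, R, H=None):
    """Standard Kalman update of a predicted state with a sensor reading z.

    H defaults to the identity (measurement space == state space).
    R is the sensor-noise covariance (diagonal in the text above).
    """
    if H is None:
        H = np.eye(x_pred.size)
    S = H @ P_pred @ H.T + R               # covariance of expected readings plus noise
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    error = z - H @ x_pred                 # observed minus predicted reading
    x_new = x_pred + K @ error             # corrected state estimate
    P_new = P_pred - K @ H @ P_pred        # corrected covariance
    return x_new, P_new
```

When the prediction and the sensor are equally uncertain, the gain is one half and the corrected state lies midway between the prediction and the reading, which is a quick sanity check for any implementation.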
At every frame capture iteration, the system checks for motion, i.e., whether the robot has moved [92]. This is estimated by generating the disparity map for each stereo image pair and checking whether the results vary. If it is concluded that there has been any form of motion, the latest pose of the robot, i.e., its co-ordinates after the detected motion, is calculated, and a check is made for new points of interest (landmarks) [93][94][95]. This is accomplished by using the calculated disparity map to identify new points of interest. If any variations in obstacles are detected, the distances of these points are used to initialize and update the Kalman filter. The pose of the robot is recalculated based on this, and the process is repeated. The simulation of the path planning algorithm is shown in Fig. 25. The sequence of actions performed by the robot using the mentioned methodologies is presented in Fig. 26, and the mobile robot used is shown in Fig. 27.
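One simple way to check for variations between consecutive disparity maps is sketched below in Python. The criterion (the fraction of pixels whose disparity changed by more than a tolerance) and the names `motion_detected`, `pixel_tol`, and `changed_fraction` are assumptions for illustration only; the exact check used in the original work is not described.

```python
import numpy as np

def motion_detected(prev_disp, curr_disp, pixel_tol=1, changed_fraction=0.05):
    """Return True if enough pixels changed disparity between two frames.

    pixel_tol: per-pixel change that is ignored as noise (assumption).
    changed_fraction: share of changed pixels that counts as motion (assumption).
    """
    diff = np.abs(curr_disp.astype(int) - prev_disp.astype(int))
    return bool(np.mean(diff > pixel_tol) > changed_fraction)
```

Casting to a signed integer type before subtracting avoids wrap-around when the disparity maps are stored as unsigned 8-bit images.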

IV. CONCLUSION
A novel entropy-based technique is proposed to refine the generated disparity map. Based on the information obtained from these disparity maps, obstacles present in the images are detected and hence avoided. To detect these obstacles, the regions that are not obstacles need to be ignored. This is done with the help of v-disparity image information, which identifies the road regions. The regions belonging to the sky are segmented using HSV information. Once this is carried out, what remains are the points that belong to obstacles. Among these points, the sections that belong to the closest obstacles are highlighted through k-means segmentation. The segmented disparity map is further subjected to a u-disparity information-based refinement that considers the areas of the connected pixels to filter out noise left by the k-means segmentation. The proposed algorithm generated an average error of 1.33 and had an obstacle detection rate of 91.87% on the KITTI dataset. The proposed methodology was compared with similar methodologies and was found to be on par with them, and in some cases it provided better results.
Furthermore, a Kalman filter based path planning algorithm was used in conjunction with the mentioned algorithms. When implemented in a real-life scenario, the combination of the proposed algorithms and the path planning algorithm successfully guided a mobile robot from a source location to a destination location while detecting and avoiding the obstacles present in its path.
Although the proposed methods have generated results that achieve the objectives of this research work, they have also raised numerous aspects for further investigation. One such area is the effect of multiple light sources. Another is the development of cost functions that could be used to generate dense disparity maps from multi-modal images for stereo correspondence. This can be useful in military situations where images from multiple sources (thermal, night vision, etc.) could be available. Work can also be carried out towards reducing the execution time through the use of graphical processing units or other dedicated hardware for disparity map generation.

Fig. 2. Original image (left) and result of segmenting image into patches (right)


Fig. 8. Left and right stereo images along with the respective disparity maps
A comparison of the proposed methodology on the KITTI dataset with other similar methods is presented in Table II. Fig. 9 and Fig. 10 present the entropy images of the left disparity map and the right disparity map obtained from the proposed disparity generation method. Fig. 11 depicts the effect of the window size and the threshold value on the average error and the percentage of bad pixels of the disparity maps.

Fig. 12. Block diagram of methodology for obstacle detection

Fig. 14. Converting the disparity map into the v-disparity image (a) and detecting the line that represents the road surface (c) from the refined disparity map (b).

Fig. 18. An example of a segmented section of the disparity map

Fig. 20. The refined u-disparity image
These refined u-disparity segments are used to reconstruct the refined version of the segmented disparity map by mapping them back to the original image while using only the points present in the refined u-disparity map. The result of this process is shown in Fig. 21.

Fig. 21. Obstacle disparities extracted from the u-disparity image

Fig. 22. Original image with the obstacles indicated based on the proposed methodology

Fig. 23. Effect of number of clusters on obstacles detected

TABLE II. COMPARISON OF PROPOSED METHODOLOGY USING KITTI

TABLE III. COMPARISON OF OBSTACLE DETECTION RATE OF PROPOSED METHODOLOGY (TRAFFIC CONDITIONS)

TABLE IV. COMPARISON OF OBSTACLE DETECTION RATE OF PROPOSED METHODOLOGY (URBAN CONDITIONS)

Column headings: Present; Detected; Present; Detected; Present; Detected; Detection rate by proposed method; Detection rate by competing method [54]; Detection rate by competing method [55]
The amount of change to be incorporated from the prediction, based on the observation at time t, is decided by the Kalman gain function, which is described in (31).
$$K' = P_t H_t^T \left(H_t P_t H_t^T + R_t\right)^{-1} \quad (31)$$