Commit 643e7d93 authored by Chaoyue Niu's avatar Chaoyue Niu

Update root.tex

The data for our forest environment dataset was collected at the Southampton Common.
\subsection{Data description} Our equipped mobile sensor platform was pushed through the forest in the Common in five separate runs at different times of day and under different weather conditions, to account for the resulting variations in lighting (see Table~\ref{tab:env} and select example forest scenes in Figure~\ref{fig:example-frames}). For each run in the forest, the following data were recorded from the sensor platform: (i) aligned RGB and depth images from the camera; (ii) 6~DoF IMU linear and angular acceleration of the platform (see Figure~\ref{fig:platform} for the axes orientation); (iii) rotary encoder position data; and (iv) GPS location data of the platform.
All the data from the rotary encoder and IMU streams were time-synchronized with the images recorded from the camera at 30 frames per second, and recorded at the same rate. The GPS location data were also synchronized with the camera feed and recorded once per second. Image data were stored losslessly in 8-bit PNG format at $640 \times 480$ pixel resolution. Data from the IMU, rotary encoder and GPS sensors were stored in an easy-to-access CSV flat-file structure. Our full dataset comprises all the recorded forest environment data and metadata, including over $134$K RGB and depth images. A select sample of our forest dataset, containing about 9700 RGB and depth images together with the corresponding time-synchronized IMU, rotary encoder and GPS sensor data, is available online at \url{https://doi.org/10.5281/zenodo.3693154}.
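For illustration, the following Python sketch shows one way such a run could be loaded. The folder layout, file names and column structure are assumptions made for the sketch only, not the documented organisation of the released dataset.
\begin{verbatim}
# Minimal loading sketch; run_01/, rgb/, depth/ and the CSV names are
# hypothetical, not the dataset's documented layout.
from pathlib import Path

import cv2          # OpenCV, for reading the PNG images
import pandas as pd

run_dir = Path("run_01")                        # hypothetical run folder

# Sensor streams stored as CSV flat files (file names assumed).
imu = pd.read_csv(run_dir / "imu.csv")          # 30 Hz, synced to camera frames
encoder = pd.read_csv(run_dir / "encoder.csv")  # rotary encoder positions, 30 Hz
gps = pd.read_csv(run_dir / "gps.csv")          # GPS fixes, 1 Hz

def load_frame(index):
    """Load the aligned RGB and depth PNGs for one frame (file names assumed)."""
    rgb = cv2.imread(str(run_dir / "rgb" / f"{index:06d}.png"), cv2.IMREAD_COLOR)
    depth = cv2.imread(str(run_dir / "depth" / f"{index:06d}.png"),
                       cv2.IMREAD_UNCHANGED)    # keep the stored bit depth as-is
    return rgb, depth

rgb, depth = load_frame(0)
print(rgb.shape, depth.shape)                   # expected (480, 640, 3) and (480, 640)
\end{verbatim}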
\section{Quality of our forest environment dataset}
To assess the image quality of the depth data in our forest environment dataset, we consider: (i) the \textit{fill rate}, i.e. the percentage of the depth image containing valid pixels (pixels with an estimated depth value); (ii) the depth accuracy against ground-truth data; (iii) the upward-view data; and (iv) depth estimation.
\noindent\textbf{Fill rate of depth images:} In our depth image data, the fill rate may be affected by the movement of the mobile sensor platform through the forest as well as by the luminosity of the scene, which influences exposure and can consequently cause motion blur. For our analysis, the instantaneous velocity and acceleration of the mobile sensor platform were estimated from the rotary encoder position data. The luminosity, or perceived brightness, was estimated from the Y (luma) channel after converting the RGB images to the YUV colour space.
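As an illustration of the two per-frame quantities used here, the following Python sketch computes the fill rate of a depth image and the mean Y-luma of the corresponding RGB frame; treating zero-valued depth pixels as invalid is an assumption about how missing depth is encoded.
\begin{verbatim}
import cv2
import numpy as np

def fill_rate(depth):
    """Percentage of pixels with a valid estimated depth value.
    Treating zero as 'no depth' is an assumption about the encoding."""
    return 100.0 * np.count_nonzero(depth) / depth.size

def mean_luma(rgb_bgr):
    """Perceived brightness: mean of the Y channel after BGR -> YUV conversion."""
    yuv = cv2.cvtColor(rgb_bgr, cv2.COLOR_BGR2YUV)
    return float(yuv[:, :, 0].mean())
\end{verbatim}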
Overall, our analysis suggests a good quality of depth data for the forest environment.
Our results indicate that the depth estimated with our mobile sensor platform was close to the ground-truth measurements (see Figure~\ref{fig:depth-error}). Across all sampled points P1 to P9, the mean error was less than $4\%$. The highest error, $12\%$, occurred at point P8, which was positioned furthest from the camera.
\noindent\textbf{Images in upward view:} We aim to record near obstacles against the far background in the depth data. A small fraction of the frames look upward, so the ground is not included. Conversely, if the camera looks too far downward, the colour gradient in the depth map simply runs from the near lower part of the image (marked red) to the far upper part (marked blue); such a depth map cannot capture obstacles. We therefore adjusted the camera before recording so that it acquires more obstacle depth data. By fusing the IMU data (gyroscope and accelerometer readings) into an orientation estimate, we can estimate the pitch angle of each frame, which can be used to locate upward-view frames. Our analysis suggests that frames without the ground in view constitute about $1\%$ to $20\%$ of each video. These frames can be filtered out by discarding frames with a pitch angle $> 4$ degrees.
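The following Python sketch illustrates one way such pitch-based filtering could be implemented. The paper does not specify the fusion method, so a simple complementary filter is shown as one possibility; the axis conventions, units and the helper names are assumptions.
\begin{verbatim}
# Sketch only: complementary-filter pitch estimate and upward-view filtering.
import numpy as np

def pitch_from_imu(gyro_rate, acc_fwd, acc_up, dt=1.0/30.0, alpha=0.98):
    """Per-frame pitch in degrees.
    gyro_rate: angular rate about the pitch axis (rad/s assumed);
    acc_fwd, acc_up: accelerometer components used for the gravity estimate."""
    pitch = np.zeros(len(gyro_rate))
    for i in range(1, len(gyro_rate)):
        acc_pitch = np.degrees(np.arctan2(acc_fwd[i], acc_up[i]))   # gravity-based
        gyro_pitch = pitch[i - 1] + np.degrees(gyro_rate[i]) * dt   # integrated rate
        pitch[i] = alpha * gyro_pitch + (1.0 - alpha) * acc_pitch   # blend
    return pitch

def ground_view_frames(frame_ids, pitch_deg, threshold_deg=4.0):
    """Keep frames whose pitch does not exceed the threshold (upward views dropped)."""
    return [f for f, p in zip(frame_ids, pitch_deg) if p <= threshold_deg]
\end{verbatim}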
\section{Depth estimation}
\noindent\textbf{Ongoing work on depth estimation:} We have been training a deep neural network for depth estimation on this forest dataset, using the multi-scale deep network of \cite{eigen2014depth}, and find that it works well.

\noindent\textbf{Results:} Example depth predictions are shown in Figure~\ref{fig:depth}. We evaluate the depth estimation results using error metrics from \cite{alhashim2018high}, defined as follows:

$\bullet$ average relative error (rel): $\frac{1}{n} \sum_{p}^{n} \frac{\left|y_{p}-\hat{y}_{p}\right|}{y_{p}}$;

$\bullet$ root mean squared error (rms): $\sqrt{\frac{1}{n} \sum_{p}^{n}\left(y_{p}-\hat{y}_{p}\right)^{2}}$;

$\bullet$ average $\log_{10}$ error: $\frac{1}{n} \sum_{p}^{n}\left|\log_{10}\left(y_{p}\right)-\log_{10}\left(\hat{y}_{p}\right)\right|$;

$\bullet$ threshold accuracy $\left(\delta_{i}\right)$: the percentage of pixels $y_{p}$ such that $\max\left(\frac{y_{p}}{\hat{y}_{p}}, \frac{\hat{y}_{p}}{y_{p}}\right) = \delta < thr$ for $thr = 1.25, 1.25^{2}, 1.25^{3}$;

where $y$ is the ground-truth depth map and $y_{p}$ a pixel in $y$, $\hat{y}$ is the estimated depth map and $\hat{y}_{p}$ a pixel in $\hat{y}$, and $n$ is the total number of pixels in each image.
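For reference, the following Python sketch computes these metrics with NumPy for one ground-truth and predicted depth-map pair; masking out zero-valued pixels as invalid is an assumption.
\begin{verbatim}
import numpy as np

def depth_metrics(y, y_hat):
    """rel, rms, log10 error and threshold accuracies (delta_1, delta_2, delta_3)."""
    y = np.asarray(y, dtype=np.float64).ravel()
    y_hat = np.asarray(y_hat, dtype=np.float64).ravel()
    valid = (y > 0) & (y_hat > 0)           # assumption: zero marks invalid depth
    y, y_hat = y[valid], y_hat[valid]
    n = y.size

    rel = np.sum(np.abs(y - y_hat) / y) / n
    rms = np.sqrt(np.sum((y - y_hat) ** 2) / n)
    log10 = np.sum(np.abs(np.log10(y) - np.log10(y_hat))) / n

    ratio = np.maximum(y / y_hat, y_hat / y)
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return rel, rms, log10, deltas
\end{verbatim}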
\begin{figure}
\centering
\includegraphics[width=3.5in]{depth.pdf}
\caption{\textbf{Examples from the depth estimation results.} The ground truth and depth predictions have been resized to $64 \times 48$ pixels; the RGB images remain at the original resolution of $640 \times 480$.}
\label{fig:depth}
\end{figure}
\noindent\textbf{Evaluation metrics:}
\begin{table}[ht]
\centering
\caption{\textbf{Evaluation metrics.} $\uparrow$ denotes higher is better; $\downarrow$ denotes lower is better.}
\label{tab:eva}
\begin{tabular}[t]{cccccc}
\hline
$\delta_{1}\uparrow$ & $\delta_{2}\uparrow$ & $\delta_{3}\uparrow$ & rel~$\downarrow$ & rms~$\downarrow$ & $\log_{10}$~$\downarrow$ \\
\section{Summary and future work}
In this paper, we have presented a timestamped, calibrated and synchronized off-road forest depth map dataset recording different obstacles, such as dirt, trees, tree branches, leaves, and bushes, with a particular focus on close-range depth data. The dataset was recorded under different weather conditions, such as partly sunny, scattered clouds, light rain, sunny, and mostly clear. We also evaluated the quality of the depth maps via their fill rate and their accuracy against laser-emitter ground truth, showed how frames in upward view can be filtered out, and trained a deep neural network on this forest dataset, reporting depth estimation results and the corresponding evaluation metrics. We have not provided a simulation environment, because simulation models do not exist for this domain; a very large dataset such as ours is therefore particularly useful, and existing methods for matching satellite data to ground situations may complement it. This dataset should be highly useful for sparse off-road swarms of ground robots.

In the future, we are going to run the deep neural network model on different embedded devices to assess the real-time performance of depth estimation. Training a deep neural network on steering data derived from the rotary encoder to predict steering actions for path detection is also future work. Furthermore, predicting terrain roughness from point clouds derived from the depth maps would be helpful for future terrain traversability analysis.
\bibliographystyle{IEEEtran}
\bibliography{IEEEabrv}