Applying Deep Learning and Computer Vision to LiDAR Part1: File Sizes

Introduction

I recently had an interesting project where a client wanted to see if certain geographic features could be found by using deep learning on LiDAR converted to GeoTIFF format.  I had already been working on some code to allow me to use any of the Tensorflow built-in models as R-CNNs, so this seemed like the perfect opportunity to try this.  This effort wasn’t without issues, and I thought I would detail them here in case anyone else is interested.  File sizes, lack of training data, and a video card with an add-on fan that sounded like a jet engine turned out to be interesting problems for the project.

I decided to split this up into multiple posts.  Here in Part 1, I will be covering the implications of doing deep learning and computer vision on LiDAR where file sizes can range in the hundreds of gigabytes for imagery.

What is LiDAR?

Example LiDAR point cloud courtesy the United States Geological Survey

LiDAR is a technology that uses laser beams to measure distances and movements in an environment. The word LiDAR comes from Light Detection And Ranging, and it works by sending out short pulses of light and measuring the time it takes for them to bounce back from objects or surfaces. You may have even seen it used on your favorite television show, where people will fly a drone to perform a scan of a particular area.  LiDAR can create high-resolution maps of various terrains, such as forests, oceans, or cities. LiDAR is widely used in applications such as surveying, archaeology, geology, forestry, atmospheric physics, and autonomous driving. 

Archaeologists have made a lot of recent discoveries using LiDAR.  In Central and South America, lost temples and other structures from ancient civilizations such as the Aztecs and the Inca have been found in heavily forested areas.  Drone-based LiDAR can be used to find outlines of hard-to-see foundations where old buildings used to stand.

LiDAR scans are typically stored in a point-cloud format, usually LAS or LAZ or other proprietary and unmentionable formats.  These point clouds can be processed in different ways.  It is common to process them to extract the ground level, tree top level, or building outlines.  This is convenient as these points can be processed for different uses, but not so convenient for visualization.

LiDAR converted to a GeoTIFF DEM

These data are commonly converted into GeoTIFF format, a raster file format, so that they can be used in a GIS.  In this form, they are commonly used as high-resolution digital elevation format (DEM) files.  These files can then be used to perform analysis tasks such as terrain modeling, hydrological modeling, and others.

File Sizes

Conversion to GeoTIFF might result in smaller file sizes and can be easier to process in a GIS, but the files can still be very large.  For this project, the LiDAR file was one hundred and three gigabytes. It was stored as a 32-bit grayscale file so that the elevations of each point on the ground could be stored at a high resolution.  This is still an extremely large file, and not able to be fully loaded into memory for deep learning processing unless a very high-end computer was used (spoiler: I do not have a terabyte of RAM on my home computer).

Using CUDA on a GPU became interesting.  I have a 24 gigabyte used Tesla P40 that I got for cheap off eBay.  Deep learning models can require large amounts of memory that can quickly overwhelm a GPU.  Things like data augmentation, where training images are slightly manipulated on the CPU to provide more samples to help with generalization, take up main memory.  The additional size of the 32-bit dataset and training samples led to more memory being taken up than normal.

Deep learning models tend to require training data to be processed in batches.  These batches are small sets of the input data that are processed during one iteration of training.  It’s also more efficient for algorithms such as stochastic gradient descent to work on batches of data instead of the entire dataset during each iteration.  The sheer size of the training data samples meant that each batch took up a large amount of memory.

Finally, it was impossible to run a detection on the entire LiDAR image at one time.  The image had to be broken up into chunks that could be loaded into memory and run in a decent amount of time.  I made a random choice of cutting the image into an 8×8 grid, resulting in sixty-four images.  I wanted to be able to break up the processing so I could run it and combine the results at the end.  At the time, I had not yet water-cooled my Tesla, so the cooling fan I had attached to it sounded like a jet engine while running.  Breaking it into chunks meant that I could process things during the day and then stop at night when I wanted to sleep.  Believe me, working on other projects during the day while listening to that fan probably made me twitch a bit more than normal.

Conclusion

So that’s it for Part 1. I hope I’ve touched on some of the issues that I came across while trying to processing LiDAR with deep learning and computer vision algorithms. In Part 2 I’ll discuss gathering training data (or the lack of available training data).