Applying Deep Learning and Computer Vision to LiDAR Part 1: File Sizes

Introduction

I recently had an interesting project where a client wanted to see if certain geographic features could be found by using deep learning on LiDAR converted to GeoTIFF format.  I had already been working on some code to allow me to use any of the Tensorflow built-in models as R-CNNs, so this seemed like the perfect opportunity to try it out.  This effort wasn’t without issues, and I thought I would detail them here in case anyone else is interested.  File sizes, lack of training data, and a video card with an add-on fan that sounded like a jet engine all turned out to be interesting problems for the project.

I decided to split this up into multiple posts.  Here in Part 1, I will cover the implications of doing deep learning and computer vision on LiDAR-derived imagery, where file sizes can run into the hundreds of gigabytes.

What is LiDAR?

Example LiDAR point cloud, courtesy of the United States Geological Survey

LiDAR is a technology that uses laser beams to measure distances and movements in an environment. The word LiDAR comes from Light Detection And Ranging, and it works by sending out short pulses of light and measuring the time it takes for them to bounce back from objects or surfaces. You may have even seen it used on your favorite television show, where people will fly a drone to perform a scan of a particular area.  LiDAR can create high-resolution maps of various terrains, such as forests, oceans, or cities. LiDAR is widely used in applications such as surveying, archaeology, geology, forestry, atmospheric physics, and autonomous driving. 

Archaeologists have made a lot of recent discoveries using LiDAR.  In Central and South America, lost temples and other structures from ancient civilizations such as the Maya and the Inca have been found in heavily forested areas.  Drone-based LiDAR can also be used to find the outlines of hard-to-see foundations where old buildings once stood.

LiDAR scans are typically stored in a point-cloud format, usually LAS or LAZ, or other proprietary and unmentionable formats.  These point clouds can be processed in different ways; it is common to extract the ground level (bare earth), the tree canopy, or building outlines.  This is convenient because the points can be reused for different purposes, but not so convenient for visualization.
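
If you have not worked with these point clouds before, here is a minimal sketch of pulling the ground returns out of a LAS file.  It assumes the laspy library and a hypothetical file name; it is only meant to show the shape of the data, not the exact tooling used for this project.

import laspy
import numpy as np

# Read the point cloud (laspy 2.x reads LAS directly; LAZ needs a backend such as lazrs)
las = laspy.read("lidar_tile.las")   # hypothetical file name

# ASPRS classification code 2 marks ground returns
mask = np.asarray(las.classification) == 2

# Pull the scaled x/y/z coordinates for just the ground points
ground = np.column_stack((np.asarray(las.x)[mask],
                          np.asarray(las.y)[mask],
                          np.asarray(las.z)[mask]))
print(f"{ground.shape[0]} ground points out of {len(las.points)} total")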

LiDAR converted to a GeoTIFF DEM

These data are commonly converted into GeoTIFF, a raster file format, so that they can be used in a GIS.  In this form they typically serve as high-resolution digital elevation model (DEM) files, which can then be used for analysis tasks such as terrain modeling, hydrological modeling, and others.

File Sizes

Conversion to GeoTIFF might result in smaller file sizes and can make the data easier to process in a GIS, but the files can still be very large.  For this project, the converted LiDAR file was one hundred and three gigabytes. It was stored as a 32-bit grayscale raster so that the elevation of each point on the ground could be stored at high precision.  That is still an extremely large file, and it cannot be fully loaded into memory for deep learning unless you have a very high-end computer (spoiler: I do not have a terabyte of RAM on my home machine).
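
The practical consequence is that the raster has to be read piecemeal.  As a minimal sketch (assuming the GDAL Python bindings and a hypothetical file name), you can open the GeoTIFF and read one block at a time instead of pulling the whole thing into memory:

from osgeo import gdal

# Opening the dataset only reads the header, not the hundred-plus gigabytes of pixels
ds = gdal.Open("lidar_dem.tif")   # hypothetical file name
band = ds.GetRasterBand(1)
print(ds.RasterXSize, ds.RasterYSize, gdal.GetDataTypeName(band.DataType))

# Read a single 1024x1024 block at pixel offset (0, 0) instead of the entire raster
chunk = band.ReadAsArray(0, 0, 1024, 1024)
print(chunk.shape, chunk.nbytes / 1e6, "MB in memory for this one block")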

Using CUDA on a GPU became interesting.  I have a used 24-gigabyte Tesla P40 that I got cheap off eBay.  Deep learning models can require large amounts of memory that quickly overwhelm a GPU.  Things like data augmentation, where training images are slightly manipulated on the CPU to provide more samples and help with generalization, also take up main memory.  The extra size of the 32-bit dataset and training samples meant more memory was consumed than usual.

Deep learning models tend to require training data to be processed in batches: small subsets of the input data that are processed during one iteration of training.  It is also more efficient for algorithms such as stochastic gradient descent to work on batches of data instead of the entire dataset during each iteration.  The sheer size of the training samples meant that each batch took up a large amount of memory.
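
To make that concrete, here is a minimal sketch of feeding batches with tf.data.  The array shapes, labels, and the batch size of 8 are placeholders for illustration, not the values used in this project.

import numpy as np
import tensorflow as tf

# Placeholder stand-ins for image chips cut from the DEM and their labels
chips = np.random.rand(1000, 256, 256, 1).astype("float32")
labels = np.random.randint(0, 2, size=(1000,))

# Stream small batches to the GPU instead of pushing the whole array at once
dataset = (tf.data.Dataset.from_tensor_slices((chips, labels))
           .shuffle(buffer_size=256)
           .batch(8)                      # small batches keep GPU memory in check
           .prefetch(tf.data.AUTOTUNE))   # overlap CPU preparation with GPU compute

# model.fit(dataset, epochs=10)  # the dataset plugs straight into Keras training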

Finally, it was impossible to run a detection on the entire LiDAR image at one time.  The image had to be broken up into chunks that could be loaded into memory and run in a reasonable amount of time.  I somewhat arbitrarily chose to cut the image into an 8×8 grid, resulting in sixty-four smaller images.  I wanted to be able to break up the processing so I could run it in pieces and combine the results at the end.  At the time, I had not yet water-cooled my Tesla, so the cooling fan I had attached to it sounded like a jet engine while running.  Breaking the image into chunks meant that I could process during the day and then stop at night when I wanted to sleep.  Believe me, working on other projects during the day while listening to that fan probably made me twitch a bit more than normal.
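
Cutting that grid is straightforward.  Here is a minimal sketch using gdal.Translate; the file names are hypothetical, and for simplicity it ignores the leftover pixels on the right and bottom edges rather than handling them the way a production script would.

from osgeo import gdal

GRID = 8   # an 8x8 grid yields sixty-four tiles

ds = gdal.Open("lidar_dem.tif")   # hypothetical file name
tile_w = ds.RasterXSize // GRID
tile_h = ds.RasterYSize // GRID

for row in range(GRID):
    for col in range(GRID):
        # srcWin is [xoff, yoff, xsize, ysize] in pixels
        gdal.Translate(f"tile_{row}_{col}.tif", ds,
                       srcWin=[col * tile_w, row * tile_h, tile_w, tile_h])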

Conclusion

So that’s it for Part 1. I hope I’ve touched on some of the issues that I came across while trying to process LiDAR with deep learning and computer vision algorithms. In Part 2 I’ll discuss gathering training data (or the lack of available training data).

How should we be using ChatGPT?

Large language model (LLM) systems like ChatGPT are all the rage lately, and everyone is racing to figure out how to use them. People are screaming that LLMs are going to put them out of a job, just as the Luddites feared so many years ago.

A big problem is that a lot of people do not understand what things like ChatGPT are or how to use them effectively. Tools like ChatGPT rely on statistics. They are trained on huge amounts of text and learn patterns from that text. When you ask a question, they parse it, find the learned patterns that are statistically most relevant to your input, and generate output from them. ChatGPT can be effective at helping you get things done, as long as you keep a few things in mind while using it.

You should already know something about your question before you ask.

Nothing is perfect, and neither are large language models. You should know something about the problem domain so that you can properly interpret the output you get. LLMs can suffer from what is termed hallucination, where they will blissfully answer your question with incorrect, made-up information. Again, their output is based on statistics, and they are trained on information that has inherent biases. They do not understand what you are asking the way another human would. You need to check the answer to determine whether it is correct.

If you are a software developer, this is especially true when asking ChatGPT to write code for you. There are plenty of examples online of people going back and forth with it until they get working code. My own experience is that it has major issues with the Python bindings for GDAL for some reason.

Be clear with what you ask

ChatGPT uses natural language parsing and deep learning to process your request and then tries to generate a response that is statistically relevant. Understand that getting good information out of an LLM can be a back-and-forth process, so the clearer you are, the better it can process what you are asking. Do not ask something like “How do I get rich?” and expect working advice.

Be prepared to break down a complex question into smaller parts

You will not have much luck if you ask something like “Tell me how to replace the headers in my engine” and expect complete, specific advice. An LLM does not understand the concept of how to do something like this, so it will not be able to give you a complete step-by-step list (unless some automobile company trains a specific LLM for it). Break complex questions down into smaller parts so that you can combine all the information you get at the end.

Tell it when it is wrong

This is probably mainly important for software developers, but do not be afraid to tell ChatGPT when it is wrong. For example, if you ask it to write some source code for you, and it does not work, go back and tell it what went wrong and what the error was. ChatGPT is conversational, so you may have to have a back and forth with it until it gives you information that is correct.

Ask it for clarification

The conversational nature of ChatGPT means that if you do not understand the response, you can ask it to rephrase things or provide more information. This can be helpful if you ask it about a topic you do not understand. Asking for clarification can also help you to judge whether you are getting correct information.

NEVER GIVE IT PERSONAL INFORMATION

Do NOT, under any circumstances, give ChatGPT personal information such as your social security number, your date of birth, credit card numbers, or anything similar. Interactions with LLMs like ChatGPT can be used for further training and for tweaking the information they present. Assume that anything you type into ChatGPT may become part of a future training set, so in theory someone could later ask it for your personal information and get it if you provided it.

Takeaways

ChatGPT is a very useful tool, and more LLMs are being released on an almost weekly basis. Like any tool, you need to understand it before you use it. Keep in mind that it does not understand what you are asking the way a human does. It uses a vast pool of training data, learned patterns, and statistics to generate the responses it thinks you want. Always double-check what you get out of it instead of blindly accepting it.

Some Thoughts on Creating Machine Learning Data Sets

My whole professional career (26 years now!) has been in slinging data around. Be it writing production systems for the government or making deep learning pipelines, I have made a living out of manipulating data. I have always taken pains to make sure what I do is correct and am proud of how obsessive I am about putting things together.

Lately I have been back to doing deep learning and computer vision work and have been looking at some of the open datasets that are out there for satellite imagery. Part of this work has involved trying to combine datasets to train models for object classification from various satellite imagery providers.

One thing I have noticed is that it is still hard to make a good deep learning dataset. We are only human, and we miss things sometimes. It is easy to misclassify things or accidentally include images that might not be a good candidate. Even ImageNet, one of the biggest computer vision and deep learning datasets out there, has been found to contain errors in the training data.

This got me thinking about putting together datasets and what “rules of thumb” I would use for doing so. As there do not seem to be as many articles out there about making datasets as about using them, I thought I would add my two cents on building a good machine learning dataset. This is by no means a criticism of the freely available datasets out there now. In fact, we should all be thankful so many people are putting them out there, and I hope to add to them soon! So, in no particular order, here we go.

Limit the Number of Object Classes

One thing I have noticed is that many datasets try to break their object classes down into too fine a level of detail. This looks to be especially true for satellite datasets. For example, one popular dataset breaks aircraft down into multiple classes ranging from propeller to trainer to jet aircraft. One problem with this approach is that it becomes easy to pick the wrong category while labeling. Jet trainers exist: should one go into the jet category or the trainer category? Commercial propeller aircraft exist: what do we do with those?

It is one thing if you are purposely trying to differentiate types of aircraft in satellite imagery. Even then, however, I would guess that a neural network would learn mostly the same features for each object class, and you would end up getting the aircraft category correct but have numerous misclassifications of the type. It will also be a lot easier to accidentally mislabel things while you are building the dataset. Now, this is different if you are working with imagery to which only certain three-letter agencies have access, but most of us have to make do with the types of imagery that are freely available.

Verify Every Dataset Before Use

We all get in a hurry. We have deadlines, our projects are usually underfunded, and we spend large amounts of time putting out fires. Even so, it is vitally important not to blindly use a dataset to train your model! Consider the image below: in one public-domain dataset out there, around eight images like this are labeled as boats.

Image of a car labeled as a boat.

This is a classic example of “we’re only human.” I would wager that a lot of datasets contain errors like this. It is not that anyone is purposely trying to mislead you; it just happens sometimes. This is why it is essential to go through your dataset no matter how boring that might be. Labeling is a tough job, and there is probably a room in Hell that tortures people by making them label things all day and night. Again, everyone makes mistakes.

Clean up Your Datasets

Some datasets out there are derived from Google Earth. It is a source of high-quality imagery, and for now the terms seem not to require your firstborn or an oath of loyalty. The problem comes when you include things like the image below in your training set.

Here you can see that someone used an aircraft that had the Google Earth watermark superimposed on top of it. If you only have one or two, it likely will not be an issue. However, if you have a lot of images like this, your network could learn features from the text and start to expect imagery to contain it. This is a case where you should practice due diligence and clean up your data before putting it into a machine learning dataset.

Avoid Clusters (Unless Looking for Them)

When extracting objects from Google Earth, you might come across something you consider a gold mine. Say you are extracting and labeling cars from imagery, and you come across a full parking lot. This seems like an easy way to suddenly add a lot of training data, and you might end up with a lot of images such as the one below.

In a dataset I recently worked with, there were several object classes that had a lot of images of the same types of objects right next to each other. If you do this, keep in mind that there could be some side effects from having a lot of clusters in your dataset.

  • It could (possibly) be helpful, because objects in a cluster can present different views of the same class. In the image above, you can see the top of one bus and the back of another right beside it. This can aid in learning different features for that class.
  • A negative is that if you have a lot of clusters of objects in your training data, your classifier might lean towards detecting objects in groups instead of individual objects. If you are specifically looking for clusters then this is OK. If you want to search for individual ones, then it could hurt your accuracy.
  • The complexity of your training data could also increase, leading to slowdowns during training, because each example contains more features than a single object ordinarily would.

In general, it is OK to have some clusters of objects in your data. Just be mindful that if you are looking for individual objects, you should try not to have a lot of clusters in your training data.

Avoid Duplicate Objects

This is true for objects in the same class or between object classes. If you have a lot of duplicates, you run the risk of overfitting during training. Problems can also arise if you, say, include the same car in both a car class and a truck class; you can end up with false detections because the same features match multiple classes.
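
A quick way to catch exact or near-exact repeats is a perceptual-hash pass over your image chips. The sketch below assumes the imagehash and Pillow libraries and a hypothetical folder layout; it flags chips whose hashes collide so you can review them by hand.

from pathlib import Path
from PIL import Image
import imagehash

# Perceptual hashes map visually identical chips to the same hash value
seen = {}
for path in Path("dataset/cars").glob("*.png"):   # hypothetical folder layout
    key = str(imagehash.phash(Image.open(path)))
    if key in seen:
        print(f"{path} looks like a duplicate of {seen[key]}")
    else:
        seen[key] = path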

Pick Representative Training Data

If you are trying to train a model to detect objects in overhead imagery, you would not want to use a training set of pictures people have taken with their cellular phones. Similarly, if you want to predict someone’s mood from high-resolution camera images, you would not want to train on fuzzy webcam images. This is stressed in numerous deep learning texts out there, but it is important to repeat. Check that the data you use for training matches the data you intend to run your model on. Fitness for use is important.

Invariance

The last thing I want to discuss here is what I feel is one of the most important aspects of deep learning: invariance. Convolutional neural networks are NOT invariant to scale and rotation. Thanks to convolution and pooling they handle translations reasonably well, but you do not get scale or rotational invariance by default. You have to provide it through data augmentation during your training phase.

Consider a dataset of numbers that you are using to train a model. This dataset will likely have the numbers written as you would expect them to be: straight up and down. If you train a model on this, it will detect numbers that match a vertical orientation, but accuracy will go down for anything that is off the vertical axis.

Frameworks like Keras make this easy by providing classes or functions that can randomly rotate, shear, or scale images during training before they are input into the network. This helps the classifier learn features in different orientations, something that is important for classifying overhead imagery.
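
As an illustration (the specific ranges are placeholders, not recommendations for any particular dataset), Keras’s ImageDataGenerator can apply these random transforms on the fly as batches are generated:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shear, zoom, and flip training images so the network sees
# objects in orientations beyond the ones present in the raw dataset
datagen = ImageDataGenerator(
    rotation_range=90,     # rotate up to 90 degrees in either direction
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,    # vertical flips make sense for overhead imagery
)

# train_gen = datagen.flow_from_directory("train/", target_size=(224, 224), batch_size=32)
# model.fit(train_gen, epochs=10)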

Conclusion

In summary, these are just some general guidelines I use when I am putting together a dataset. I do not always get everything right, and neither will you. Get multiple eyes on your dataset and take your time. Labeling is laborious, and if your attention drifts it is easy to get things wrong. The better the quality of your training data, the better your model will perform.

More Fun with the RTX 2060

So I recently wiped my system and upgraded to Linux Mint Cinnamon 20. I tend to wipe and install on major releases since I do a lot of customization.

Anyway, I wanted to set CUDA back up along with tensorflow-gpu since I have stuff I want to do. I recreated my virtual environment and found that Tensorflow 2.2.0 had been released. It turns out it still needs CUDA 10.1. No worries; I went through and put CUDA 10.1, cuDNN, and TensorRT back on my system, and everything was working.

I noticed with 2.2.0 that I was getting the dreaded RTX CUDA_ERROR_OUT_OF_MEMORY errors for pretty much anything I did. So I fixed it and figured I’d post this in case it helps anyone else out down the road. You need to add the code below so that the GPU memory allocation can grow and so that mixed precision is enabled on the RTX (which also lets work run on the Tensor Cores in the RTX series).

from tensorflow import config as tfc
from tensorflow.keras.mixed_precision import experimental as mixed_precision
...
# Let GPU memory allocation grow as needed instead of grabbing it all up front
gpus = tfc.experimental.list_physical_devices("GPU")
tfc.experimental.set_memory_growth(gpus[0], True)
# Run compute in float16 where possible so the Tensor Cores get used and memory use drops
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

If you’re hitting out-of-memory errors on your RTX, give this a shot. You can read more about Tensorflow and mixed precision here.

Fun with Linux and a RTX 2060

Or…. how to spend a day hitting your head into your desk.

Or…. machine learning is easy, right? 🙂

For a while now I have been wanting to upgrade my video card in my desktop so I could actually use it to do machine/deep learning tasks (ML). Since I put together a Frankenstein gaming computer for my daughters out of some older parts, I finally justified getting a new card by saying I would then give them my older nVidia card. After a lot of research, I decided the RTX 2060 was a good balance of how much money I felt like spending versus something that would actually be useful (plus the series comes with dedicated Tensor Cores that work really fast with fp16 data types).

So after buying the card and installing it, the first thing I wanted to do was to get Tensorflow 2.1 to work with it. Now, I already had CUDA 10.2 and the most up-to-date version of TensorRT installed, and knew that I’d have to custom compile Tensorflow’s pip version to work on my system. What I did not know was just how annoying this would turn out to be. My first attempt was to follow the instructions from the Tensorflow web site, including applying this patch that fixes the nccl bindings to work with CUDA 10.2.

However, all of my attempts failed. I had random compiler errors crop up during the build that I have never had before and could not explain. Tried building it with Clang. Tried different versions of GCC. Considered building with a Catholic priest present to keep the demons at bay. No dice. Never could complete a build successfully on my system. This was a bit to be expected since a lot of people online have trouble getting Tensorflow 2.1 and CUDA 10.2 to play nice together.

I finally broke down and downgraded CUDA on my system to 10.1. I also downgraded TensorRT so it would be compatible with the version of CUDA I now had. Finally I could do a pip install tensorflow-gpu inside my virtual environment, and it worked: it ran, and I could at last train on my GPU with fantastic results.

Almost.

I kept getting CUDNN_STATUS_INTERNAL_ERROR messages every time I tried to run a Keras application on the GPU. Yay. After some Googling, I found this link and apparently there’s an issue with Tensorflow and the RTX line. To fix it, you have to add this to your Python code that uses Keras/Tensorflow:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

...
# Allow GPU memory to grow on demand instead of being allocated all at once,
# which works around the CUDNN_STATUS_INTERNAL_ERROR on RTX cards
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

FINALLY! After several days of trying to custom compile Tensorflow for my system, giving up, downgrading so I could install via pip, running into more errors, etc., I now have a working GPU-accelerated Tensorflow! As an example, running the simple addition_rnn.py example from Keras, I went from around 3.5 seconds per epoch on my Ryzen 7 2700X processor (where I had compiled Tensorflow for CPU only to take advantage of the additional CPU instructions) to under 0.5 seconds per epoch on the GPU. I’m still experimenting, and modifying some things to use fp16 so I can take advantage of the Tensor Cores in the GPU.