Applying Deep Learning to LiDAR Part 4: Detection

All of the previous parts of this series have talked about the challenges in training a CNN to detect geological features in LiDAR.  This time I will talk about actually running the CNN against the test area and my thoughts on how it went.


I was surprised at how small the actual network was.  The xCeption model that I used ended up only being around 84 megabytes.  Admittedly this was only three classes and not a lot of samples, but I had expected it to be larger.

Next, the test image was a 32-bit single band LiDAR GeoTIFF that was around 350 gigabytes.  This might not sound like much, but when you are scanning it for features, believe me it is quite large.

First off, due to the size of the image, and that I had to use a sliding window scan, I knew that the processing time would be long to run detections.  I did some quick tests on subsections and realized that I would have to break up the image and run detection in chunks.  This was before I had put a water cooler on my Tesla P40, and since I wanted to sleep at night, just letting it run to completion was out of the question.  Sleep was not the only concern I had.  I live south of the capital of the world’s last superpower, yet at the time we lost power any time it got windy or rained.  The small chunks meant that if I lost power, I would not lose everything and could just restart it on the interrupted part.

I decided to break the image up into an 8×8 grid.  This provided a size where each tile could be processed in two to three hours.  I also had to generate strips that covered the edges of the tiles to try to capture features that might span two tiles.  I had no idea how small spatially a feature could be, so I picked a 200×200 minimum pixel size for the sliding window algorithm.  This still meant that each tile would have several thousand potential areas to run detections against.  In the end it took several weeks worth of processing to finish up the entire dataset (keeping in mind that I did not run things at night since it would be hard to sleep with a jet engine next to you).

How well did it work?  Well, that’s the interesting part.  I’m not a geomorphologist so I had to rely on the client to examine it.  But here’s an example of how it looks via QGIS:

Sample of LiDAR detections.

As you can see, it tends to see a lot of areas as floodplain alluvium.  After consulting with the subject matter experts, there are a few things that stand out.

  1. The larger areas are not as useful as the smaller ones.  As I had no idea of a useful scale, I did not have any limitations to the size of the bounding areas to check.  However, it appears the smaller boxes actually do follow alluvium patterns.  The output detections need to be filtered to only keep the smaller areas.
  2. It might be possible to run a clustering algorithm against the smaller areas to better come up with larger areas that are correctly in the class.

Closing thoughts and future work

While mostly successful, as I have had time to look back, I think there are different or better ways to approach this problem.

The first is to train on the actual LiDAR points versus a rasterization of them.  Instead of going all the way to rasterization, I think keeping the points that represent ground level as inputs to training might be a better way to go.  This way I could alleviate the issues with computer vision libraries and potentially have a simpler workflow.  I am curious if geographic features might be easier for a neural network to detect if given the raw points versus a converted raster layer.

If I stay with a rasterized version, I think if I did it again I would try one of the YOLO-class models.  These models are state-of-the-art and I think may work better in scanning large areas for smaller scale features as it does its own segmentation and detection.  The only downside to this is I am not entirely sure YOLO’s segmentation would identify areas better than selective search due to the type of input data.

I think it would also be useful to revisit some of the computer vision algorithms.  I believe selective search could be extended to work with higher numbers of bits per sample.  Some of the other related algorithms could likely be extended.  This would help in general with remotely sensed data as it usually contains higher numbers of bits per sample.

While there are a lot of segmentation models out there, I am curious how well any of them would work with this type of data.  Many of them have the same limitations as OpenCV does and cannot handle 32-bits per sample imagery.  These algorithms typically images where objects “stand out” against the background.  LiDAR in this case is much different than the types of sample data that such images were trained on.  For example, here is a sample of OpenCV’s selective search run against a small section of the test data.  The code of course has to convert the data to 8-bits/sample and convert it to a RGB image before running. Note that this was around 300 meg in size and took over an hour to run on my 16 core Ryzen CPU.

You can see that selective search seems to have trouble with this type of LiDAR as there are not anything such as house lots that could be detected. The detections are a bit all over the place.

Well that’s it for now.  I think my next post will be about another thing I’ve been messing with: applying image saliency algorithms to LiDAR just to see if they’d pull anything out.

Applying Deep Learning to LiDAR Part 3: Algorithms

Last time I talked about the problems finding data and in training a machine learning model to classify geologic features from LiDAR.  This time I want to talk about how various libraries can (and cannot) handle 32-bit imagery.  This actually caused most of the technical issues with the project and required multiple work-arounds.

OpenCV and RasterIO

OpenCV is probably the most widely used computer vision library around.  It’s a great library, but it’s written to assume that the entire image can be loaded into memory at once.  To get around this, I had to use the rasterio library as it will read on demand and let you easily read in parts of the image at a time.  To use it with something like Tensorflow, you have to change the data with some code like this:

with as src:
    # Read the data as a 3D array (bands, rows, columns)

    # Convert the data type to float32
    data = data.astype(numpy.float32)

    # Transpose the array to match the shape of cv2.imread (rows, columns, bands)
    data = numpy.transpose(data, (1, 2, 0))

    return data

Many computer vision algorithms are designed to expect certain types of images, either 8 to 16-bit grayscale or up to 32-bit three channel (such as RGB) images.  OpenCV, one of the most popular, is no different in this aspect .  The mathematical formulas behind these algorithms have certain expectations as well.  Sometimes they can scale to larger numbers of bits, sometimes not.

Finding Areas of Interest

This actually impacts how we search the image for areas of interest.  There are typically two ways to search an image using computer vision: sliding window and selective search.  A sliding window search is a technique used to detect objects or features within an image by moving a window of a fixed size across the image in a systematic manner. Imagine looking through a small square or rectangular frame that you slide over an image, both horizontally and vertically, inspecting every part of the image through this frame. At each position, the content within this window is analyzed to determine whether it contains the object or feature of interest.

Selective Search is an algorithm used in computer vision for efficient object detection. It serves as a preprocessing step that proposes regions in an image that are likely to contain objects. Instead of evaluating every possible location and scale directly through a sliding window, Selective Search intelligently generates a set of region proposals by grouping pixels based on similarity criteria such as color, texture, size, and shape compatibility.

Selective search is more efficient than a sliding window since it returns only “interesting” areas of interest versus a huge number of proposals that a sliding window approach uses.  Selective search in OpenCV is only designed to work with 24 bit images (ie, RGB images with 8 bits per channel).  To use higher-bit data with it, you would have to scale it to 8 bits/channel.  A 32-bit dataset (which includes negative values as these typically indicate no-data areas) can represent 2.15 billion distinct values.  To scale to 8 bits per channel, we would also need to convert it from floating point to 8-bit integer values.  In this case, we can only represent 256 discrete values.  As you can see, this is quite a difference in how many elevations we can differentiate. 

Here’s an example of the areas of interest that a sliding window and image pyramid generates. As you can see, there are a lot of regions of interest that are regularly placed across the image.

However, selective search is not always perfect.  Below is an example where I ran OpenCV 4’s selective search against an image of mine.  It generated 9,020 proposed areas to search.  I zoomed in to show it did not even show the hawk as a region of interest.

Selective search output run against an image with a hawk.

Here’s a clipped version of the input dataset when viewed in QGIS as a 32-bit DEM.  Notice in this case the values range from roughly 1,431 to 1,865.

QGIS with a clip of the original dataset.

Now here is a version converted to the 8-bit byte format in QGIS.

Same data converted to byte.

As you can see, there is quite a difference between the two files.  And before you ask, int8 just results in a black image no matter how I try to adjust the no-data value.

Tensorflow Pipeline

So to run this, I set up a Tensorflow pipeline for processing.  My goal was to be able to turn any of the built-in Tensorflow models into a RCNN.  An interesting artifact of using built-in models, Tensorflow, and OpenCV was that the input data actually had to be converted into RGB format.  Yes, this means a 32-bit grayscale image had to become a 32-bit RGB image, which of course greatly increased the memory requirements.  Here’s a code snippet that shows how to use Rasterio, PIL, and numpy to take an input image and convert it so it’s compatible with the built-in Tensorflow models:

def load_and_preprocess_32bit_image(image_bytes: tensorflow.string) -> numpy.ndarray:
    """Helper function to preprocess 32-bit TIFF image
       image_bytes (tensorflow.string): Input image bytes
        numpy.ndarray: decoded image

    with as memfile:
        with as dataset:
            image =
    image = Image.fromarray(image.squeeze().astype('uint32')).convert('RGB')
    image = numpy.array(image)  # Convert to NumPy array
    image = tensorflow.image.resize(image, local_config.IMAGE_SIZE)

    return image

This function takes the 32-bit DEM, loads it, converts it to a 32-bit RGB image, and then converts it to a format that Tensorflow can work with.  

You can then create a function that can use this as part of a pipeline by defining a function such as this:

def load_and_preprocess_image_train(image_path, label, in_preprocess_input,
    """ Define a function to load, preprocess, and augment the images
        image_path (_type_): Path to the input image
        label (_type_): label of the image
        in_reprocess_input: Function from keras to call to preprocess the input
        is_32bit (bool, optional): Is the image a 32 bit greyscale. Defaults to 

     _type_: Pre-processed image and label

    image =

    if is_32bit:
        image = tensorflow.numpy_function(load_and_preprocess_32bit_image, 
        image = tensorflow.image.decode_image(image, 
        image = tensorflow.image.resize(image, local_config.IMAGE_SIZE)
    image = augment_image_train(image)  # Apply data augmentation for training
    image = in_preprocess_input(image)

    return image, label

Lastly, this can then be set up as a part of your pipeline by using code like this:

# Create a for training data
train_dataset =, train_labels))
train_dataset = path, label:

(Yeah trying to format code on a page in WordPress doesn’t always work so well)

Note I plan on making all of the code public once I make sure the client is cool with that since I was already working on it before taking on their project.  In the meantime, sorry for being a little bit vague.

Training a Model to be a RCNN

Once you have your pipeline set up, it is time to load the built-in model.  In this case I used Xception from Tensorflow and used the pre-trained model to do transfer learning by the standard omit the top layer, freeze the previous layers, then add a new layer on top that learns from the input.

# Load the model without pre-trained weights
base_model = Xception(weights=local_config.PRETRAINED_MODEL, 
                      classes=num_classes, input_tensor=input_tensor)

# Freeze the base model layers if we're using a pretrained model

if local_config.PRETRAINED_MODEL is not None:
     for layer in base_model.layers:
         layer.trainable = False

# Add a global average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)

# Create the model
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

In this case, I used Adam as the optimizer as it performed better than something like the stock SGD and I added in two model callbacks.  The first saves the model to disk every time the validation accuracy goes up, and the second stops processing if the accuracy hasn’t improved over a preset number of epochs.  These are actually built-in to Keras and can be set up as follows:

# construct the callback to save only the *best* model to disk based on 
# the validation loss
model_checkpoint = ModelCheckpoint(args["weights"], 

# Add in an early stopping checkpoint so we don't waste our time
early_stop_checkpoint = EarlyStopping(monitor="val_accuracy",

You can then add them to a list with

model_callbacks = [model_checkpoint, early_stop_checkpoint]

And then pass that into the function.

After all of this, it was a matter of running the model.  As you can imagine, training took several hours.  Since this has gotten a bit long, I think I’ll go into how I did the detection stages next time.

How should we be using ChatGPT?

Large-language model (LLM) systems like ChatGPT are all the rage lately and everyone is racing to figure out how to use them. People are screaming that LLMs are going to put them out of jobs, just like the Luddite movement thought so many years ago.

A big problem is that a lot of people do not understand what things like ChatGPT are and how to use them effectively. Things like ChatGPT rely on statistics. They are trained on huge amounts of text and learn patterns from that text. When you ask them a question, they parse through it and then see what patterns they learned that statistically appear to be the most relevant to your input and then generate output. ChatGPT is a tool that can be effective at helping you to get things done, as long as you keep a few things in mind while using it.

You should already know something about your question before you ask.

Nothing is perfect, and neither are large-language models. You should know something about the problem domain so that you can properly interpret the output you get. LLMs can suffer from what is termed hallucination, where they will blissfully answer your question with incorrect and made-up information. Again, their output is based on statistics, and they’re trained on information that has some inherent biases. They do not understand what you are asking like another human would. You need to check the answer to determine if it is correct.

If you are a software developer, this is especially true when asking ChatGPT to write code for you. There are plenty of examples online of people going back and forth with it until they get working code. My own experience is that it has major issues with the Python bindings for GDAL for some reason.

Be clear with what you ask

ChatGPT uses natural language parsing and deep learning to process your request and then try to generate a response that is statistically relevant. Understand that getting good information out of a LLM can be a back and forth, so the clearer you are, the better it can process what you are asking. Do not ask something like “How do I get rich?” and expect working advice.

Be prepared to break down a complex question into smaller parts

You will not have much luck if you ask something like “Tell me how to replace the headers in my engine” and get complete and specific advise. A LLM does not understand the concept of how to do something like this, so it will not be able to give you a complete step-by-step list (unless some automobile company tries to make a specific LLM). Break down complex questions into smaller parts so that you can combine all the information you get at the end.

Tell it when it is wrong

This is probably mainly important for software developers, but do not be afraid to tell ChatGPT when it is wrong. For example, if you ask it to write some source code for you, and it does not work, go back and tell it what went wrong and what the error was. ChatGPT is conversational, so you may have to have a back and forth with it until it gives you information that is correct.

Ask it for clarification

The conversational nature of ChatGPT means that if you do not understand the response, you can ask it to rephrase things or provide more information. This can be helpful if you ask it about a topic you do not understand. Asking for clarification can also help you to judge whether you are getting correct information.


Do NOT, under any circumstances, give ChatGPT personal information such as your social security number, your date of birth, credit card numbers, or any other such information. Interactions with LLMs like ChatGPT are used for further training and for tweaking the information it presents. Understand that anything you ask ChatGPT will permanently become part of its training set, so in theory someone can ask it for your personal information and get it if you provide it.


ChatGPT is a very useful tool, and more and more LLMs are being released on an almost weekly basis. Like any tool, you need to understand it before you use it. Keep in mind that it does not understand what you are asking like a human does. It is using a vast pool of training data, learned patterns, and statistics to generate responses that it thinks you want. Always double check what you get out of it instead of blindingly accepting it.

Finally Upgraded!

If you’ve been trying to come here over the past few days, you might have noticed that this blog has been up and down, changing themes, and what not. I have been having issues upgrading the PHP version on this website and finally got things ironed out thanks to my provider’s awesome support staff! So I promise it should be back to normal now. Mostly. Probably. 😉

Image Processing Basics Part 2

Some Examples

Now that we have some of the basics down, let us look at some practical examples of the differences between how the brain sees things versus how a computer does.

Example image of a clear blue sky
Example image of a clear blue sky

The above photo of a part of the sky was taken by my iPhone 13 Pro Max using the native camera application. There were no filters or anything else applied to it. To our eyes, it looks fairly uniform: mainly blue with some lighter blue towards the right where the sun was the day I took the picture. Each pixel of the image represents the light that hit a sensor in the camera, was processed, and saved.

Our brain does not see a number of individual pixels. Instead, we see large splotches of colors. This is one of the shortcuts our brain does to ease the processing burden. If you look around a room, you do not see individual differences between the colors of the wall. Your wall mainly looks like a uniform color. We simply do not have the processing power to break down the inputs from our eyes into every minute part.

A computer, however, does have the ability to “see” an image in all of its different parts. Computers see everything as a number, be it the 1’s and 0’s of binary or color triplets in the RGB color space. If we look at the RGB color cube below, the computer sees all of the pixels in the above image as clustering somewhere around the lower right side of the cube. See the previous link for more information about the RGB color space.

RGB Color cube from wikipedia
RGB Color Cube (Wikimedia Commons contributors, “File:RGB color solid cube.png,” Wikimedia Commons, (accessed April 18, 2023).

In a computer, the above image is loaded and each pixel is in memory in the form of triplets such as (135, 206, 235), which is the code for a color known as sky blue. The computer also does not have to take any shortcuts when it loads the image, meaning that the representation in memory is exactly the same as the image that was saved from the phone.

If we use the OpenCV library to calculate the histogram of the image and then count the number of colors, we in fact find that there are 2,522 unique colors in the picture of the sky. There is no magic here, we just do not have the same precision that a computer does when it comes to examining images or our environment. The big take away here is this: there is more information encoded in pictures or video than what our brains are capable of perceiving. Just because we cannot see certain details in a image does not mean that they are not there.

For another example, consider this image below. The edges look like nothing but black, and all you can really see is out of the window. It is definitely underexposed.

Photo out the window of my wife's grandparents' house.
Photo out the window of my wife’s grandparents’ house.

As mentioned above, a computer is able to detect more than our eyes can. Where we just see black around the edges, there is in fact detail there. We can adjust the exposure on the image to brighten it so that our eyes can see these details.

Above image with the exposure and contrast adjusted
Above image with the exposure and contrast adjusted

With the exposure turned up (and adjusting the contrast as well), we can additionally see a picture of a bird, some dishes, and some cooking implements. This is not magic, nor is it adding anything to the image that was not already there. Image processing like this does not insert things into an image. It only enhances the details of an image so that they are more detectable to the human eye.

Many times, when image processing is in the news, people sometimes assume that it changing an image, or that it is inserting things that were not originally there. When you edit your images on your phone or tablet, you are manipulating the detail that is already in the image. You can enhance the contrast to make the image “pop.” You can change the color tone of the image to make it appear more warm or more cold to your liking. However, this is simply modifying the information that is already in the image to change how it appears to the human eye.

I am making a big deal about this point as future installments in this series will demonstrate how things actually work while hopefully dispelling certain myths that exist in pop culture. I think next time I will cover zooming in or out of an image (aka, resizing). Does it add something into the image or misrepresent it? We will find out.

When Checkinstall Attacks

The other day I was compiling the latest OpenCV on my computer and had planned on doing what I normally do when it’s done: run checkinstall to build a .deb for it because I like to keep all my files under package management. OpenCV finished compiling fairly quickly (it’s nice when you can do a make -j 16) and I then ran checkinstall.

It crashed while it was running and left a half-installed Debian package of OpenCV on my system. “No problem” I thought, I’ll just uninstall the deb and do a normal make install. Sometimes checkinstall crashes so I didn’t think anything was out of the ordinary. Since I usually put it in /opt/opencv4 it would still be self contained at least.

I noticed a little bit later that my system was acting oddly. Some things wouldn’t run, I couldn’t sudo any more, etc. I rebooted as a first check to see if it was just something random going on. And that’s when my system rebooted to a text mode login prompt. “Huh, maybe the card/drivers didn’t initialize fully I’ll just reboot again.” Nope, no joy still the text login.

I tried to login only to watch the process pause after I typed my password, and then came back up the login prompt. “Odd, maybe I’ll see if it’s something weird and try another virtual console.” Nope, no joy there. Tried to ssh into it, no joy there either. I was worried my SSD was going out. It’s not that old, but still a worry.

So I used my laptop to make a bootable Mint installer and plugged that in and tried to boot. The graphics screen was corrupted and had to use safe mode to log in. “Holy crap, is my graphics card messed up along with the hard drive?” I was worried about this because a new power supply I bought a while back had nuked my old motherboard so had to replace hardware in my system. (That’s a story for another day).

I could still get a GUI when I booted into safe mode from the thumb drive so assumed the open source drivers on the latest Mint installer just didn’t like my card unless I did safe mode.. I did a SMART test to make sure nothing was wrong with the drive. That worked so I ran a fsck to check the integrity of the drive. I then went to set up a chroot to the hard drive so I could run debsums to make sure the packages hadn’t gotten randomly corrupted. And then I noticed a problem.

I couldn’t set up the chroot to work. I kept getting an error about /bin/bash not existing. I checked the /bin directory on the hard drive and sure enough, it was empty save for a broken link to some part of the JDK. “That’s odd, there were no drive errors but /bin is empty.” I thought about things for a moment and it randomly did an ls -ld on the root of the hard drive but didn’t see anything at first.

Then hit it me: “Wait a minute, /bin is supposed to be a link to /usr/bin these days.” I realized that for whatever reason, it looked like checkinstall had replaced the link for /bin with an actual /bin and had randomly placed a link in there for the jdk. I deleted the directory and replaced the link to /usr/bin and rebooted. Boom, system booted normally. Well, mostly normally. CUDA had somehow disappeared from the drive and I had to reinstall it (didn’t use the packages from nVidia since they want to downgrade my video drivers so just did a local install). I ran debsums to check and everything verified properly.

The moral of the story is, it’s good to have debugging skills and know how your computer is supposed to work!

Brian vs the Inspiron 620S

On Memorial Day I can say I had a memorable experience while trying to troubleshoot an old computer we still use. My wife got a Dell Inspiron 620S a while ago to use for her work and what not. Over the years I put a bigger hard drive in it and upgraded it to Windows 10. It’s not the fastest computer, but it still works for my wife’s vinyl cutter program that she uses and some software her work uses that’s Windows only. My kids also periodically use it for older games that they like to play since it’s a Core i5 with a decent low-end Radeon card in it.

A few weeks ago it just stopped working. It would not turn on even though the power supply LED was on and the power LED on the motherboard was lit. Just nothing would happen when you pushed the power button. No hard drive spin ups, nothing. So I let it sit for a while.

On Memorial Day I thought I would finally see what was up with it. I took out my multimeter because my first thought was perhaps the power supply was old and wasn’t producing enough power. I checked the ATX motherboard connector and the always-on pin had power and was the right voltage. I also inspected the motherboard to see if perhaps any capacitors had blown but everything looked fine.

I got up in frustration and thought I’d look online. As I got up, my foot came down on something and then slipped which did bad things to my toes and the muscles/ligaments in my foot and also came close to cutting off my middle toe. I thought it was actually a bit fitting because it was like the computer had found a way to flip me off 😉 Thankfully my wife has been helping me keep my toes buddy taped together and have been keeping stuff on the cut to make sure it heals.

I did finally do some Internet searching and found several other owners complaining on the Dell and other forums about the same issue. They had found that unplugging the two front USB ports from the motherboard fixed their issue. No one has any idea what could be going on and of course I haven’t found anything from Dell admitting to it.

On Saturday we decided to take the kids for ice cream (my wife has been driving lately since it’s hard for me to walk let alone drive). I thought before we left I’d hobble down and just see if unplugging those ports would work. I went to the computer (after making sure there was nothing on the floor around the desk), found the ports on the motherboard and unplugged them. And of course, the computer magically started turning on again. I still don’t know why unplugging them works, but I’m also not going to argue.

I then ran into my next problem. It had been a while since anyone used that computer since it hadn’t been working for several weeks and it wasn’t a priority. I randomly could not log in. My password didn’t work, none of the normal passwords I use around the house worked, no joy. My wife and daughter could log in but I couldn’t. I think it was the angry computer gods giving me one last middle finger.

Fortunately I keep a multi-boot USB handy that has a lot of bootable distributions and utilities. I booted into Kali Linux forensics mode. Once booted all I had to do was open up a terminal, switch to the directory on the Windows drive that had the SAM files, and was able to use chntpw to blank out my password. While there I made sure my account hadn’t been locked out or anything like that. I also booted into several antivirus tools and scanned the hard drive just in case there was a reason my password wasn’t working and did a chkdisk on the drive to make repair the errors from when it stopped working after the power outage.

Things are back to normal with it now and I’ve gotten the updates done that it had been missing. I think my next step is a Catholic priest and some Holy Water just in case!

More Fun with the RTX 2060

So I recently wiped my system and upgraded to Linux Mint Cinnamon 20. I tend to wipe and install on major releases since I do a lot of customization.

Anyway, I wanted to set CUDA back up along with tensorflow-gpu since I have stuff I wanted to do. I recreated my virtual environment and found Tensorflow 2.2.0 had been released. Based on this I found it still needs CUDA 10.1. No worries, went through and put CUDA 10.1, cuDNN, and TensorRT back on my system and everything was working.

I noticed with 2.2.0 that I was getting the dreaded RTX CUDA_ERROR_OUT_OF_MEMORY errors for pretty much anything I did. So I fixed it and figured I’d post this in case it helps anyone else out down the road. You need to add this in so that the GPU memory can grow and use mixed precision with the RTX (which also helps to run things on the TPUs in the RTX series).

from tensorflow import config as tfc
from tensorflow.keras.mixed_precision import experimental as mixed_precision
gpus = tfc.experimental.list_physical_devices("GPU")
tfc.experimental.set_memory_growth(gpus[0], True)
policy = mixed_precision.Policy('mixed_float16')

If you’re having more out of memory errors on your RTX, give this a shot. You can read more about Tensorflow and mixed precision here.