Stupid LIDAR Tricks Finale

I thought I’d finally wrap this up so I can move on to other things.  Since I’ve last posted, I replaced my Jankinator 1000 (nVidia Tesla P40 with a water cooler) and my nVidia RTX 2060 with an Intel Arc A770.  It has 16Gb of VRAM and is actually a pretty fast GPU on my Linux box.

So far I’ve had pretty good luck getting things like TensorFlow and PyTorch working on it, as mentioned in a previous blog post.  The only thing so far that I haven’t gotten to work 100% is Facebook’s Segment Anything Model 2 (SAM2) (which of course is what I mentioned in the last post I wanted to try to use with the LIDAR GeoTIFF).  Basically now down to running out of VRAM although I’m not sure exactly why since I’ve tweaked settings for OpenCL memory allocation on Linux, etc.  I’ve finally given up on that one for now and decided to just use the CPU for processing.

As a refresher, here is the LIDAR GeoTIFF I have been using.

 

 

 

 

SAM2 has the ability to automatically generate masks on input images.  This was of interest to me since I wanted to test it to try to automatically identify areas of interest from LIDAR.  Fortunately, the GitHub repo for SAM2 has a Jupyter notebook that made it easy to run some experiments.

 

 

 

 

With all of the default parameters, we can see that it only identified the lower right corner of the image.  I tweaked a few of the settings and came up with this version:

 

 

 

 

The second image does show some areas highlighted.  It got a grouping of row houses on the bottom of the image.  It also found a few single family houses as well.  However, it also flagged areas where there is really nothing of interest.

Finale

What have we learned from all of this?  Well, LIDAR is hard.  Automatically finding features of interest in LIDAR is also hard.  We can get some decent results using image processing and/or deep learning techniques, but as with anything in the field, we are no where near 100%.

Previously I posted about training a custom RCNN to identify features of interest from GeoTIFF LIDAR.  It was somewhat successful, although, as I said, it needed a lot more training data than I had available.

I do think that over time, people will develop models that do a good job of finding certain areas of interest in LIDAR.  Some fields already have software to find specific features in LIDAR, especially in fields such as archaeology.  And this is probably how things will continue for a while.  We probably will not have a generalized “find everything of interest in this LIDAR image” model or software for a long time.  However, it is possible to train a model to identify specific areas.

I am also working with LIDAR point data as well versus GeoTIFF versions.  Point data is a lot different beast in that it has various classifications of the points after it has been processed.  You can then do things such as extract tree canopy points or bare ground points.  Conversion to a raster necessarily looses information as things have to be interpolated and reduced in order to produce a raster.  I’ll post some things here in the future as while point cloud data can be harder to work with, I think the results are better than what can be obtained via rasters.

Posted in GIS

Stupid LiDAR Tricks Part 2

In this part of the series, I want to go over image salience and how it can be applied to finding “interesting” things in LiDAR.  Image salience (usually used to make salience maps) refers to the ability to identify and highlight the most important or attention-grabbing regions in an image.  It is meant to highlight areas of an image where the human eye would focus first. Salience maps are used to visualize these regions by assigning a salience value to each pixel, indicating its likelihood of being a point of interest. This technique is widely used in computer vision for tasks such as object detection, image segmentation, and visual search.

Background Research

Salience research actually has been going on for decades now.  It began back in the 1950’s where it was a field of psychology and neuroscience that sought to understand how humans perceive and prioritize visual information.  It mainly stayed in the neuroscience and psychology fields until roughly the end of the 1970’s.

In the 1980’s David Marr proposed a computational theory of vision that provided a framework for understanding the stages of how visual systems could process complex scenes.  This can be considered the beginning of trying to recreate how humans prioritized “interesting” parts of an image.  This also can be considered the base upon which later computer science work would be performed.

In the 1990’s the concept of salience maps was proposed to model how the human visual system identifies areas of interest by Itti, Koch, and Neibur.  In 1998 they created one of the first computational models that combined features such as color, intensity, and orientation to calculate areas of interest.  These algorithms added more complex features during the 2000’s.

With the rise of deep learning in the 2010’s, image salience took a turn and began to use CNNs for detection.  By definition, a CNN learns hierarchical features from large datasets and can identify complex patterns in images.  Combined with techniques such as adversarial learning, multi-scale analysis, and attention mechanisms, salience map generation is now more accurate than it has ever been.

CNN Salience Methods

Let us briefly examine how CNNs / deep learning are used in modern times for salience detection:

  • By definition, convolutional layers in a CNN extract local features from an input image.  Early layers in the network capture low-level features such as edges and textures, while later levels in the network capture higher-level features such as actual objects. Multi-scale analysis to process features at different resolutions can also help with salience detection with a CNN.
  • Pooling layers reduce the spatial dimensions of the feature maps.  This makes the computation more efficient and can even provide a form of spatial invariance so that features do not need to be the exact same scale.
  • The final fully-connected layer can then predict the salience map of the image based on the information gathered through the various layers.
  • An encoder-decoder architecture can be used as another extraction mechanism.  Encoders extract features from the image using convolutional layers while gradually reducing the spatial dimensions of the image so that it can increase the depth of the feature maps.
  • Decoders can then reconstruct the salience map from the encoded features.  In this case they may use techniques such as transposed convolutions to upscale the image or “unpooling” to restore the image to the original size.
  • Feature pyramid networks can process an image at multiple scales to gather coarse and fine details and then integrate the information into a final salience map.
  • Finally, generative adversarial networks can be used to produce salience maps by using a generator to create a map and a discriminator to evaluate the quality of the map.  The generator learns to produce more accurate maps over time by attempting to “fool” the discriminator.

Salience Maps

So what is a salience map?  A salience map is a representation that highlights the most important or attention-grabbing regions in an image. It assigns a salience value to each pixel, indicating its likelihood of being a region of interest.  They are the end result of running a salience detector and can be used for:

  • Object detection by finding and localizing objects in an image.
  • Image segmentation by dividing the image into segments or objects based on their salience.
  • Visual search which can be used for things like scene understanding and image retrieval by identifying which areas should have more processing performed.
  • Attention prediction can be used to highlight areas where a person would be most likely to focus their attention.

Why Image Salience?

The last use is what this post is about: automatically finding areas in LiDAR that need to be inspected or to find anomalies in LiDAR.  Imagine you are a large satellite company that collects thousands of images a day.  It would be time consuming for a human to scan all over each image for something of interest.  Salience maps are useful here in that they can help guide a human to places they need to examine.  Potentially, this could be a huge time saver for things like image triage.

LiDAR in raster format provides some challenges, though, for image salience.  For one, LiDAR represents dense, three-dimensional data instead of a normal two-dimensional image.  It requires pre-processing, such as noise reduction and normalization. LiDAR can contain varying point densities and occlusions in the point cloud. This makes LiDAR harder to analyze as we are dealing with a “different” type of image than normal.

Conversion of point data to raster can also make things problematic for salience detection.  LiDAR has several classes, one such class being bare earth.  In most cases, rasterization processes will convert the points to heights based on ground level.  However, in cases of buildings, this would typically have void areas because the laser cannot penetrate a building to find the ground level.   Most tools will fill these voids with a flat ground-level elevation as many people do not wish to see empty areas in their data.  This can make structures on bare earth rasters look similar to things like roads, thus an algorithm might have trouble differentiating the two.

Image Salience and LiDAR Workflow

Since I did not really cover this in the last post, here I will outline a workflow where salience and/or segmentation can be used to help with the processing of large LiDAR datasets (or really any type of raster dataset).

  1. Once the point data has been converted to a raster, salience maps can be generated to identify and extract areas in the imagery that appear to contain meaningful features.
  2. A human can either manually examine the identified areas, or some other complex object detection analysis algorithm can be run against the areas.  This is where the time saving comes into play as only specific parts of the image are examined, not the entire image itself.
  3. Features that are recognized can then be used for higher level tasks, ranging from identifying geographic features to detecting buildings.

Enough talk and history and theory, let us see how these algorithms actually work.  This source can be found at on github under the salience directory.  This time I made a few changes.  I added a config.py to specify some values for the program to avoid having a lot of command line arguments.   I also have copied the ObjectnessTrainedModel from OpenCV into the salience directory for convenience as not all packaging on Linux actually has the model included.

As a reminder, here are the input data sets from the last post (LiDAR and Hill Shade):

First off we will look at the algorithms in the venerable OpenCV package.  OpenCV contains four algorithms for computing salience maps in an image:

  1. Static Saliency Spectral Residual (SFT).  This algorithm works by using the spectral residual of an image’s Fourier transform to generate maps.  It converts the image to the frequency domain by applying the Fourier transform.  It then computes the spectral residual by removing the logarithm of the frequency amplitude spectrum’s average from the logarithm of the amplitude spectrum.  It then performs an inverse Fourier transform to convert the image back into the spatial domain to generate the initial salience map and applies Gaussian filtering to smooth out the maps.
  2. Static Saliency Fine Grained (BMS).  This algorithm uses Boolean maps to simulate how the brain processes an image.  First it performs color quantization on the image to reduce the number of colors so that it can produce larger distinct regions.  It then generates the Boolean maps by thresholding the quantized image at different levels.  Finally, it generates the salience map by combining the various Boolean maps.  Areas that are common across multiple maps are considered to be the salient area of the image.
  3. Motion Salience (ByBinWang).  This is a motion-based algorithm that is used to detect salient areas in a video.  First it calculates the optical flow between consecutive frames to capture the motion information.  It then calculates the magnitude of the motion vectors to find areas with significant movement.  Finally, it generates a salience map by assuming the areas with higher motion magnitudes are the salient parts of the video.
  4. BING Saliencey Detector (BING). This salience detector focuses on predicting the “objectness of image windows, essentially estimating how likely it is that a given window contains an object of interest. It works by learning objectness from a large set of training images using a simple yet effective feature called “Binarized Normed Gradients” (BING).

For our purposes, we will omit the Motion Salience (ByBinWang) method.  It is geared towards videos or image sequences as it calculates motion vectors. 

As this post is already getting long, we will also only look at the OpenCV image processing based methods here.  The next post will take a look at using some of the more modern methods that use deep learning.

Static Saliency Methods

The static salience methods (SFT and BMS) do not produce output bounding boxes around features of an image.  Instead, they produce a floating point image that highlights the important areas of an image.  If you use these, you would normally do something like threshold the images into a binary map so you could find contours, then generate bounding boxes, and so on.

First up is the SFT method.  We will run it now on the LiDAR GeoTIFF.

As you can see, when compared to the above original, SFT considers a good part of the image to be unimportant.  There are some areas highlighted, but they do not seem to match up with the features we would be interested in examining.  Next let us try the hill shade TIFF.

For the hill shade, SFT is a bit all over the place.  It picks up a lot of areas that it thinks should be interesting, but again they do not really match up to the places we would be interested in (house outlines, waterways, etc).

Next we try out the BMS method on the LiDAR GeoTIFF.

You can see that BMS actually did a decent job with the LiDAR image.  Several of the building footprints have edges that are lighter colored and would show up when thresholded / contoured.  The streams are also highlighted in the image.  The roadway and edges at the lower right side of the image are even picked up a bit.

And now BMS run against the hill shade.

The BMS run against the hill shade TIFF is comparable to the run against the LiDAR GeoTIFF.  Edges of the things we would normally be interested in are highlighted in the image.  It does produce smaller highlighted areas on the hill shade versus the original LiDAR.

The obvious downside to these two techniques is that further processing has to be run to produce actual regions of interest.  You would have to threshold the image into a binary image so you could generate contours.  Then you could convert those contours into bounding boxes via other methods.

Object Saliency Method

BING is an actual object detector that uses a trained model to find objects in an image.  While not as advanced as many of the modern methods, it does come from 2014 and can be considered the more advanced detection method available for images in OpenCV.  In the config.py file, you can see that with BING, you also have to specify the path to the model that it uses for detection.

Here we see that BING found larger areas of interest than the static salience methods (SFT and BMS).  While the static methods, especially BMS, did a decent job at detecting individual objects, BING generates larger areas that should be examined.  Finally, let us run BING against the hill shade image.

Again we see that BING detected larger areas than the static methods.  The areas are in fact close to what BING found against the LiDAR GeoTIFF.

Results

What can we conclude from all of this?  First off, as usual, LiDAR is hard.  Image processing methods to determine image salience can struggle with LiDAR as many areas of interest are not clearly delineated against the background like they would be in an image of your favorite pet.  LiDAR converted to imagery can be chaotic and really pushes traditional image processing methods to the extremes.

Of all of the OpenCV methods to determine salience, I would argue that BMS is the most interesting and does a good job even on the original LiDAR vs the hill shade TIFF.  If we go ahead and threshold the BMS LiDAR image, we can see that it does a good job of guiding us to areas we would find interesting in the LiDAR data.

The BING objectness model fares the worst against the test image.  The areas it identifies are large parts of the image.  If it were a bigger piece of data, it would basically say the entire image is of interest and not do a great job helping to narrow down where exactly a human would need to look.  And in a way this is to be expected.  Finding objects in LiDAR imagery is a difficult task considering how different the imagery is versus normal photographs that most models are trained on.  LiDAR does not often provide an easy separation of foreground versus background.  High-resolution data makes this even worse as things like a river bank can have many different elevation levels.

Next time we will look at modern deep learning-based methods.  How will they fare?  Will they be similar to the BING objectness model and just tell us to examine large swaths of the image?  Or will they work similarly to BMS and guide us to more individual areas.  We will find out next time.

Stupid LiDAR Tricks Part 1 (Segmentation)

My last few posts have been about applying machine learning to try to extract geographic objects in LiDAR.  I think now I would like to go in another direction and talk about ways to help us find anything in LiDAR.  There is a lot of information in LiDAR, and sometimes it would be nice to have a computer help us to find areas we need to examine.

In this case I’m not necessarily just talking about machine learning.  Instead, I am discussing algorithms that can examine an image and identify areas that have something “interesting” in them.  Basically, trying to perform object detection without necessarily determining the object’s identity.

For the next few posts, I think I’ll talk about:

I have a GitHub repository where I’ll stick code that I’m using for this series.

Selective Search (OpenCV)

This first post will talk about selective search, in this specific case, selective search from OpenCV.  Selective search is a segmentation technique used to identify potential regions in an image that might contain objects. In the context of object detection, it can help to quickly narrow down areas of interest before running more complex algorithms. It performs:

  1. Segmentation of the Image: The first step in selective search is to segment the image into multiple small segments or regions. This is typically done using a graph-based segmentation method. The idea is to group pixels together that have similar attributes such as color, texture, size, and shape.
  2. Hierarchical Grouping: After the initial segmentation, selective search employs a hierarchical grouping strategy to merge these small regions into larger ones. It uses a variety of measures to decide which regions to merge, such as color similarity, texture similarity, size similarity, and shape compatibility between the regions. This process is repeated iteratively, resulting in a hierarchical grouping of regions from small to large.
  3. Generating Region Proposals: From this hierarchy of regions, selective search generates region proposals. These proposals are essentially bounding boxes of areas that might contain objects.
  4. Selecting Between Speed and Quality: Selective search allows for configuration between different modes that trade off between speed and the quality (or thoroughness) of the region proposals. “Fast” mode, for example, might be useful in cases of real-time segmentation in videos.  “Quality” is used when processing speed is less important than accuracy.

Additionally. OpenCV allows you to apply various “strategies” to modify the region merging and proposal process.  These strategies are:

  1. Color Strategy: This strategy uses the similarity in color to merge regions. The color similarity is typically measured using histograms of the regions. Regions with similar colors are more likely to be merged under this strategy. This is useful in images where color is a strong indicator of distinct objects.
  2. Texture Strategy: Texture strategy focuses on the texture of the regions. Textures are usually analyzed using local binary patterns or gradient orientations, and regions with similar texture patterns are merged. This strategy is particularly useful in images where texture provides significant information about the objects, such as in natural scenes.
  3. Size Strategy: The size strategy prioritizes merging smaller regions into bigger ones. The idea is to prevent over-segmentation by reducing the number of very small, likely insignificant regions. This strategy tries to control the sizes of the region proposals, balancing between small regions with no areas of interest to large areas that contain multiple areas of interest.
  4. Fill Strategy: This strategy considers how well a region fits within its bounding box. It merges regions that together can better fill a bounding box, minimizing the amount of empty space. The fill strategy is effective in creating more coherent region proposals, especially for objects that are close to being rectangular or square.

Selective Search in Action

Now let us take a look at how selective search works.  This image is of a local celebrity called Gary the Goose.  To follow along, see the selective_search.py code under the selective_search directory in the above GitHub repository.

Gary the Goose

Now let us see how selective search worked on this image:

Selective search on image with all strategies applied.

For this run, selective search was set to quality mode and had all of the strategies applied to it.  As you can see, it found some areas of interest.  It got some of the geese, a street sign, and part of a truck.  But it did not get everything, including the star of the picture.  Now let us try it again, but without applying any of the strategies (comment out line 95).

Default selective search with no strategies applied.

Here we see it did about the same.  Got closer to the large white goose, but still seems to not have picked up a lot in the image.

Selective Search on LiDAR

Now let us try it on a small LiDAR segment.  Here is the sample of a townhome neighborhood.

Small LiDAR clip in QGIS

And here is the best result I could get after running selective search:

Selective search run against LiDAR

As you can see, it did “ok”.  It identified a few areas, but did not pick up on the houses or the small creeks that run through the neighborhood. 

Selective Search on a Hill Shade

Can we do better?  Let us first save the same area as a hillshade GeoTIFF.  Here we take the raw image and apply rendering techniques that simulate how light and shadows would interact with the three dimensional surface, making topographic features in the image easier to see.  You can click some of the links to learn more about it.  Here is the same area where I used QGIS to create and export a hill shade image.

LiDAR as a hill shade.

You can see that the hill shade version makes it easier for a human to pick out features versus the original.  It is easier to spot creeks and the flat areas where buildings are.  Now let us see how selective search handles this file.

Selective Search run against a hill shade.

It did somewhat better.  It identified several of the areas where houses are located, but it still missed all of the others.  It also did not pick up on the creeks that run through the area.

Why Did It Not Work So Well?

Now the question you might have is “Why did selective search do so badly in all of the images?”  Well, this type of segmentation is not actually what we would define as object detection today.  It’s more an image processing operation that builds on techniques that have been around for decades that make use of pixel features to identify areas.

Early segmentation methods that led to selective search typically did the following:

  1. Thresholding: Thresholding segments images based on pixel intensity values. This could be a global threshold applied across the entire image or adaptive thresholds that vary over different sized image regions.
  2. Edge Detection: Edge detectors work by identifying boundaries of objects based on discontinuities in pixel intensities, which often correspond to edges.  Some include a pass to try to connect edges to better identify objects.
  3. Region Growing: This method starts with seed points and “grows” regions by appending neighboring pixels that have similar properties, such as color or texture.
  4. Watershed Algorithm: The watershed algorithm treats the image’s intensity values as a topographic surface, where light areas are high and dark areas are low. “Flooding” the surface from the lowest points segments the image into regions separated by watershed lines.

Selective search came about as a hybrid approach that combined computer vision-based segmentation with strategies to group things together.  Some of these were similarity measures such as color, texture, size, and fill to merge regions together iteratively.  It then introduced a hierarchical grouping that built segments at multiple scales to try to better capture objects in an image.

These techniques do still have their uses.  For example, they can quickly find objects on things like conveyor belts in a manufacturing setting, where the object stands out against a uniform background.  However, they tend to fail when an image is “complicated”, like LiDAR as an example or a white goose that does not easily stand out against the background.  And honestly, they are not really made to work with complex images, especially with LiDAR. These use cases require something more complex than traditional segmentation.

This is way longer now than I expected, so I think I will wrap this up here.  Next time I will talk about another computer vision technique to identify areas of an interest in an image, specifically, image saliency.

Applying Deep Learning and Computer Vision to Lidar Part 2: Training Data

In part one I described some of the issues I had on a recent project that applied deep learning to geographic feature recognition in LiDAR and the file sizes of such data.  This time I want to talk about training data, how important it is, and how little there is for this type of problem.

One of the most important things in deep learning is having both quality training data and a good amount of such data.  I have actually written a previous post about the importance of quality data that you can read here.  At a bare minimum, you should typically have around one thousand samples of each object class you want to train a model to recognize.  Object classes in this case are geographic feature types that we want to recognize.

Training Data Characteristics

These samples should mirror the characteristics of the data that your model will come across during classification tasks.  With LiDAR in GeoTIFF format, the training data should be similar in resolution (0.5 meters in this case) and bit depth (32-bit) to the area for testing.  There should be variability in the training data.  Convolutional neural networks are NOT rotational invariant, meaning that unless you train your model on samples at different angles, it will not automatically recognize features.  In this case, your training LiDAR features should be rotated at different angles to account for differences in projection or north direction.

Balanced Numbers of Features per Class

Your training data should also be balanced, meaning that each class should have roughly the same number of training images where possible.  You can perform some tasks that we will talk about in a bit to help with this, but generally if your model is unbalanced, it could mistakenly “lean” towards one class more than others.

Related to this, when generating your training data, you should make sure your training and testing data contain samples from each of your object classes. Especially when imbalanced, it is very easy to use something like train_test_split from the sklearn library and have it generate a training set that misses some object classes. You should also shuffle your training data so that the samples from each class are sufficiently randomized and the model does not assume that features will appear in a specific order. To alleviate this, make sure you pass in something like:

… = train_test_split(…, shuffle=True, stratify=labels)

where labels is a list of your object classes. For most cases this will ensure the order of your samples is sufficiently randomized and that your training/testing data contains samples of each object class.

Complexity

Geographic features also have a varying complexity in how they appear in LiDAR.  The same feature can “look” differently based on its size or other characteristics.  Humans will always look like humans, so training on them is fairly simple.  Geographic features can be in the same class but appear differently based on factors such as weathering, erosion, vegetation, and even if it’s a riverbed that is dry part of the year.  This increased complexity means that samples need to have enough variability, even in the same object class, for proper training.

Lack of Training Data for This Project

With all of this out of the way, let us now talk about the issues that faced this particular project.  First of all, there is not a lot of labeled training data for these types of features.  At all.  I used various search engines, ChatGPT, even bought a bucket of KFC so I could try to throw some bones to lead me to training data (although I don’t know voodoo so probably read them wrong).

There is a lot of data about these geographic features out there, but not labeled AND in LiDAR format.  There is an abundance of photographs of these features.  There are paper maps of these features.  There are research papers with drawings of these features.  I even found some GIS data that had polygons of these features, but the matching elevation model was too low of a resolution to be useful.

In the end I was only able to find a single dataset that matched the bit depth and spatial resolution that matched the test data.  There were a couple of problems with this dataset though.  First, it only had three feature classes out of a dozen or more.  Second, the number of samples of each of these three classes were way imbalanced.  It broke down like this:

  • Class 1 – 123 samples
  • Class 2 – 2,214 samples
  • Class 3 – 9,700 samples

Realistically we should have just tried for Classes 2 and 3, but decided to try to use various techniques to help with the imbalance.  Plus, since it was a bit of a research project, we felt it would be interesting to see what would happen.

Data Augmentation

There are a few different methods of data augmentation you can do to add more training samples, especially with raster data.  Data augmentation is a technique where you generate new samples from existing data so that you can enhance your model’s generalization and generate more data for training. The key part of this is making sure what the methods you use do not change the object class of the training sample.

Geometric Transformations

The first thing you can do with raster data is to apply geometric transformations (again, as long as they do not change the object class of your training sample).  Randomly rotating your training images can help with the rotational invariance mentioned above.  You can also flip your images, change their scale, and even crop them as long as the feature in the training sample remains.

You can gain several benefits from applying geometric transformations to your training data. If your features can appear at different sizes, scaling transforms can help your model become better generalized on feature size. With LiDAR data, suppose someone did not generate the scene with North at the top. Here, random rotations can help the model generalize to rotation so it can better detect features regardless of angle.

Spatial Relationships

Regardless of what type of augmentations you apply to LiDAR, you have to be mindful that you do not change the spatial characteristics of the data. Consider color space augmentation, something that is common with other areas of deep learning and computer vision. With LiDAR, modifying the brightness/contrast would actually be changing the elevation and/or reflectance values of the data. In some applications, especially those highly based on reflectance values such as detecting types of vegetation, this might be useful. In high-resolution geography, you could end up altering the training data in such a way that it no longer represents real-world features.

Wrapping Up

I think I will end this one here as it got longer than I expected and I’m tired of typing 😉  Next time I’ll cover issues with image processing libraries and 32-bit LiDAR data.

Applying Deep Learning and Computer Vision to LiDAR Part1: File Sizes

Introduction

I recently had an interesting project where a client wanted to see if certain geographic features could be found by using deep learning on LiDAR converted to GeoTIFF format.  I had already been working on some code to allow me to use any of the Tensorflow built-in models as R-CNNs, so this seemed like the perfect opportunity to try this.  This effort wasn’t without issues, and I thought I would detail them here in case anyone else is interested.  File sizes, lack of training data, and a video card with an add-on fan that sounded like a jet engine turned out to be interesting problems for the project.

I decided to split this up into multiple posts.  Here in Part 1, I will be covering the implications of doing deep learning and computer vision on LiDAR where file sizes can range in the hundreds of gigabytes for imagery.

What is LiDAR?

Example LiDAR point cloud courtesy the United States Geological Survey

LiDAR is a technology that uses laser beams to measure distances and movements in an environment. The word LiDAR comes from Light Detection And Ranging, and it works by sending out short pulses of light and measuring the time it takes for them to bounce back from objects or surfaces. You may have even seen it used on your favorite television show, where people will fly a drone to perform a scan of a particular area.  LiDAR can create high-resolution maps of various terrains, such as forests, oceans, or cities. LiDAR is widely used in applications such as surveying, archaeology, geology, forestry, atmospheric physics, and autonomous driving. 

Archaeologists have made a lot of recent discoveries using LiDAR.  In Central and South America, lost temples and other structures from ancient civilizations such as the Aztecs and the Inca have been found in heavily forested areas.  Drone-based LiDAR can be used to find outlines of hard-to-see foundations where old buildings used to stand.

LiDAR scans are typically stored in a point-cloud format, usually LAS or LAZ or other proprietary and unmentionable formats.  These point clouds can be processed in different ways.  It is common to process them to extract the ground level, tree top level, or building outlines.  This is convenient as these points can be processed for different uses, but not so convenient for visualization.

LiDAR converted to a GeoTIFF DEM

These data are commonly converted into GeoTIFF format, a raster file format, so that they can be used in a GIS.  In this form, they are commonly used as high-resolution digital elevation format (DEM) files.  These files can then be used to perform analysis tasks such as terrain modeling, hydrological modeling, and others.

File Sizes

Conversion to GeoTIFF might result in smaller file sizes and can be easier to process in a GIS, but the files can still be very large.  For this project, the LiDAR file was one hundred and three gigabytes. It was stored as a 32-bit grayscale file so that the elevations of each point on the ground could be stored at a high resolution.  This is still an extremely large file, and not able to be fully loaded into memory for deep learning processing unless a very high-end computer was used (spoiler: I do not have a terabyte of RAM on my home computer).

Using CUDA on a GPU became interesting.  I have a 24 gigabyte used Tesla P40 that I got for cheap off eBay.  Deep learning models can require large amounts of memory that can quickly overwhelm a GPU.  Things like data augmentation, where training images are slightly manipulated on the CPU to provide more samples to help with generalization, take up main memory.  The additional size of the 32-bit dataset and training samples led to more memory being taken up than normal.

Deep learning models tend to require training data to be processed in batches.  These batches are small sets of the input data that are processed during one iteration of training.  It’s also more efficient for algorithms such as stochastic gradient descent to work on batches of data instead of the entire dataset during each iteration.  The sheer size of the training data samples meant that each batch took up a large amount of memory.

Finally, it was impossible to run a detection on the entire LiDAR image at one time.  The image had to be broken up into chunks that could be loaded into memory and run in a decent amount of time.  I made a random choice of cutting the image into an 8×8 grid, resulting in sixty-four images.  I wanted to be able to break up the processing so I could run it and combine the results at the end.  At the time, I had not yet water-cooled my Tesla, so the cooling fan I had attached to it sounded like a jet engine while running.  Breaking it into chunks meant that I could process things during the day and then stop at night when I wanted to sleep.  Believe me, working on other projects during the day while listening to that fan probably made me twitch a bit more than normal.

Conclusion

So that’s it for Part 1. I hope I’ve touched on some of the issues that I came across while trying to processing LiDAR with deep learning and computer vision algorithms. In Part 2 I’ll discuss gathering training data (or the lack of available training data).

Revisiting Historic Topographic Maps Part 1

My first professional job during and after college was working at the US Geological Survey as a software engineer and researcher. My job required me to learn about GIS and cartography, as I would do things from writing production systems to researching distributed processing. It gave me an appreciation of cartography and of geospatial data. I especially liked topographic maps as they showed features such as caves and other interesting items on the landscape.

Recently, I had a reason to go back and recreate my mosaics of some Historic USGS Topomaps. I had originally put them into a PostGIS raster database, but over time realized that tools like QGIS and PostGIS raster can be extremely touchy when used together. Even after multiple iterations of trying out various overview levels and constraints, I still had issues with QGIS crashing or performing very slowly. I thought I would share my workflow in taking these maps, mosaicing them, and finally optimizing them for loading into a GIS application such as QGIS.  Note that I use Linux and leave how to install the prerequisite software as an exercise for the reader.

As a refresher, the USGS has been scanning in old topographic maps and has made them freely available in GeoPDF format here. These maps are available at various scales and go back to the late 1800s. Looking at them shows the progression of the early days of USGS map making to the more modern maps that served as the basis of the USGS DRG program. As some of these maps are over one-hundred years old, the quality of the maps in the GeoPDF files can vary widely. Some can be hard to make out due to the yellowing of the paper, while others have tears and pieces missing.

Historically, the topographic maps were printed using multiple techniques from offset lithographic printing to Mylar separates. People used to etch these separates over light tables back in the map factory days. Each separate would represent certain parts of the map, such as the black features, green features, and so on. While at the USGS, many of my coworkers still had their old tool kits they used before moving to digital. You can find a PDF here that talks about the separates and how they were printed. This method of printing will actually be important later on in this series when I describe why some maps look a certain way.

Process

There are a few different ways to start out downloading USGS historic maps. My preferred method is to start at the USGS Historic Topomaps site.

USGS Historic Maps Search

USGS Historic Maps Search

 

 

 

 

 

 

 

It is not quite as fancy a web interface as the others, but it makes it easier to load the search results into Pandas later to filter and download. For my case, I was working on the state of Virginia, so I selected Virginia with a scale of 250,000 and Historical in the Map Type option. I purposely left Map Name empty and will demonstrate why later.

Topo Map Search

Topo Map Search

 

 

 

 

 

 

Once you click submit, you will see your list of results. They are presented in a grid view with metadata about each map that fits the search criteria. In this example case, there are eighty-nine results for 250K scale historic maps. The reason I selected this version of the search is that you can download the search results in a CSV format by clicking in the upper-right corner of the grid.

Topo Map Search Results

Topo Map Search Results

 

 

 

 

 

 

After clicking Download to Excel (csv) File, your browser will download a file called topomaps.csv. You can open it and see that there is quite a bit of metadata about each map.

Topo Map CSV Results

Topo Map CSV Results

 

 

 

 

 

 

If you scroll to the right, you will find the column we are interested in called Download GeoPDF. This column contains the download URL for each file in the search results.

Highlighted CSV Column

Highlighted CSV Column

 

 

 

 

 

 

For the next step, I rely on Pandas. If you have not heard of it, Pandas is an awesome Python data-analysis library that, among a long list of features, lets you load and manipulate a CSV easily. I usually load it using ipython using the commands in bold below.

bmaddox@sdf1:/mnt/filestore/temp/blog$ ipython3
Python 3.6.6 (default, Sep 12 2018, 18:26:19) 
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas as pd

In [2]: csv = pd.read_csv("topomaps.csv")

In [3]: csv
Out[3]: 
   Series     Version  Cell ID     ...      Scan ID GDA Item ID  Create Date
0    HTMC  Historical    69087     ...       255916     5389860   08/31/2011
1    HTMC  Historical    69087     ...       257785     5389864   08/31/2011
2    HTMC  Historical    69087     ...       257786     5389866   08/31/2011
3    HTMC  Historical    69087     ...       707671     5389876   08/31/2011
4    HTMC  Historical    69087     ...       257791     5389874   08/31/2011
5    HTMC  Historical    69087     ...       257790     5389872   08/31/2011
6    HTMC  Historical    69087     ...       257789     5389870   08/31/2011
7    HTMC  Historical    69087     ...       257787     5389868   08/31/2011
..    ...         ...      ...     ...          ...         ...          ...
81   HTMC  Historical    74983     ...       189262     5304224   08/08/2011
82   HTMC  Historical    74983     ...       189260     5304222   08/08/2011
83   HTMC  Historical    74983     ...       707552     5638435   04/23/2012
84   HTMC  Historical    74983     ...       707551     5638433   04/23/2012
85   HTMC  Historical    68682     ...       254032     5416182   09/06/2011
86   HTMC  Historical    68682     ...       254033     5416184   09/06/2011
87   HTMC  Historical    68682     ...       701712     5416186   09/06/2011
88   HTMC  Historical    68682     ...       701713     5416188   09/06/2011

[89 rows x 56 columns]

In [4]:

As you can see from the above, Pandas loads the CSV in memory along with the column names from the CSV header.

In [6]: csv.columns
Out[6]: 
Index(['Series', 'Version', 'Cell ID', 'Map Name', 'Primary State', 'Scale',
       'Date On Map', 'Imprint Year', 'Woodland Tint', 'Visual Version Number',
       'Photo Inspection Year', 'Photo Revision Year', 'Aerial Photo Year',
       'Edit Year', 'Field Check Year', 'Survey Year', 'Datum', 'Projection',
       'Advance', 'Preliminary', 'Provisional', 'Interim', 'Planimetric',
       'Special Printing', 'Special Map', 'Shaded Relief', 'Orthophoto',
       'Pub USGS', 'Pub Army Corps Eng', 'Pub Army Map', 'Pub Forest Serv',
       'Pub Military Other', 'Pub Reclamation', 'Pub War Dept',
       'Pub Bur Land Mgmt', 'Pub Natl Park Serv', 'Pub Indian Affairs',
       'Pub EPA', 'Pub Tenn Valley Auth', 'Pub US Commerce', 'Keywords',
       'Map Language', 'Scanner Resolution', 'Cell Name', 'Primary State Name',
       'N Lat', 'W Long', 'S Lat', 'E Long', 'Link to HTMC Metadata',
       'Download GeoPDF', 'View FGDC Metadata XML', 'View Thumbnail Image',
       'Scan ID', 'GDA Item ID', 'Create Date'],
      dtype='object')

The column we are interested in is named Download GeoPDF as it contains the URLs to download the files.

In [7]: csv["Download GeoPDF"]
Out[7]: 
0     https://prd-tnm.s3.amazonaws.com/StagedProduct...
1     https://prd-tnm.s3.amazonaws.com/StagedProduct...
2     https://prd-tnm.s3.amazonaws.com/StagedProduct...
3     https://prd-tnm.s3.amazonaws.com/StagedProduct...
4     https://prd-tnm.s3.amazonaws.com/StagedProduct...
5     https://prd-tnm.s3.amazonaws.com/StagedProduct...
6     https://prd-tnm.s3.amazonaws.com/StagedProduct...
7     https://prd-tnm.s3.amazonaws.com/StagedProduct...
                            ...                        
78    https://prd-tnm.s3.amazonaws.com/StagedProduct...
79    https://prd-tnm.s3.amazonaws.com/StagedProduct...
80    https://prd-tnm.s3.amazonaws.com/StagedProduct...
81    https://prd-tnm.s3.amazonaws.com/StagedProduct...
82    https://prd-tnm.s3.amazonaws.com/StagedProduct...
83    https://prd-tnm.s3.amazonaws.com/StagedProduct...
84    https://prd-tnm.s3.amazonaws.com/StagedProduct...
85    https://prd-tnm.s3.amazonaws.com/StagedProduct...
86    https://prd-tnm.s3.amazonaws.com/StagedProduct...
87    https://prd-tnm.s3.amazonaws.com/StagedProduct...
88    https://prd-tnm.s3.amazonaws.com/StagedProduct...
Name: Download GeoPDF, Length: 89, dtype: object

The reason I use Pandas for this step is that it gives me a simple and easy way to extract the URL column to a text file.

In [9]: csv["Download GeoPDF"].to_csv('urls.txt', header=None, index=None)

This gives me a simple text file that has all of the URLs in it.

https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/DC/250000/DC_Washington_255916_1989_250000_geo.pdf
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/DC/250000/DC_Washington_257785_1961_250000_geo.pdf
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/DC/250000/DC_Washington_257786_1961_250000_geo.pdf
…
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/WV/250000/WV_Bluefield_254032_1961_250000_geo.pdf
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/WV/250000/WV_Bluefield_254033_1957_250000_geo.pdf
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/WV/250000/WV_Bluefield_701712_1957_250000_geo.pdf
https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/WV/250000/WV_Bluefield_701713_1955_250000_geo.pdf

Finally, as there are usually multiple GeoPDF files that cover the same area, I download all of them so that I can go through and pick the best ones for my purposes. I try to find maps that are around the same data, are easily viewable, are not missing sections, and so on. To do this, I use the wget command and use the text file I created as input like so.

bmaddox@sdf1:/mnt/filestore/temp/blog$ wget -i urls.txt 
--2018-09-23 13:00:41--  https://prd-tnm.s3.amazonaws.com/StagedProducts/Maps/HistoricalTopo/PDF/DC/250000/DC_Washington_255916_1989_250000_geo.pdf
Resolving prd-tnm.s3.amazonaws.com (prd-tnm.s3.amazonaws.com)... 52.218.194.10
Connecting to prd-tnm.s3.amazonaws.com (prd-tnm.s3.amazonaws.com)|52.218.194.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32062085 (31M) [application/pdf]
Saving to: ‘DC_Washington_255916_1989_250000_geo.pdf’
…
…

Eventually wget will download all the files to the same directory as the text file. In the next installment, I will continue my workflow as I produce mosaic state maps using the historic topographic GeoPDFs.

Geonames Part 2

Since I needed it as part of my job, I finally got around to finishing up the Geonames scripts in my github repository misc_gis_scripts.  In the geonames subdirectory is a bash script called dogeonames.sh.  Create a PostGIS database, edit the bash file, and then run it and it will download and populate your Geonames database for you.

Note that I’m not using the alternatenamesv2 file that they’re distributing now.  I checked with a hex editor and they’re not actually including all fields on each line, and Postgres will not import a file unless each column is there.  I’ll probably add in a Python file to fix it at some point but not now 🙂

Another Fixed GNIS Dataset

When I went to import the latest GNIS dataset into my local PostGIS database, I found that it contains the same issues I’ve been reporting for the past few years.   You can find my fixed version of the dataset here.

As a disclaimer, while I used to work there, I no longer have any association with the US Geological Survey or the Board of Geographic Names.

For those interested, here is the list of problems I found and fixed:

ID 45605: Duplicate entry for Parker Canyon, AZ. The coordinates are in Sonora, Mexico.
ID 45606: Duplicate entry for San Antonio Canyon, AZ. The coordinates are in Sonora, Mexico.
ID 45608: Duplicate entry for Silver Creek, AZ. The coordinates are in Sonora, Mexico.
ID 45610: Duplicate entry for Sycamore Canyon, AZ. The coordinates are in Sonora, Mexico.
ID 567773: Duplicate entry for Hovey Hill, ME. The coordinates are in New Brunswick, Canada.
ID 581558: Duplicate entry for Saint John River, ME. The coordinates are in New Brunswick, Canada.
ID 768593: Duplicate entry for Bear Gulch, MT.  The coordinates are in Alberta, Canada.
ID 774267: Duplicate entry for Miners Coulee, MT.  The coordinates are in Alberta, Canada.
ID 774784: Duplicate entry for North Fork Milk River, MT.  The coordinates are in Alberta, Canada.
ID 775339: Duplicate entry for Police Creek, MT.  The coordinates are in Alberta, Canada.
ID 776125: Duplicate entry for Saint Mary River, MT.  The coordinates are in Alberta, Canada.
ID 778142: Duplicate entry for Waterton River, MT.  The coordinates are in Alberta, Canada.
ID 778545: Duplicate entry for Willow Creek, MT.  The coordinates are in Alberta, Canada.
ID 798995: Duplicate entry for Lee Creek, MT.  The coordinates are in Alberta, Canada.
ID 790166: Duplicate entry for Screw Creek, MT.  The coordinates are in British Columbia, Canada.
ID 793276: Duplicate entry for Wigwam River, MT.  The coordinates are in British Columbia, Canada.
ID 1504446: Duplicate entry for Depot Creek, WA.  The coordinates are in British Columbia, Canada.
ID 1515954: Duplicate entry for Arnold Slough, WA.  The coordinates are in British Columbia, Canada.
ID 1515973: Duplicate entry for Ashnola River, WA.  The coordinates are in British Columbia, Canada.
ID 1516047: Duplicate entry for Baker Creek, WA.  The coordinates are in British Columbia, Canada.
ID 1517465: Duplicate entry for Castle Creek, WA.  The coordinates are in British Columbia, Canada.
ID 1517496: Duplicate entry for Cathedral Fork, WA.  The coordinates are in British Columbia, Canada.
ID 1517707: Duplicate entry for Chilliwack River, WA.  The coordinates are in British Columbia, Canada.
ID 1517762: Duplicate entry for Chuchuwanteen Creek, WA.  The coordinates are in British Columbia, Canada.
ID 1519414: Duplicate entry for Ewart Creek, WA. The coordinates are in British Columbia, Canada.
ID 1520446: Duplicate entry for Haig Creek, WA. The coordinates are in British Columbia, Canada.
ID 1520654: Duplicate entry for Heather Creek, WA. The coordinates are in British Columbia, Canada.
ID 1521214: Duplicate entry for International Creek, WA. The coordinates are in British Columbia, Canada.
ID 1523541: Duplicate entry for Myers Creek, WA. The coordinates are in British Columbia, Canada.
ID 1523731: Duplicate entry for North Creek, WA. The coordinates are in British Columbia, Canada.
ID 1524131: Duplicate entry for Pack Creek, WA. The coordinates are in British Columbia, Canada.
ID 1524235: Duplicate entry for Pass Creek, WA. The coordinates are in British Columbia, Canada.
ID 1524303: Duplicate entry for Peeve Creek, WA. The coordinates are in British Columbia, Canada.
ID 1525297: Duplicate entry for Russian Creek, WA. The coordinates are in British Columbia, Canada.
ID 1525320: Duplicate entry for Saar Creek, WA. The coordinates are in British Columbia, Canada.
ID 1527272: Duplicate entry for Togo Creek, WA. The coordinates are in British Columbia, Canada.
ID 1529904: Duplicate entry for McCoy Creek, WA. The coordinates are in British Columbia, Canada.
ID 1529905: Duplicate entry for Liumchen Creek, WA. The coordinates are in British Columbia, Canada.
ID 942345: Duplicate entry for Allen Brook, NY. The coordinates are in Quebec, Canada.
ID 949668: Duplicate entry for English River, NY. The coordinates are in Quebec, Canada.
ID 959094: Duplicate entry for Oak Creek, NY. The coordinates are in Quebec, Canada.
ID 967898: Duplicate entry for Trout River, NY. The coordinates are in Quebec, Canada.
ID 975764: Duplicate entry for Richelieu River, VT. The coordinates are in Quebec, Canada.
ID 1458184: Duplicate entry for Leavit Brook, VT. The coordinates are in Quebec, Canada.
ID 1458967: Duplicate entry for Pike River, VT. The coordinates are in Quebec, Canada.
ID 1028583: Duplicate entry for Cypress Creek, ND. The coordinates are in Manitoba, Canada.
ID 1035871: Duplicate entry for Mowbray Creek, ND. The coordinates are in Manitoba, Canada.
ID 1035887: Duplicate entry for Gimby Creek, ND. The coordinates are in Manitoba, Canada.
ID 1035890: Duplicate entry for Red River of the North, ND. The coordinates are in Manitoba, Canada.
ID 1035895: Duplicate entry for Wakopa Creek, ND. The coordinates are in Manitoba, Canada.
ID 1930555: Duplicate entry for Red River of the North, ND. The coordinates are in Manitoba, Canada.
ID 1035882: Duplicate entry for East Branch Short Creek, ND. The coordinates are in Saskatchewan, Canada.
ID 1782010: Duplicate entry for Manitoulin Basin, MI. The coordinates are in Ontario, Canada