{"id":729,"date":"2024-03-31T13:00:53","date_gmt":"2024-03-31T17:00:53","guid":{"rendered":"https:\/\/brian.digitalmaddox.com\/blog\/?p=729"},"modified":"2024-03-31T13:04:53","modified_gmt":"2024-03-31T17:04:53","slug":"applying-deep-learning-to-lidar-part-3-algorithms","status":"publish","type":"post","link":"https:\/\/brian.digitalmaddox.com\/blog\/?p=729","title":{"rendered":"Applying Deep Learning to LiDAR Part 3: Algorithms"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Last time I talked about the problems finding data and in training a machine learning model to classify geologic features from LiDAR.&nbsp; This time I want to talk about how various libraries can (and cannot) handle 32-bit imagery.&nbsp; This actually caused most of the technical issues with the project and required multiple work-arounds.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">OpenCV and RasterIO<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/opencv.org\/\">OpenCV<\/a> is probably the most widely used computer vision library around.\u00a0 It\u2019s a great library, but it\u2019s written to assume that the entire image can be loaded into memory at once.\u00a0 To get around this, I had to use the <a href=\"https:\/\/github.com\/rasterio\/rasterio\">rasterio<\/a> library as it will read on demand and let you easily read in parts of the image at a time.\u00a0 To use it with something like <a href=\"https:\/\/www.tensorflow.org\/\">Tensorflow<\/a>, you have to change the data with some code like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>with rasterio.open(in_file) as src:\n    # Read the data as a 3D array (bands, rows, columns)\n    data = src.read()\n\n    # Convert the data type to float32\n    data = data.astype(numpy.float32)\n\n    # Transpose the array to match the shape of cv2.imread (rows, columns, bands)\n    data = numpy.transpose(data, (1, 2, 0))\n\n    return data\n<\/code><\/pre>\n\n\n\n<p 
class=\"wp-block-paragraph\">Many computer vision algorithms are designed to expect certain types of images, either 8- to 16-bit grayscale or up to 32-bit three-channel (such as RGB) images.\u00a0 OpenCV, one of the most popular libraries, is no different in this respect.\u00a0 The mathematical formulas behind these algorithms have certain expectations as well.\u00a0 Sometimes they can scale to larger numbers of bits, sometimes not.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Finding Areas of Interest<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This actually impacts how we search the image for areas of interest.\u00a0 There are typically two ways to search an image using computer vision: <a href=\"https:\/\/pyimagesearch.com\/2015\/03\/23\/sliding-windows-for-object-detection-with-python-and-opencv\/\">sliding window<\/a> and <a href=\"https:\/\/pyimagesearch.com\/2020\/06\/29\/opencv-selective-search-for-object-detection\/\">selective search<\/a>.\u00a0 A sliding window search is a technique used to detect objects or features within an image by moving a window of a fixed size across the image in a systematic manner. Imagine looking through a small square or rectangular frame that you slide over an image, both horizontally and vertically, inspecting every part of the image through this frame. At each position, the content within this window is analyzed to determine whether it contains the object or feature of interest.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Selective Search is an algorithm used in computer vision for efficient object detection. It serves as a preprocessing step that proposes regions in an image that are likely to contain objects. 
Instead of evaluating every possible location and scale directly through a sliding window, Selective Search intelligently generates a set of region proposals by grouping pixels based on similarity criteria such as color, texture, size, and shape compatibility.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Selective search is more efficient than a sliding window since it returns only the \u201cinteresting\u201d regions rather than the huge number of candidate windows that a sliding window approach has to evaluate.\u00a0 Selective search in OpenCV is only designed to work with 24-bit images (i.e., RGB images with 8 bits per channel).\u00a0 To use higher-bit data with it, you would have to scale it to 8 bits\/channel.\u00a0 A 32-bit dataset (which includes negative values as these typically indicate no-data areas) can represent up to roughly 4.29 billion (2<sup>32<\/sup>) distinct values.\u00a0 To scale to 8 bits per channel, we would also need to convert it from floating point to 8-bit integer values.\u00a0 In this case, we can only represent 256 discrete values.\u00a0 As you can see, this is quite a difference in how many elevations we can differentiate.\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s an example of the areas of interest that a sliding window and image pyramid generate.  As you can see, there are a lot of regions of interest that are regularly placed across the image.  
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Windowed-Regions_screenshot_31.03.2024.png\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Windowed-Regions_screenshot_31.03.2024-768x1024.png\" alt=\"\" class=\"wp-image-739\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Windowed-Regions_screenshot_31.03.2024-768x1024.png 768w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Windowed-Regions_screenshot_31.03.2024-225x300.png 225w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Windowed-Regions_screenshot_31.03.2024.png 1024w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">However, selective search is not always perfect.\u00a0 Below is an example where I ran OpenCV 4\u2019s selective search against an image of mine.\u00a0 It generated 9,020 proposed areas to search.\u00a0 I zoomed in to show it did not even show the hawk as a region of interest.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46-1024x576.png\" alt=\"Selective search output run against an image with a hawk.\" class=\"wp-image-730\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46-1024x576.png 1024w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46-300x169.png 300w, 
https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46-768x432.png 768w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46-500x281.png 500w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-30-46.png 1366w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Here\u2019s a clipped version of the input dataset when viewed in QGIS as a 32-bit DEM.\u00a0 Notice in this case the values range from roughly 1,431 to 1,865.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31-1024x576.png\" alt=\"QGIS with a clip of the original dataset.\" class=\"wp-image-731\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31-1024x576.png 1024w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31-300x169.png 300w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31-768x432.png 768w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31-500x281.png 500w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-31.png 1366w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Now here is a version converted to the 8-bit byte format in QGIS.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a 
href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42-1024x576.png\" alt=\"Same data converted to byte.\" class=\"wp-image-732\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42-1024x576.png 1024w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42-300x169.png 300w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42-768x432.png 768w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42-500x281.png 500w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-from-2024-03-31-11-02-42.png 1366w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As you can see, there is quite a difference between the two files.&nbsp; And before you ask, int8 just results in a black image no matter how I try to adjust the no-data value.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tensorflow tf.data Pipeline<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">So to run this, I set up a Tensorflow <a href=\"https:\/\/www.tensorflow.org\/guide\/data\">tf.data pipeline<\/a> for processing.\u00a0 My goal was to be able to turn any of the built-in Tensorflow models into an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Region_Based_Convolutional_Neural_Networks\">RCNN<\/a>.\u00a0 An interesting artifact of using built-in models, Tensorflow, and OpenCV was that the input data actually had to be converted into RGB format.\u00a0 Yes, this means a 32-bit grayscale image had to become a 32-bit RGB image, which of course greatly increased the 
memory requirements.\u00a0 Here\u2019s a code snippet that shows how to use Rasterio, <a href=\"https:\/\/python-pillow.org\/\">PIL<\/a>, and NumPy to take an input image and convert it so it\u2019s compatible with the built-in Tensorflow models:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def load_and_preprocess_32bit_image(image_bytes: tensorflow.string) -&gt; numpy.ndarray:\n    \"\"\"Helper function to preprocess 32-bit TIFF image\n    Args:\n        image_bytes (tensorflow.string): Input image bytes\n    Returns:\n        numpy.ndarray: decoded image\n    \"\"\"\n\n    with rasterio.io.MemoryFile(image_bytes) as memfile:\n        with memfile.open() as dataset:\n            image = dataset.read()\n\n    image = Image.fromarray(image.squeeze().astype('uint32')).convert('RGB')\n    image = numpy.array(image)  # Convert to NumPy array\n    image = tensorflow.image.resize(image, local_config.IMAGE_SIZE)\n\n    return image\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This function takes the 32-bit DEM, loads it, converts it to a 32-bit RGB image, and then converts it to a format that Tensorflow can work with.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can then use this as part of a tf.data pipeline by defining a function such as this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\ndef load_and_preprocess_image_train(image_path, label, in_preprocess_input,\n                                    is_32bit=False):\n    \"\"\" Define a function to load, preprocess, and augment the images\n    Args:\n        image_path (_type_): Path to the input image\n        label (_type_): label of the image\n        in_preprocess_input: Function from keras 
to call to preprocess the input\n        is_32bit (bool, optional): Is the image a 32-bit greyscale. Defaults to\n                                   False.\n\n    Returns:\n        _type_: Pre-processed image and label\n    \"\"\"\n\n    image = tensorflow.io.read_file(image_path)\n\n    if is_32bit:\n        image = tensorflow.numpy_function(load_and_preprocess_32bit_image,\n                                          &#091;image],\n                                          tensorflow.float32)\n    else:\n        image = tensorflow.image.decode_image(image,\n                                              channels=3,\n                                              expand_animations=False)\n        image = tensorflow.image.resize(image, local_config.IMAGE_SIZE)\n\n    image = augment_image_train(image)  # Apply data augmentation for training\n    image = in_preprocess_input(image)\n\n    return image, label\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Lastly, this can then be set up as a part of your tf.data pipeline by using code like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a tf.data.Dataset for training data\ntrain_dataset = tf.data.Dataset.from_tensor_slices((train_image_paths, train_labels))\n\n# Map the loader over every (path, label) pair, in parallel\ntrain_dataset = train_dataset.map(\n    lambda path, label: image_utilities.load_and_preprocess_image_train(\n        path,\n        label,\n        preprocess_input,\n        is_32bit=local_config.USE_TIF),\n    num_parallel_calls=tf.data.AUTOTUNE)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">(Yeah, trying to format code on a page in WordPress doesn&#8217;t always work so well.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note I plan on making all of the code 
public once I make sure the client is cool with that since I was already working on it before taking on their project.&nbsp; In the meantime, sorry for being a little bit vague.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Training a Model to be an RCNN<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Once you have your pipeline set up, it is time to load the built-in model.\u00a0 In this case I used <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/applications\/Xception\">Xception<\/a> from Tensorflow and used the pre-trained model to do transfer learning with the standard recipe: omit the top layer, freeze the remaining layers, then add a new layer on top that learns from the input.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Load the base model without its top layer, optionally with pre-trained weights\nbase_model = Xception(weights=local_config.PRETRAINED_MODEL,\n                      include_top=False,\n                      input_shape=local_config.IMAGE_SHAPE,\n                      classes=num_classes, input_tensor=input_tensor)\n\n# Freeze the base model layers if we're using a pretrained model\n\nif local_config.PRETRAINED_MODEL is not None:\n    for layer in base_model.layers:\n        layer.trainable = False\n\n# Add a global average pooling layer\nx = base_model.output\nx = GlobalAveragePooling2D()(x)\n\n# Create the model\npredictions = Dense(num_classes, activation='softmax')(x)\nmodel = Model(inputs=base_model.input, outputs=predictions)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In this case, I used <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/optimizers\/Adam\">Adam<\/a> as the optimizer as it performed better than something like the stock <a href=\"https:\/\/keras.io\/api\/optimizers\/sgd\/\">SGD<\/a>, and I added in two model <a 
href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/callbacks\/Callback\">callbacks<\/a>.\u00a0 The first saves the model to disk every time the validation accuracy goes up, and the second stops training if the validation accuracy hasn\u2019t improved over a preset number of epochs.\u00a0 These are actually built into <a href=\"https:\/\/keras.io\/\">Keras<\/a> and can be set up as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># construct the callback to save only the *best* model to disk based on\n# the validation accuracy\nmodel_checkpoint = ModelCheckpoint(args&#091;\"weights\"],\n                                   monitor=\"val_accuracy\",\n                                   mode=\"max\",\n                                   save_best_only=True,\n                                   verbose=1)\n\n# Add in an early stopping checkpoint so we don't waste our time\nearly_stop_checkpoint = EarlyStopping(monitor=\"val_accuracy\",\n                                      patience=local_config.EPOCHS_EXIT,\n                                      restore_best_weights=True)\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">You can then add them to a list with<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>model_callbacks = &#091;model_checkpoint, early_stop_checkpoint]<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And then pass that into the <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/Model#fit\">model.fit<\/a> function.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After all of this, it was a matter of running the model.&nbsp; As you can imagine, training took 
several hours.&nbsp; Since this has gotten a bit long, I think I\u2019ll go into how I did the detection stages next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time I talked about the problems finding data and in training a machine learning model to classify geologic features from LiDAR.&nbsp; This time I want to talk about how various libraries can (and cannot) handle 32-bit imagery.&nbsp; This actually &hellip; <a href=\"https:\/\/brian.digitalmaddox.com\/blog\/?p=729\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":744,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[37,23,26],"class_list":["post-729","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-computer-vision","tag-gis","tag-machine-learning"],"_links":{"self":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=729"}],"version-history":[{"count":9,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729\/revisions"}],"predecessor-version":[{"id":745,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729\/revisions\/745"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/media\/744"}],"wp:attachment
":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}