When Checkinstall Attacks

Posted on December 6, 2021 by bigbubba

The other day I was compiling the latest OpenCV on my computer and had planned on doing what I normally do when it’s done: run checkinstall to build a .deb for it because I like to keep all my files under package management. OpenCV finished compiling fairly quickly (it’s nice when you can do a make -j 16) and I then ran checkinstall.

It crashed while it was running and left a half-installed Debian package of OpenCV on my system. “No problem” I thought, I’ll just uninstall the deb and do a normal make install. Sometimes checkinstall crashes so I didn’t think anything was out of the ordinary. Since I usually put it in /opt/opencv4 it would still be self contained at least.

I noticed a little bit later that my system was acting oddly. Some things wouldn’t run, I couldn’t sudo any more, etc. I rebooted as a first check to see if it was just something random going on. And that’s when my system rebooted to a text mode login prompt. “Huh, maybe the card/drivers didn’t initialize fully, I’ll just reboot again.” Nope, no joy, still the text login.

I tried to log in only to watch the process pause after I typed my password, and then the login prompt came back up. “Odd, maybe I’ll see if it’s something weird and try another virtual console.” Nope, no joy there. Tried to ssh into it, no joy there either. I was worried my SSD was going out. It’s not that old, but still a worry.

So I used my laptop to make a bootable Mint installer and plugged that in and tried to boot. The graphics screen was corrupted and I had to use safe mode to log in. “Holy crap, is my graphics card messed up along with the hard drive?” I was worried about this because a new power supply I bought a while back had nuked my old motherboard so I had to replace hardware in my system. (That’s a story for another day.)

I could still get a GUI when I booted into safe mode from the thumb drive so I assumed the open source drivers on the latest Mint installer just didn’t like my card unless I did safe mode. I did a SMART test to make sure nothing was wrong with the drive. That worked so I ran an fsck to check the integrity of the drive. I then went to set up a chroot to the hard drive so I could run debsums to make sure the packages hadn’t gotten randomly corrupted. And then I noticed a problem.

I couldn’t set up the chroot to work. I kept getting an error about /bin/bash not existing. I checked the /bin directory on the hard drive and sure enough, it was empty save for a broken link to some part of the JDK. “That’s odd, there were no drive errors but /bin is empty.” I thought about things for a moment and randomly did an ls -ld on the root of the hard drive but didn’t see anything at first.

Then it hit me: “Wait a minute, /bin is supposed to be a link to /usr/bin these days.” I realized that for whatever reason, it looked like checkinstall had replaced the link for /bin with an actual /bin directory and had randomly placed a link in there for the JDK. I deleted the directory, recreated the link to /usr/bin, and rebooted. Boom, system booted normally. Well, mostly normally. CUDA had somehow disappeared from the drive and I had to reinstall it (didn’t use the packages from nVidia since they want to downgrade my video drivers so just did a local install). I ran debsums to check and everything verified properly.
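If you ever need to make the same check from a rescue environment, here is a minimal Python sketch of the idea. It assumes a merged-/usr layout where bin, sbin, and lib at the root are symlinks into usr; the list of names and the function name are my own choices for illustration, so adjust them for your distribution.

```python
import os
import tempfile

# Top-level names assumed to be symlinks into usr/ on a merged-/usr system.
# This list is an assumption; trim or extend it for your own distribution.
MERGED_DIRS = ("bin", "sbin", "lib")


def find_broken_usrmerge_links(root):
    """Return the names under `root` that are not symlinks to usr/<name>."""
    broken = []
    for name in MERGED_DIRS:
        path = os.path.join(root, name)
        expected = os.path.join("usr", name)
        if not os.path.islink(path):
            # A real directory (or nothing at all) where a link should be.
            broken.append(name)
            continue
        # Accept both relative (usr/bin) and absolute (/usr/bin) targets.
        target = os.path.normpath(os.readlink(path)).lstrip("/")
        if target != expected:
            broken.append(name)
    return broken


# Demo on a throwaway tree where bin is a real directory instead of a link.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "usr", "sbin"))
    os.makedirs(os.path.join(demo_root, "usr", "lib"))
    os.mkdir(os.path.join(demo_root, "bin"))
    os.symlink("usr/sbin", os.path.join(demo_root, "sbin"))
    os.symlink("usr/lib", os.path.join(demo_root, "lib"))
    print(find_broken_usrmerge_links(demo_root))
```

Pointing it at the mount point of the damaged drive (for example wherever the installer mounted it) would list any of those names that are no longer links.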

The moral of the story is, it’s good to have debugging skills and know how your computer is supposed to work!

National/State-Level Census Tiger files updated

Posted on October 11, 2021 by bigbubba

Hot off the computer processing over the weekend, and as a historic first for me only a few days after the Census released them, I’ve updated my GIS data to the 2021 Census Tiger files. They are combined to either the state or national level.

Brian vs the Inspiron 620S

Posted on June 6, 2021 by bigbubba

On Memorial Day I can say I had a memorable experience while trying to troubleshoot an old computer we still use. My wife got a Dell Inspiron 620S a while ago to use for her work and what not. Over the years I put a bigger hard drive in it and upgraded it to Windows 10. It’s not the fastest computer, but it still works for my wife’s vinyl cutter program that she uses and some software her work uses that’s Windows only. My kids also periodically use it for older games that they like to play since it’s a Core i5 with a decent low-end Radeon card in it.

A few weeks ago it just stopped working. It would not turn on even though the power supply LED was on and the power LED on the motherboard was lit. Just nothing would happen when you pushed the power button. No hard drive spin ups, nothing. So I let it sit for a while.

On Memorial Day I thought I would finally see what was up with it. I took out my multimeter because my first thought was perhaps the power supply was old and wasn’t producing enough power. I checked the ATX motherboard connector and the always-on pin had power and was the right voltage. I also inspected the motherboard to see if perhaps any capacitors had blown but everything looked fine.

I got up in frustration and thought I’d look online. As I got up, my foot came down on something and then slipped, which did bad things to my toes and the muscles/ligaments in my foot and also came close to cutting off my middle toe. I thought it was actually a bit fitting because it was like the computer had found a way to flip me off 😉 Thankfully my wife has been helping me keep my toes buddy taped together and we have been keeping stuff on the cut to make sure it heals.

I did finally do some Internet searching and found several other owners complaining on the Dell and other forums about the same issue. They had found that unplugging the two front USB ports from the motherboard fixed their issue. No one has any idea what could be going on and of course I haven’t found anything from Dell admitting to it.

On Saturday we decided to take the kids for ice cream (my wife has been driving lately since it’s hard for me to walk let alone drive). I thought before we left I’d hobble down and just see if unplugging those ports would work. I went to the computer (after making sure there was nothing on the floor around the desk), found the ports on the motherboard and unplugged them. And of course, the computer magically started turning on again. I still don’t know why unplugging them works, but I’m also not going to argue.

I then ran into my next problem. It had been a while since anyone used that computer since it hadn’t been working for several weeks and it wasn’t a priority. I randomly could not log in. My password didn’t work, none of the normal passwords I use around the house worked, no joy. My wife and daughter could log in but I couldn’t. I think it was the angry computer gods giving me one last middle finger.

Fortunately I keep a multi-boot USB handy that has a lot of bootable distributions and utilities. I booted into Kali Linux forensics mode. Once booted all I had to do was open up a terminal, switch to the directory on the Windows drive that had the SAM files, and use chntpw to blank out my password. While there I made sure my account hadn’t been locked out or anything like that. I also booted into several antivirus tools and scanned the hard drive just in case there was a reason my password wasn’t working, and ran chkdsk on the drive to repair the errors from when it stopped working after the power outage.

Things are back to normal with it now and I’ve gotten the updates done that it had been missing. I think my next step is a Catholic priest and some Holy Water just in case!

State-Based 2020 Census Tiger Data Out

Posted on April 23, 2021 by bigbubba

Yes, it’s April 2021, and I just now got around to repackaging the 2020 Census Tiger Data at state/national levels. If you’re interested, head over to here.

More Fun with the RTX 2060

Posted on July 7, 2020 by bigbubba

So I recently wiped my system and upgraded to Linux Mint Cinnamon 20. I tend to wipe and install on major releases since I do a lot of customization.

Anyway, I wanted to set CUDA back up along with tensorflow-gpu since I have stuff I wanted to do. I recreated my virtual environment and found Tensorflow 2.2.0 had been released. Based on this I found it still needs CUDA 10.1. No worries, went through and put CUDA 10.1, cuDNN, and TensorRT back on my system and everything was working.

I noticed with 2.2.0 that I was getting the dreaded RTX CUDA_ERROR_OUT_OF_MEMORY errors for pretty much anything I did. So I fixed it and figured I’d post this in case it helps anyone else out down the road. You need to add this in so that the GPU memory can grow and so that mixed precision is used with the RTX (which also helps to run things on the Tensor Cores in the RTX series).

from tensorflow import config as tfc
from tensorflow.keras.mixed_precision import experimental as mixed_precision
...
# Let TensorFlow grow GPU memory on demand instead of grabbing it all up front
gpus = tfc.experimental.list_physical_devices("GPU")
tfc.experimental.set_memory_growth(gpus[0], True)
# Compute in float16 where possible so the Tensor Cores get used
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

If you’re having more out of memory errors on your RTX, give this a shot. You can read more about Tensorflow and mixed precision here.

Fun with Linux and a RTX 2060

Posted on March 8, 2020 by bigbubba

Or…. how to spend a day hitting your head into your desk.

Or…. machine learning is easy, right? 🙂

For a while now I have been wanting to upgrade my video card in my desktop so I could actually use it to do machine/deep learning tasks (ML). Since I put together a Frankenstein gaming computer for my daughters out of some older parts, I finally justified getting a new card by saying I would then give them my older nVidia card. After a lot of research, I decided the RTX 2060 was a good balance of how much money I felt like spending versus something that would actually be useful (plus the series comes with dedicated Tensor Cores that work really fast with fp16 data types).

So after buying the card and installing it, the first thing I wanted to do was to get Tensorflow 2.1 to work with it. Now, I already had CUDA 10.2 and the most up-to-date version of TensorRT installed, and knew that I’d have to custom compile Tensorflow’s pip version to work on my system. What I did not know was just how annoying this would turn out to be. My first attempt was to follow the instructions from the Tensorflow web site, including applying this patch that fixes the nccl bindings to work with CUDA 10.2.

However, all of my attempts failed. I had random compiler errors crop up during the build that I have never had before and could not explain. Tried building it with Clang. Tried different versions of GCC. Considered building with a Catholic priest present to keep the demons at bay. No dice. Never could complete a build successfully on my system. This was a bit to be expected since a lot of people online have trouble getting Tensorflow 2.1 and CUDA 10.2 to play nice together.

I finally broke down and downgraded CUDA on my system to 10.1. I also downgraded TensorRT so it would be compatible with the version of CUDA I now had. Finally I could do a pip install tensorflow-gpu inside my virtual environment and it worked and it ran and I could finally run my training on my GPU with fantastic results.

Almost.

I kept getting CUDNN_STATUS_INTERNAL_ERROR messages every time I tried to run a Keras application on the GPU. Yay. After some Googling, I found this link and apparently there’s an issue with Tensorflow and the RTX line. To fix it, you have to add this to your Python code that uses Keras/Tensorflow:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

...
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

FINALLY! After several days of trying to custom compile Tensorflow for my system, giving up, downgrading so I could install via pip, running into more errors, etc, I now have a working GPU-accelerated Tensorflow! As an example, running the simple addition_rnn.py example from Keras, I went from taking around 3.5 seconds per epoch on my Ryzen 7 2700X processor (where I had compiled Tensorflow for CPU only to take advantage of the additional CPU instructions) to taking under 0.5 seconds on the GPU. I’m still experimenting, and modifying some things to use fp16 so I can take advantage of the Tensor Cores in the GPU.

Tiger 2019 Data Uploaded

Posted on October 28, 2019 by bigbubba

I’ve uploaded my state-based versions of the Census Tiger 2019 datasets. Use at your own risk, yadda yadda. See the GIS Data page above.

Revisiting Historic Topographic Maps Part 2

Posted on December 2, 2018 by bigbubba

In part one I discussed how I go about finding and downloading maps from the USGS Historic Topomap site. Now I will show how I go through the images I downloaded in Part 1 and determine which ones to keep to make a merged state map. I will also show some things you may run into while making such a map.

Now that the maps are all downloaded, it is time to go through and examine each one to determine what to keep and what to digitally “throw out.” When you download all the historic maps of a certain scale for a state, you will find that each geographic area may have multiple versions that cover it. You will also find that there are some specially made maps that go against the standard quadrangle area and naming convention. The easiest way for me to handle this is to load and examine all the maps inside QGIS.

Using QGIS to Check an Image

I load the maps that cover a quadrangle and overlay them on top of something like Google Maps. For my purposes, I usually try to pick maps with the following characteristics:

Oldest to cover an area.
Good visual quality (easy to read, paper map was not ripped, etc)
Good georeferencing to existing features

QGIS makes it easy to look at all the maps that cover an area. I will typically change the opacity of a layer and see how features such as rivers match existing ones. You will be hard-pressed to find an exact match as some scales are too coarse and these old maps will never match the precision of modern digital ones made from GPS. I also make sure that the map is not too dark and that the text is easily readable.

One thing you will notice with the maps is that names change over time. An example of this is below, where in one map a feature is called Bullock’s Neck and in another it is Bullitt Neck.

Feature Named Bullock’s Neck

Feature Named Bullitt Neck

Another thing you will find with these maps is that the same features are not always in the same spots. Consider the next three images here that cover the same area.

Geographic Area to Check for Registration

First Historic Map to Check Registration

Second Map to Check Registration

If you look closely, you will see that the land features of the map seem to move down between the second and third images. This happens due to how the maps were printed “back in the day.” The maps were broken down into separates where each separate (or plate) contained features of the same color. One contained roads, text, and houses, while another had features such as forests. These separates had stud holes in them so they could be held in place during the printing process. Each separate was inked and a piece of paper was run over each one. Over time these stud holes would get worn so one or more of the separates would move around during printing. Additionally, maps back then were somewhat “works of art,” and could differ depending on who did the inscribing. Finally, depending on scale and quality of the map, the algorithms to georeference the scanned images can result in rubber sheeting that can further change things.

During my processing, one of the things I use QGIS for is to check which maps register better against modern features. It takes a while using this method but in the end I am typically much happier about the final quality than if I just picked one from each batch at random.

Another thing to check with the historic maps is coverage. Sometimes the map may say it covers a part of the state when it does not.

Map Showing No Actual Virginia Coverage

Here the map showed up in the results list for Virginia, but you can see that the Virginia portion is blank and it actually only contains map information for Maryland.

Finally, you may well find that you do not have images that cover the entire state you are interested in. If you group things by scale and year, the USGS may no longer have the original topomaps for some areas. It could also be that no maps were actually produced for those areas.

Once the images are all selected, the images need to be merged into a single map for an individual state. For my setup I have found that things are easier and faster if I merge them into a GeoTIFF with multiple zoom layers as opposed to storing in PostGIS.

Here I will assume there is a directory of 250K scale files that cover Virginia and that these files have been sorted and the best selected. The first part of this is to merge the individual files into a single file with the command:

gdal_merge.py -o va_250k.tif *.tif

This command may take some time based on various factors. Once finished, the next part is to compress and tile the image. Tiling breaks the image into various parts that can be separately accessed and displayed without having to process the rest of the image. Compression can make a huge difference in file sizes. I did some experimenting and found that a JPEG compression quality of eighty strikes a good balance between being visually pleasing and reducing file size.

gdal_translate -co COMPRESS=JPEG -co TILED=YES -co JPEG_QUALITY=80 va_250k.tif va_250k_tiled.tif
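To get a feel for why tiling helps, here is a small Python sketch that counts how many tiles a viewer would have to read for a given window. It assumes the 256 x 256 block size that GDAL's GeoTIFF driver uses when TILED=YES is set without BLOCKXSIZE/BLOCKYSIZE, and the 20000 x 12000 image size is a made-up number since I did not record the real mosaic dimensions here.

```python
import math

# Default GeoTIFF tile size when TILED=YES is used without explicit block sizes.
TILE_SIZE = 256


def tile_grid(width, height, tile=TILE_SIZE):
    """Number of tile columns and rows needed to cover a width x height image."""
    return math.ceil(width / tile), math.ceil(height / tile)


def tiles_for_window(x_off, y_off, win_w, win_h, tile=TILE_SIZE):
    """List the (col, row) tiles touched by a pixel window."""
    first_col, first_row = x_off // tile, y_off // tile
    last_col = (x_off + win_w - 1) // tile
    last_row = (y_off + win_h - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Hypothetical mosaic size and viewing window, for illustration only.
cols, rows = tile_grid(20000, 12000)
visible = tiles_for_window(5000, 3000, 1024, 768)
print(f"{cols * rows} tiles in the file, {len(visible)} needed for a 1024x768 view")
```

Only the tiles in that short list have to be read and decompressed, which is the whole point of tiling a big mosaic.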

Finally, GeoTIFFs can have reduced-resolution overview layers added to them. The TIFF format supports multiple pages in a file as one of the original uses was to store faxes. A GIS such as QGIS can recognize when a file has overviews added and will use them first based on how far the user has zoomed. These overviews usually have much less data than the full file and can be quickly accessed and displayed.

gdaladdo --config COMPRESS_OVERVIEW JPEG --config INTERLEAVE_OVERVIEW PIXEL -r average va_250k_tiled.tif 2 4 8 16

With the above command, GDAL will add overviews that are roughly half sized, quarter sized, and so on.
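As a rough illustration of what the 2 4 8 16 factors mean, this Python sketch works out the pixel dimensions of each overview level. I am assuming each dimension is the full size divided by the factor and rounded up, and the 20000 x 12000 starting size is again a hypothetical number rather than the real mosaic.

```python
def overview_sizes(width, height, factors=(2, 4, 8, 16)):
    """Pixel dimensions of each overview level, rounding up (assumed rule)."""
    return [((width + f - 1) // f, (height + f - 1) // f) for f in factors]


# Hypothetical full-resolution size, for illustration only.
for factor, (w, h) in zip((2, 4, 8, 16), overview_sizes(20000, 12000)):
    share = 100 / factor**2
    print(f"1/{factor}: {w} x {h} pixels, about {share:.2f}% of the full pixel count")
```

Each level has a quarter of the pixels of the one before it, which is why zoomed-out views draw so quickly.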

In the end, with tiling and compression, my 250K scale merged map of Virginia comes in at 520 megabytes. QGIS recognizes that the multiple TIFF pages are the various overviews and over my home network loading and zooming is nearly instantaneous. Hopefully these posts will help you to create your own mosaics of historic or even more modern maps.

Tiger 2018 Data Uploaded

Posted on October 4, 2018 by Brian Maddox

I’ve updated my datasets to the 2018 release of the Census Tiger Data. You can find them here.

Manipulating a CSV with Pandas

Posted on August 9, 2018 by bigbubba

At my day job I am working on some natural language processing and need to generate a list of place names so I can further train the excellent spaCy library. I previously imported the full Planet OSM so went there to pull a list of places. However, the place names in OSM are typically in the language of the person who did the collection, so they can be anything from English to Arabic. I stored the OSM data using imposm3 and included a PostgreSQL hstore column to store all of the user tags so we would not lose any data. I did a search for all tags that had values like name and en in them and exported those keys and values to several CSV files based on the points, lines, and polygons tables. I thought I would write a quick post to show how easy it can be to manipulate data outside of traditional spreadsheet software.

The next thing I needed to do was some data reduction, so I went to my go-to library of Pandas. If you have been living under a rock and have not heard of it, Pandas is an exceptional data processing library that allows you to easily manipulate data from Python. In this case, I knew some of my data rows were empty and that I would have duplicates due to how things get named in OSM. Pandas makes cleaning data incredibly easy in this case.

First I needed to load the files into Pandas to begin cleaning things up. My personal preference for a Python interpreter is ipython/jupyter in a console window. To do this I ran ipython and then imported Pandas by doing the following:

In [1]: import pandas as pd

Next I needed to load up the CSV into Pandas to start manipulating the data.

In [2]: df = pd.read_csv('osm_place_lines.csv', low_memory=False)

At this point, I could examine how many columns and rows I have by running:

In [3]: df.shape
Out[3]: (611092, 20)

Here we can see that I have 611,092 rows and 20 columns. My original query pulled a lot of columns because I wanted to try to capture as many pre-translated English names as I could. To see what all of the column names are, I just had to run:

In [10]: df.columns
Out[10]: 
Index(['name', 'alt_name_1_en', 'alt_name_en', 'alt_name_en_2',
       'alt_name_en_3', 'alt_name_en_translation', 'en_name',
       'gns_n_eng_full_name', 'name_en', 'name_ena', 'name_en1', 'name_en2',
       'name_en3', 'name_en4', 'name_en5', 'name_en6', 'nam_en', 'nat_name_en',
       'official_name_en', 'place_name_en'],
      dtype='object')

The first task I then wanted to do was drop any rows that had no values in them. In Pandas, empty cells default to the NaN value. So to drop all the empty rows, I just had to run:

In [4]: df = df.dropna(how='all')

To see how many rows fell out, I again checked the shape of the data.

In [5]: df.shape
Out[5]: (259564, 20)

Here we can see that the CSV had 351,528 empty rows where the line had no name or English name translations.
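The how='all' argument matters here, so a tiny made-up example may help. This is only a sketch on a toy DataFrame (the place names are stand-ins, not rows from my export) showing the difference between dropping rows that are entirely empty and the default of dropping rows with any empty cell.

```python
import pandas as pd

# Toy stand-in for the OSM export; values are made up for illustration.
toy = pd.DataFrame(
    {
        "name": ["Main Street", None, None, "Rua Augusta"],
        "name_en": [None, "Fastnet Rock", None, "Augusta Street"],
    }
)

# how="all" drops a row only when every cell in it is missing.
all_empty_dropped = toy.dropna(how="all")

# The default (how="any") drops a row if even one cell is missing.
any_empty_dropped = toy.dropna()

print(len(toy), len(all_empty_dropped), len(any_empty_dropped))
```

With data as sparse as this export, the default would have thrown away nearly everything, which is why I used how='all'.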

Next, I assumed that I had some duplicates in the data. Some things in OSM get generic names, so these can be filtered out since I only want the first row from each duplicate. With no options, drop_duplicates() in Pandas only keeps the first value.

In [6]: df = df.drop_duplicates()

Checking the shape again, I can see that I had 68,131 rows of duplicated data.

In [7]: df.shape
Out[7]: (191433, 20)
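For anyone who has not used drop_duplicates() before, here is a minimal sketch on a toy DataFrame with made-up values. It shows the default keep-the-first behavior next to the keep and subset options, which I did not need for this dataset.

```python
import pandas as pd

# Toy data with one fully duplicated row; values are made up for illustration.
toy = pd.DataFrame(
    {
        "name": ["Ferry", "Ferry", "Fastnet Rock", "Ferry"],
        "name_en": [None, None, "Fastnet Rock", "Ferry Crossing"],
    }
)

# Default: rows must match in every column, and the first occurrence is kept.
deduped = toy.drop_duplicates()

# keep="last" keeps the final occurrence instead.
last_kept = toy.drop_duplicates(keep="last")

# subset compares only the listed columns.
by_name = toy.drop_duplicates(subset=["name"])

print(list(deduped.index), list(last_kept.index), list(by_name.index))
```

Note that the original index labels survive, which is why the row numbers in the later output have gaps in them.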

At this point I was interested in how many cells in each row still contained no data. The CSV was already sparse since I converted each hstore key into a separate column in my output. To do this, I ran:

In [8]: df.isna().sum()
Out[8]: 
name                          188
alt_name_1_en              191432
alt_name_en                190310
alt_name_en_2              191432
alt_name_en_3              191432
alt_name_en_translation    191432
en_name                    191430
gns_n_eng_full_name        191432
name_en                    191430
name_ena                   172805
name_en1                   191409
name_en2                   191423
name_en3                   191429
name_en4                   191430
name_en5                   191432
name_en6                   191432
nam_en                     191432
nat_name_en                191431
official_name_en           191427
place_name_en              191429
dtype: int64

Here we can see the sparseness of the data. Considering I am now down to 191,433 rows, some of the columns only have a single entry in them. We can also see that I am probably not going to have a lot of English translations to work with.
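If you would rather see how many cells are filled than how many are empty, the same idea works with notna(). This is just a sketch on a toy DataFrame with made-up values, and the fill_rate name is my own.

```python
import pandas as pd

# Toy data; values are made up for illustration.
toy = pd.DataFrame(
    {
        "name": ["Main Street", None, "Rua Augusta"],
        "name_en": [None, "Fastnet Rock", "Augusta Street"],
        "nam_en": [None, None, "Augusta St"],
    }
)

missing = toy.isna().sum()       # empty cells per column
present = toy.notna().sum()      # filled cells per column
fill_rate = present / len(toy)   # fraction of rows with a value

summary = pd.DataFrame(
    {"missing": missing, "present": present, "fill_rate": fill_rate.round(2)}
)
print(summary)
```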

At this point I wanted to save the modified dataset so I would not lose it. This was a simple:

In [8]: df.to_csv('osm_place_lines_nonull.csv', index=False)

The index=False option tells Pandas to not output its internal index field to the CSV.

Now I was curious what things looked like, so I decided to check out the name column. First I increased some default values in Pandas because I did not want it to abbreviate rows or columns.

pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 25)

To view the whole row where the value in a specific column is null, I did the following and I will abbreviate the output to keep the blog shorter 🙂

df[df['name'].isnull()]
...
       name_en                                 name_ena name_en1 name_en2  \
166        NaN                        Orlovskogo Island      NaN      NaN   
129815     NaN                            Puukii Island      NaN      NaN   
159327     NaN                           Ometepe Island      NaN      NaN   
162420     NaN                                  Tortuga      NaN      NaN   
164834     NaN                         Jack Adan Island      NaN      NaN   
191664     NaN                            Hay Felistine      NaN      NaN   
193854     NaN             Alborán Island Military Base      NaN      NaN   
197893     NaN                         Carabelos Island      NaN      NaN   
219472     NaN                           Little Fastnet      NaN      NaN   
219473     NaN                             Fastnet Rock      NaN      NaN   
220004     NaN                           Doonmanus Rock      NaN      NaN   
220945     NaN                             Tootoge Rock      NaN      NaN   
229446     NaN                               Achallader      NaN      NaN   
238355     NaN                            Ulwile Island      NaN      NaN   
238368     NaN                             Mvuna Island      NaN      NaN   
238369     NaN                            Lupita Island      NaN      NaN   
238370     NaN                              Mvuna Rocks      NaN      NaN   
259080     NaN                                  Kafouri      NaN      NaN   
259235     NaN                              Al Thawra 8      NaN      NaN   
259256     NaN                              Beit al-Mal      NaN      NaN   
261584     NaN                                   Al Fao      NaN      NaN   
262200     NaN                                  May 1st      NaN      NaN   
...

Now that I have an idea how things look, I can do things like fill out the rest of the name column with the English names found in the various other columns.
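One way that fill-in step could look is sketched below. This is not code I ran against the real export; the function name is hypothetical, the toy values are made up, and the order of the candidate columns is an assumption about which English name you would trust first.

```python
import pandas as pd


def fill_name_from_english(df, candidates):
    """Return a copy of df whose empty 'name' cells take the first available candidate value."""
    out = df.copy()
    for col in candidates:
        # fillna with a Series aligns on the index and only fills the gaps.
        out["name"] = out["name"].fillna(out[col])
    return out


# Toy data; values are made up for illustration.
toy = pd.DataFrame(
    {
        "name": ["Main Street", None, None, None],
        "name_en": [None, None, "Ometepe Island", None],
        "name_ena": [None, "Fastnet Rock", "Ometepe", None],
    }
)

filled = fill_name_from_english(toy, ["name_en", "name_ena"])
print(filled["name"].tolist())
```

Rows with no value in any candidate column stay empty, so a final dropna on the name column would still be needed before handing the list to anything else.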

Brian's Geek Blog

My Home on the Internet
