Using Free Geospatial Tools and Data Part 5: Setting up your Geospatial Database

In this installment, we will look at setting up a geospatial database to store your data. This is not a long post, but the first where you can get your feet wet, so to speak. If you are going to do anything more than just look at a data layer or two, you will want to set up a database. Some vector data sets can be huge (several hundred gigabytes in fact), and opening them in a GIS can take forever. Even if you only want to look at a small part of a file, applications will have to scan through all the data to get to the part you want. Some data sets such as Shapefiles allow you to create index files to help locate the data faster, but you will likely want the added functionality that a spatial database gives you.

The biggest advantage of having a spatial database is that it does a better job of indexing and analyzing the data and stores it for faster access. When your application requests an area, the database will send back only the data covering that area, which will greatly speed up access and processing. Databases can physically sort the data so that features close to each other are stored in the same areas on disk as well.

I am going to describe PostGIS here, mainly because it is the geospatial database I use on a regular basis. Most Linux distributions should have a fairly recent version of the software in their repositories. My Fedora 19 system has version 2.0.3 of PostGIS from the Fedora- and 9.2.5 of PostgreSQL from the updates-testing repositories. Based on your distribution, you should find it in your GUI-based package manager or by running commands such as yum search postgis or apt-cache search postgis from the command line. On Windows, head to http://postgis.net/windows_downloads to find an installer for your platform. You can find install information at http://postgis.net/docs/postgis_installation.html. Again, I am not going to duplicate their instructions here. They spent a lot of time writing it up, so you can read how to install there 🙂  Note that I HIGHLY recommend that you use PostgreSQL 9.1 or later. Version 9.1+ makes it much easier to create a spatial database than previous versions and adds some additional speed benefits.

Depending on how PostGIS was packaged for your system, you might already have a template geospatial database installed. If not, you will want to create one using the instructions found in the postgis_installation link above. Basically, if you are using PostGIS with PostgreSQL 9.1 and above, run the following from a command line:

createdb gis_templatedb
psql -d gis_templatedb -c "CREATE EXTENSION postgis;"
psql -d gis_templatedb -c "CREATE EXTENSION postgis_topology;"

Additionally, if you plan on loading an older database table, run:

psql -d gis_templatedb -f legacy.sql

The above commands create a database and load the geospatial functions from PostGIS into it. These functions allow the database to store spatial data and perform operations such as find all the points nearest to this one and let you search for specific data using SQL operations. The reason you want to have a database template is simple: it makes it much easier to create other geospatial databases. With a template you just have to run a command such as createdb -T my_template_database my_gis_data. Otherwise, you would have to run all of the above commands each time.

Once you have followed the installation instructions from the PostGIS website and have a template in place, you are ready to move on to testing it to make sure everything is OK. Download and unzip this  file somewhere on your system. It is a Shapefile I made of four cities in Virginia. We will use it to test your geospatial database and show you how to load data from it into QGIS.

First, create your geospatial database. Run createdb -T gis_templatedb testgeodb to create your first real geospatial database. If you get any errors, first make sure PostgreSQL is running and then head to the troubleshooting sections of the PostgreSQL website. If it worked correctly, the command will simply return as shown below.

createdb command

createdb command

PostGIS comes with two versions of a utility to load Shapefile data: shp2pgsql and shp2pgsql-gui. Respectively, the first runs from the command line and you must pass it options to select items such as the EPSG code and the second is a full GUI that lets you click to select your options. As this is a simple Shapefile, we will use the GUI version so you can get some experience loading data.

Find the shp2pgsql-gui command on your system. On Windows it should be under the PostgreSQL menu entry. Under Linux it should be installed to /usr/bin/shp2pgsql-gui. Currently (11/29/2013), if you are running Fedora 19, the PostGIS RPMS contain a broken copy of shp2pgsql and shp2pgsql-gui. You can click here to download fixed RPMS I made that will update PostGIS on your system. The file is a zipfile with the RPMS. Run unzip postgisfixed.zip from a shell to extract the files and then change to the postgisfixed directory. Run rpm -Fvh * as root from the directory to upgrade your copy of PostGIS.

When you run shp2pgsql-gui, it should look like this:

shp2pgsql-gui

shp2pgsql-gui

The first step is to click the “View connection details” button under the PostGIS Connection label. You will be presented with a window where you can enter in your connection options. Type in your database options and click the OK button. You should see something similar to the below picture showing that your database connection succeeded.

shp2pgsql_dboptions

shp2pgsql_dboptions

Now load the Shapefile into shp2pgsql-gui. Click the Add File button and navigate to where you unzipped the va_points.zip file. Click on the va_points.shp file in the sub-directory and shp2pgsql-gui should look similar to the following screen shot.

shp2pgsqlgui_va_points

shp2pgsqlgui_va_points

Before you click the import button, you will likely need to change a few options. Shp2pgsql/shp2pgsql-gui always default to a 0 for the SRID (Spatial Reference system IDentifier). This number specifies the European Petroleum Survey Group (EPSG) code that denotes the coordinate system of the source data. In the case of va_points.shp, double click under SRID and enter in 4326. This code stands for WGS84, which is the projection I used when I created the Shapefile. Once you have done this, click the Import button. Shp2pgsql-gui will pop up a status window and then give you a Shapefile Import Completed message as shown below. That is it, your data should be in your database. If you get any errors, double check the connection parameters and try again. If it still does not work, check the troubleshooting sections on the PostGIS website.

Next time, we will go over installing QGIS so you can look at the data you just imported into your database.

Posted in GIS

Fun with my My Book World Edition II

After the house fire in 2010, we picked up a My Book World Edition II to house all of our files that we did not want to loose (such as photos, videos of the girls, and so on).  Before the fire, I had a software RAID 5 going in one of the servers I had at the house.  Fortunately, that server was the one the fire department picked to throw out the back window during the fire and it sat out in the water hoses and rain.  The good thing was that I was doing forensics at the FBI then so I was able to recover data off the drives (all Western Digital) and transferred them to the My Book.

A week or so ago, I had to power off the My Book to take care of something.  After I turned it back on I noticed that the LED on the front had the ominous “something is wrong with me” flash going on.  I logged in and it didn’t see one of the drives at all and claimed it was missing.  ssh’d into the drive and sure enough the drive was not being detected by the kernel when it booted.

I took the drive out and put it in a USB enclosure to see what was wrong.  The SMART status said it was OK until I tried to do an extended SMART test of the drive.  It erred out with a bad sector.  Out of curiosity I ran badblocks against it to see how many sectors were bad in case I could try to coax it back to life.  A day and a half later (was a 2TB drive), badblocks finished but I noticed it wouldn’t list all of the bad sectors it had found on the drive.  I poked around some more and eventually found that badblocks had recorded 30 GIGABYTES of bad sectors.  Not K, not Meg, but gig.  OK drive is dead.

I had another 2TB drive in the server that I was using for temp storage when I would process things.  Also had my virtual machines and PostGIS database on it.  I figured I could use that as the new drive B in the RAID so I backed up everything from the remaining drive in the My Book and on the drive in the server to another USB drive, cleared the partition table, and put it into the My Book.

The expected behavior when you do this is that when you turn the My Book back on, it will see the new drive and rebuild the RAID to the drive as I had the My Book running RAID 1.  No such luck in my case.  At this point it would just sit and not finish booting (the My Book Worlds are embedded Linux devices for the uninitiated).  I finally found this article online that had a recovery script that would wipe a drive, download the WD firmware, and re-image the drive.  Took both drives out of the WD, re-imaged the working A drive, and put it back in.  Finally it successfully booted again and thought itself a brand new WD mybook (just in degraded mode as there was no drive B).  Turned it off, put the drive from the server in as drive B, then turned it back on.  Voila, the system came back, saw the fresh drive B with no partition table, and rebuilt the RAID.  Now just waiting for rsync to finish copying the back up from the USB drive back to the RAID.

The moral of this story: storage sucks and I want the holocubes IBM promised us 15 years ago!