No, Using Interfaces (or Abstractions) Alone Does NOT Mean You’re “Object Oriented”

Since I’ve been dealing with a lot of Java and now C# code over the past few years, I’ve noticed one thing: Java and C# programmers love interface classes. In fact, it seems that most Java and C# programmers think that they cannot have a concrete class that does not inherit from some interface.  I was curious about this in a lot of the C# code I’ve had to deal with so I asked why.  The answer I got was “that way we are using abstractions and encapsulating things.”

Wrong.  Just, wrong.

“Why not, smart guy?” you might ask.  First off, let’s look at some definitions.  An interface is used to define a set of functionality that derived classes must implement.  Interfaces can only contain method and constant declarations, not definitions.  An abstraction reduces and factors out details so that the developer can focus on only a few concepts at a time.  It is similar to an interface, but instead of only containing declarations, it can contain partial definitions while forcing derived classes to implement certain functionality.

With these definitions, we see that an interface is just a language construct.  It really just specifies a required syntax.  In some languages, interfaces are not even classes.  What went wrong?  Well, historically, it appears that people got the wrong idea that an interface splits contracts from implementations, which is a good thing in object-oriented programming because it encapsulates functionality.  An interface does not do this; IT CAN’T.  Remember, an interface simply specifies what functions must be present and what they return.  It does not enforce how computations should be done.  Consider the following interface pseudo code that defines an imaginary List with a count variable that specifies how many elements are in the list:

interface MyList<T>
  public void AddItem(T item);
  public int GetNumItems();

So, where does the above enforce a contract that each added item will increment an internal counter?  How does it FORCE me as a programmer to increment an internal counter?  It doesn’t; it can’t.  Since an interface is purely an empty shell, I as a user am free to do as I like as long as I just follow the interface definition.  If I don’t want to increment an internal counter, I don’t have to do so.  This does not really fulfill the object oriented dependency inversion principle (DIP), which states (as quoted by Wikipedia):

    A. High-level modules should not depend on low-level modules. Both should depend on abstractions.
    B. Abstractions should not depend on details. Details should depend on abstractions.

In common speak, this basically means that we can focus on high-level design and issues by ignoring the low-level details.  We use abstractions to encapsulate functionality so that we are guaranteed that the low-level details are taken care of for us.  Consider the following pseudo code abstract List class:

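abstract class MyList<T>
  private int internalcounter = 0;

  public void AddItem(T item)
    AbstractedAdd(item);       // derived class decides how the item is stored
    this.internalcounter++;    // the base class always keeps the count

  public int GetNumItems()
    return this.internalcounter;

  abstract protected void AbstractedAdd(T item);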

With the abstract class, we actually have a contract now that fulfills the DIP.  As an abstraction can contain a partial definition, we have a defined AddItem() function that calls an abstract internal function but also increments the internal counter.  While it is a loose guarantee, we are guaranteed that the internal counter is incremented every time AddItem() is called.  We no longer have to worry about the item counter, because the abstraction takes care of it for us.
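For readers who want to run something, here is a rough Python sketch of the same idea using the standard abc module.  The class and method names (CountedList, InMemoryList, _store) are made up for illustration and are not from any real library; the point is only that the base class owns the counter and derived classes cannot skip it when add_item() is used.

```python
from abc import ABC, abstractmethod


class CountedList(ABC):
    """Hypothetical base class: it owns the counter, subclasses own storage."""

    def __init__(self):
        self._count = 0

    def add_item(self, item):
        # The base class fixes the sequence: store the item, then count it.
        self._store(item)
        self._count += 1

    def get_num_items(self):
        return self._count

    @abstractmethod
    def _store(self, item):
        """Derived classes only decide how an item is stored."""


class InMemoryList(CountedList):
    """Hypothetical concrete class that keeps items in a Python list."""

    def __init__(self):
        super().__init__()
        self.items = []

    def _store(self, item):
        self.items.append(item)


demo = InMemoryList()
demo.add_item("first")
demo.add_item("second")
print(demo.get_num_items())
```

As in the pseudo code, this is still a loose guarantee: a derived class could override add_item() itself, but it has to go out of its way to break the count.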

What appears to have happened over the years is that student programmers heard about things like the DIP and warped it to think it means that every class must have an interface (when they mean abstraction), whether or not the class is designed to be used only once.  This I think can be attributed to teachers not doing a good job at differentiating interfaces from abstractions and not really teaching what encapsulation means.  Thinking like this also led to the second problem.

Secondly, a lot of people did not get the message that “all software should be designed to be reusable” got discredited after the 1990s when it turned out that this philosophy needlessly complicates code.  Trying to design code like this ends up with a huge Frankenstein’s monster that is hopelessly complex, prone to errors, and really does not face the reality that being a Jack of all trades means you’re a master of none.  This created a somewhat tongue-in-cheek object oriented principle called the Reused Abstraction Principle (RAP) that says “having only one implementation of a given interface is code smell.”   We refactor code to pull out duplicate functionality because it helps to keep the code base small.  It improves reliability because having a single implementation of potentially duplicated code means we don’t have several duplicate implementations that may differ in how they are done.

However, this does not mean that code HAS to have duplicated functionality “just because.”  If your problem domain only has one instance of a use case, it really is OK to just have a single concrete class that implements this.  Focus on a good design that encapsulates the functionality of your problem domain, not on worrying that every piece of functionality must be reusable.  Later on, if your problem domain is expanded and you end up with duplicate functionality, refactor it and then have an interface or abstract class.  Needless use of interfaces and abstractions just doubles the number of classes in your code base, and in most languages abstractions will have a performance penalty due to issues like virtual table lookups.  Simple use of interfaces and abstractions does not make you a cool kid rock-star disciple of the Gang of Four.

The Day After

Mandy was in a car accident yesterday morning on her way home from work.  She’s ok, just got a bump on her head.  The car, however, didn’t fare so well, but hopefully with the insurance company it will be fixed and as good as new soon.





Updating the Merged TIGER Files to the 2014 Dataset

Hey all, I am finally in the process of updating my merged state- and national-level TIGER files to the 2014 data that they have put out.  You can find them at my GIS Data Page.  Note that Roads are not uploaded yet but I already updated the links on the download page so you will get 404 errors until I get them uploaded.  I cannot promise it will be tonight since I have to sleep sometime 😉  If you find any 404s on the others, let me know in case I missed a link.

As usual, these are my own value added files that I am publishing in case some people find them useful.  If you use these and your business fails, your wife leaves you, your dog dies, and you write a country music song about it, not my fault.

More Fun with Old Maps

I’ll admit it, I really like old maps.  I especially like old topographic maps.  I think it started when I used to work for the US Geological Survey.  To me, it’s interesting to see how things change over time.  From my old urban growth prediction days, I like to see where and when populations change.

Since the USGS put out their Historical Topographic Map Collection (HTMC), I’ve been playing with the maps off and on for about a year now.  I finally decided to start merging available maps of the same scale and vintage to study and possibly do feature extraction down the road.  I’ll be placing them for download here in case anyone is interested as I process them.

I thought I’d share how I put them together in case anyone else is interested.  The first step is to go to the USGS website and pick the files you want to use.  The files there are available in GeoPDF format.  First thing you need to understand is that you may not find a map covering your area of interest at your scale of interest and vintage.  Not everything managed to survive to the current day.  For example, I made a merged 125K map of Virginia and most of southern VA is missing at that resolution.

Once I download the GeoPDFs I want to work on, I use a modified version of the script from the Xastir project.  The link to my modifications can be found here.  I use LZW compression for my GeoTIFFs as it’s lossless and keeps the quality from the GeoPDFs.  It is a Perl script and requires that you have GDAL and the GDAL API installed.  What it does is calculate the neat-lines of the GeoPDF and then clip it to the neat-line while converting it to a GeoTIFF.  Running it is as simple as passing it the input file, for example inputfile.pdf

Once you have all of your GeoPDF files downloaded and converted, the next step is to merge them.  The fastest way I’ve found to merge involves using gdalbuildvrt and gdal_translate, also from GDAL.  The first step is to create a virtual dataset of all of your files by running something like:

gdalbuildvrt -resolution highest -a_srs EPSG:4326 merged.vrt parts/*.tif

The options I chose here are to pick the highest pixel resolution (-resolution) based on the input files.  For this case the resolutions should be the same, but this way I don’t have to go through and verify that.  Next I set the spatial reference of the output file to WGS84 (-a_srs); note that this only assigns the reference, it does not reproject the pixels.  Last come the file name of the virtual dataset and then the input files.

Now that the virtual dataset is done, it’s time to actually merge all of the files together.  The virtual dataset contains the calculated bounding box that will contain all of the input files.  Now we use gdal_translate to actually create the merged GeoTIFF file:

gdal_translate -of GTiff -co COMPRESS=LZW -co PREDICTOR=2 merged.vrt ~/merged.tif

Here again I use LZW compression to losslessly compress the output data.  Note that gdal_translate will automatically add an Alpha channel as Band 4 in the image to denote areas that had no input data.  That’s why we do NOT add the -addalpha flag to gdalbuildvrt.  For performance tips, I’d suggest keeping the source data and output file on separate drives unless you’re running something like a solid state drive.  To give you an idea of the output file sizes, Virginia merged (which did have a lot of areas missing), was around 500 megabytes.

Next you’ll need a Shapefile to use as a cut file to clip the data.  Since I have the Census Tiger 2013 data in a local PostGIS database (see previous posts to this blog), I used QGIS to select just the VA state outline and then saved it as a Shapefile.

Finally, we will use gdalwarp to clip the merged GeoTIFF against the state outline to produce the clipped GeoTIFF that is just the state itself.  This operation can take a bit of time depending on how powerful a machine you’re running it on.  The command you will use is similar to this:

gdalwarp --config GDAL_CACHEMAX 1024 -wm 1024 -cutline va_outline.shp -crop_to_cutline -multi -t_srs EPSG:4326 -co COMPRESS=LZW -co PREDICTOR=2 -co BIGTIFF=YES -co TILED=YES ~/merged.tif clipped.tif

Some of the command line parameters I used are optional; I just tend to leave them in since I do a lot of copying and pasting 😉  First we tell GDAL to increase the size of its caches using the --config GDAL_CACHEMAX and -wm options.  Next we specify the file to clip against with the -cutline and -crop_to_cutline options.  The -multi option tells GDAL to process using multiple threads.  I again specify the output projection and the LZW compression parameters.  Here I also specify the BIGTIFF option just in case the output file goes over four gigabytes.  Finally, I tell gdalwarp to tile the output TIFF so it will load faster in a GIS by separating it into tiles.
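If you do this for more than one state, it can help to script the three steps.  The following is only a sketch: the function name and the default file names are my own placeholders, and it just builds the same command lines shown above as argument lists without running anything.

```python
import shlex


def build_merge_commands(tif_files, cutline, vrt="merged.vrt",
                         merged="merged.tif", clipped="clipped.tif"):
    """Return the three GDAL command lines as argument lists (nothing is run)."""
    # LZW options shared by the two steps that write GeoTIFFs.
    lzw = ["-co", "COMPRESS=LZW", "-co", "PREDICTOR=2"]
    build_vrt = ["gdalbuildvrt", "-resolution", "highest",
                 "-a_srs", "EPSG:4326", vrt, *tif_files]
    translate = ["gdal_translate", "-of", "GTiff", *lzw, vrt, merged]
    warp = ["gdalwarp", "--config", "GDAL_CACHEMAX", "1024", "-wm", "1024",
            "-cutline", cutline, "-crop_to_cutline", "-multi",
            "-t_srs", "EPSG:4326", *lzw,
            "-co", "BIGTIFF=YES", "-co", "TILED=YES", merged, clipped]
    return [build_vrt, translate, warp]


# Print the commands so they can be reviewed before anything is executed.
for argv in build_merge_commands(["parts/a.tif", "parts/b.tif"], "va_outline.shp"):
    print(shlex.join(argv))
```

To actually execute them you could pass each list to subprocess.run(argv, check=True).  Note that a path such as ~/merged.tif is not expanded when passed this way, so run it through os.path.expanduser first.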

The output will look something like the below figure.  I’ll start posting files as I get time.  Hope everyone is having a great holiday!

Clipped Virginia 125K Old Maps


Keeping up with the Botnets Round 2

I’ve been keeping up with tracking how many botnets are out there scanning WordPress blogs.  I’ve eventually resorted to blocking huge chunks of the Internet via .htaccess files.   So far it’s been quite effective in limiting the number of botnet login attempts.

If anyone is interested, I’ve put the limit portion of my .htaccess file here.  Feel free to use it and customize for your needs.

Some Website Changes thanks to Botnets

Lately I’ve been getting tons and tons of login attempts from what appear to be botnets.  Since I’m getting tired of banning the IPs individually, I’m temporarily taking to banning entire countries and ISPs from hitting my blog.  If you’re in that group, sorry guys.  Take it up with your ISP.

Here are some stats I’ve been gathering.

IP Addresses Grouped by ISPs ab connect ltd. astral telecom sa bezeq international-ltd chandigarh china unicom zhejiang province network chinanet guangdong province network chl citycom networks pvt ltd codero comsenz technology ltd dedibox sas desenvolve solucoes de internet ltda desokey mohamed hassan centerarabs ecommerce corporation erfahren co. ltd. faan international fiberring b.v. forest eternal communication tech. fpt telecom company gigahost limited globotech communications go daddy netherlands b.v. llc llc llc llc llc llc llc llc h1 llc h1 llc hetzner online ag hetzner online ag ho chi minh city post and telecom company host papa inc. inc hostigation iomart hosting limited ip pool for ctrls ipx server gmbh itsoft ltd iweb dedicated cl leaseweb germany gmbh limestone networks inc linode linode llc lubos hutar mainstream consulting group inc media temple inc media temple inc media temple inc. media temple inc. memset ltd nap de las americas-madrid s.a. neocom-service isp new dream network llc nixway srl onesolutions online data services jsc online s.a.s. openminds bvba ovh hosting inc ovh sas ovh sas ovh sas ovh sas ovh sas ovh sas ovh sas ovh sas ovh sas privatesystems networks proserve b.v. pt. cyberindo aditama pt. des teknologi informasi radore veri merkezi hizmetleri a.s. rcs & rds business salay telekomunikasyon ticaret limited sirketi sc eurosistem srl sc webfactor srl serverbeach serverspace limited singlehop inc singlehop inc singlehop inc skyware sp. z o.o. soc. comercial wirenet chile ltda. softcom america inc. softcom america inc. softlayer technologies inc strato ag strato ag strato ag teknik data internet teknolojileri sti tolvu- og rafeindapjonusta sudurlands ehf turk telekomunikasyon anonim sirketi vietnam data communication company vietnam data communication company webfusion internet solutions webfusion internet solutions webfusion internet solutions xenosite b.v.

As you can see, I get a bunch from Godaddy and French ISP Ovh.  I’ve also banned Godaddy IP’s, Ovh, and Media Temple.  I’ll be adding others once I find all of their allocated net ranges.

For reference, here’s a copy of my current list along with attempts:

IPs Attempts 34 26 20 1 3 12 1 63 1 8 16 23 26 43 12 3 232 5 13 34 52 53 43 87 1 4 1 12 1 1 18 1 6 33 81 27 14 6 9 28 112 18 3 180 6 46 61 7 19 13 2 32 24 48 13 83 244 39 19 1 3 1 4 8 46 58 1 4 27 175 1 1 58 2 20 36
Total Attempts: 2469

Guess I should be flattered that I’m getting all of this “attention” 🙂

Using Free Geospatial Tools and Data Part 12: OpenStreetMap

For this installment, we will look at importing data from OpenStreetMap.  As I mentioned in an earlier post, OpenStreetMap is a crowd-sourced GIS dataset with the goal of producing a global dataset that anyone can use.  There are two ways to download this data: you can either use Bittorrent and download the entire planet, or download extracts from geofabrik.  If you do not need the entire planet, I would highly recommend using geofabrik.  It has a fast downlink and they have finally added MD5 checksums so you can verify the integrity of your download.

Go to the geofabrik site and click on North America.  We will be using the .pbf format file so click the link near the top of the page named north-america-latest.osm.pbf.  It is about six gigabytes in size and the MD5sum is listed at the end of the paragraph.  Once the download is done in your browser, you can use the md5sum command under a Linux shell or download one of the many MD5sum clients for Windows.  It will look similar to the below example output (it likely will not match exactly, as the MD5 value will change as the data is modified).

bmaddox@girls:~/Downloads/geodata$ md5sum north-america-latest.osm.pbf 
d2daa9c7d3ef4dead4a2b5f790523e6d north-america-latest.osm.pbf

Next go back to the main geofabrik site and then click on and download the Central America file.  This will give you Mexico and the other Central American files.  As listed above, once the download is done in your browser, check it with md5sum.  If the values do not match, you will want to redownload and rerun md5sum again until they do.
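If you do not have md5sum handy, the same check can be sketched in a few lines of Python with the standard hashlib module.  The function names here are my own; the file is read in chunks because a six gigabyte download should not be loaded into memory all at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_md5):
    """Compare a file against the checksum copied from the download page."""
    return md5_of_file(path) == published_md5.strip().lower()
```

You would call matches_published() with the path to the .pbf file and the MD5 string listed on the download page, and redownload if it returns False.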

There are several programs you can use to import OpenStreetMap data into PostGIS.  They mainly differ on what schema they use and how they manipulate the data before it goes in.  For purposes of this post, we will be using the imposm program.  If you are on Ubuntu, it should be a simple apt-get install imposm away.  For Windows or other distributions, you can download it directly from the imposm website.  A tutorial on how to import data using imposm is part of the imposm documentation.

Using imposm is a multi-stage process.  The first stage is to have it read the data and combine the files into several intermediary files.  First create a PostGIS database by running:

createdb -T gistemplate OSM

Now have imposm take the data and convert it into its intermediary files.  To do this, run a similar command to this:

bmaddox@girls:/data/data/geo$ imposm --read --concurrency 2 --proj EPSG:4326 ~/Downloads/geodata/*.pbf
[16:29:15] ## reading /home/bmaddox/Downloads/geodata/central-america-latest.osm.pbf
[16:29:15] coords: 500489k nodes: 10009k ways: 71498k relations: 500k (estimated)
[16:31:27] coords: 21524k nodes: 92k ways: 2464k relations: 5k
[16:31:28] ## reading /home/bmaddox/Downloads/geodata/north-america-latest.osm.pbf
[16:31:28] coords: 500489k nodes: 10009k ways: 71498k relations: 500k (estimated)
[17:40:22] coords: 678992k nodes: 1347k ways: 44469k relations: 229k
[17:40:23] reading took 1 h 11m 7 s
[17:40:23] imposm took 1 h 11m 7 s

Here, I changed to a different drive and ran the imposm command to read from the drive where I downloaded the .pbf files.  I did this since reading is a disk-intensive process and splitting it between drives helps to speed things up a bit.  Also, I differed from the tutorial as my install of QGIS could not render OpenStreetMap data in its native EPSG:900913 projection with data in the EPSG:4326 coordinate system that my Tiger data was in.  Unless you have an extremely high-end workstation, this will take a while.  Once the process is done, you will have the following files in the output directory:

bmaddox@girls:~/Downloads/geodata/foo$ dir
imposm_coords.cache imposm_nodes.cache imposm_relations.cache imposm_ways.cache

The next step is to take the intermediary files and write them into PostGIS.  This step works from the cache files, so run it from the directory that holds them; the wildcard for the .pbf files was only needed in the read step.

bmaddox@girls:~/Downloads/geodata/foo$ imposm --write --database OSM --host localhost --user bmaddox --port 5432 --proj EPSG:4326
password for bmaddox at localhost:
[18:20:21] ## dropping/creating tables
[18:20:22] ## writing data
[2014-06-15 18:52:46,074] imposm.multipolygon - WARNING - building relation 1834172 with 8971 ways (10854.8ms) and 8843 rings (2293.0ms) took 426854.5ms
[2014-06-15 19:00:47,635] imposm.multipolygon - WARNING - building relation 2566179 with 4026 ways (4717.3ms) and 3828 rings (1115.6ms) took 89522.6ms
[19:15:20] relations: 244k/244k
[19:15:41] relations: total time 55m 18s for 244095 (73/s)
[00:35:28] ways: 46907k/46907k
[00:35:30] ways: total time 5 h 19m 49s for 46907462 (2444/s)
[00:40:21] nodes: 1437k/1437k
[00:40:22] nodes: total time 4 m 51s for 1437951 (4933/s)
[00:40:22] ## creating generalized tables
[01:44:47] generalizing tables took 1 h 4 m 24s
[01:44:47] ## creating union views
[01:44:48] creating views took 0 s
[01:44:48] ## creating geometry indexes
[02:15:02] creating indexes took 30m 14s
[02:15:02] writing took 7 h 54m 41s
[02:15:02] imposm took 7 h 54m 42s

As you can see from the above output, this took almost eight hours on my home server (quad core AMD with eight gig of RAM).  This command loads all of the data from the intermediate files into PostGIS.  However, we are not done yet.  Looking at the output, all it did was load the data and create indices.  It did not cluster the data or perform any other optimizations.  To do this, run the following imposm command:

bmaddox@girls:~/Downloads/geodata/foo$ imposm --optimize -d OSM --user bmaddox
password for bmaddox at localhost:
[17:18:12] ## optimizing tables
Clustering table osm_new_transport_areas
Clustering table osm_new_mainroads
Clustering table osm_new_buildings
Clustering table osm_new_mainroads_gen1
Clustering table osm_new_mainroads_gen0
Clustering table osm_new_amenities
Clustering table osm_new_waterareas_gen1
Clustering table osm_new_waterareas_gen0
Clustering table osm_new_motorways_gen0
Clustering table osm_new_aeroways
Clustering table osm_new_motorways
Clustering table osm_new_transport_points
Clustering table osm_new_railways_gen0
Clustering table osm_new_railways_gen1
Clustering table osm_new_landusages
Clustering table osm_new_waterways
Clustering table osm_new_railways
Clustering table osm_new_motorways_gen1
Clustering table osm_new_waterareas
Clustering table osm_new_places
Clustering table osm_new_admin
Clustering table osm_new_minorroads
Clustering table osm_new_landusages_gen1
Clustering table osm_new_landusages_gen0
Vacuum analyze
[19:24:38] optimizing took 2 h 6 m 25s
[19:24:38] imposm took 2 h 6 m 26s

On my system it took a couple of hours and clustered all of the tables and then did a vacuum analyze to update the database statistics.

The final step is to have imposm rename the tables to what they will be in “production mode”.  Run the following:

bmaddox@girls:~/Downloads/geodata/foo$ imposm -d OSM --user bmaddox --deploy-production-tables
password for bmaddox at localhost:
[11:00:06] imposm took 1 s

Your data should now be optimized and ready for use.  To test it, refer to an earlier post in this series where I discussed using QGIS and load some of the OSM data into it.

Your OSM database will have the following tables in it:

 List of relations
 Schema | Name | Type | Owner 
 public | osm_admin | table | bmaddox
 public | osm_aeroways | table | bmaddox
 public | osm_amenities | table | bmaddox
 public | osm_buildings | table | bmaddox
 public | osm_landusages | table | bmaddox
 public | osm_landusages_gen0 | table | bmaddox
 public | osm_landusages_gen1 | table | bmaddox
 public | osm_mainroads | table | bmaddox
 public | osm_mainroads_gen0 | table | bmaddox
 public | osm_mainroads_gen1 | table | bmaddox
 public | osm_minorroads | table | bmaddox
 public | osm_motorways | table | bmaddox
 public | osm_motorways_gen0 | table | bmaddox
 public | osm_motorways_gen1 | table | bmaddox
 public | osm_places | table | bmaddox
 public | osm_railways | table | bmaddox
 public | osm_railways_gen0 | table | bmaddox
 public | osm_railways_gen1 | table | bmaddox
 public | osm_transport_areas | table | bmaddox
 public | osm_transport_points | table | bmaddox
 public | osm_waterareas | table | bmaddox
 public | osm_waterareas_gen0 | table | bmaddox
 public | osm_waterareas_gen1 | table | bmaddox
 public | osm_waterways | table | bmaddox
 public | spatial_ref_sys | table | bmaddox
(25 rows)

The _gen0 and _gen1 tables are generalized and not as highly detailed as the other tables.  They are good for viewing data over large geographic areas (think nation scale).  With areas that large, it would take a lot of time to render the high resolution data.  Thus the _gen0 and _gen1 tables are simplified versions of the data for use at these resolutions.  You can use QGIS’s scale-dependent rendering to specify these tables and then go to the high-resolution tables upon zooming in.
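The idea behind scale-dependent rendering can be sketched as a small lookup.  Everything here is an assumption for illustration: the function name, the scale thresholds, and the ordering (I am assuming _gen0 is the coarser of the two generalized tables).  Only the table names come from the listing above.

```python
# Base tables that have _gen0/_gen1 variants in the table listing above.
GENERALIZED = {"osm_landusages", "osm_mainroads", "osm_motorways",
               "osm_railways", "osm_waterareas"}


def table_for_scale(base, scale_denominator,
                    gen1_above=500_000, gen0_above=5_000_000):
    """Pick a table for a map scale of 1:scale_denominator.

    The thresholds are made-up defaults; tune them for your own data.
    """
    if base not in GENERALIZED:
        return base  # no simplified variants exist for this table
    if scale_denominator >= gen0_above:
        return base + "_gen0"  # assumed coarsest, for nation-scale views
    if scale_denominator >= gen1_above:
        return base + "_gen1"
    return base  # full detail once zoomed in


print(table_for_scale("osm_mainroads", 10_000_000))
print(table_for_scale("osm_mainroads", 50_000))
```

In QGIS itself you would get the same effect by adding each table as its own layer and setting the minimum and maximum scale on each layer.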

Go forth and play with the additional free geospatial data you now have in your database 🙂


The Lost Research Paper

Towards the end of my tenure at the US Geological Survey, I was the project manager and principal investigator of Restoration of Data from Lossy Compression.  The goal of the project was to find ways to restore fine detail that was lost during lossy compression processes such as JPEG.  I had submitted an Open File report through the review process, but left the USGS in 2006 before the paper had completed review.  As I had left, it basically fell through the cracks and was never officially published.

I had forgotten about it until recently when I was updating my resume.  So, without further ado, I have put the paper here.  I took out the USGS logo and what not since it was never officially published by them.  So for a flashback into what I was doing in 2006, have fun reading it 🙂

Using Free Geospatial Tools and Data Part 11: NGA Geonames

Updated 23 March 2018: Changed for new size necessary for the cc2 column

It’s been a while since I’ve made a post, so I thought I’d keep going with the data series.  This time around I’ll be talking about how to make your own local copy of the NGA Geonames database.  This database is similar to GNIS, but covers the whole globe and also has information on locations such as airfields, pipelines, and so on.

First, download the following files from the Geonames website:

  • admin1CodesASCII.txt
  • admin2Codes.txt
  • allCountries.txt
  • alternateNamesV2.txt
  • countryInfo.txt
  • featureCodes_en.txt
  • hierarchy.txt
  • iso-languagecodes.txt
  • timeZones.txt
  • userTags.txt

Some of them are zipped, so you’ll need to unzip them into the same directory as the others for ease of use.  Next, create your geonames database by running:

bmaddox@girls:~/Downloads/geodata$ createdb -T gistemplate Geonames

Next, we will create the table for the main points file, which is called allCountries.txt.  Run the following command from the same directory where you have all of the Geonames files:

bmaddox@girls:~/Downloads/geodata$ psql -d Geonames 
psql (9.3.4)
Type "help" for help.

This will put you into the PostgreSQL command line.  Now create the table to hold the data in the allCountries.txt file:

Geonames=# create table geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames text,
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(170),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
dem int,
timezone varchar(40),
moddate date
);

Now we will use a built-in PostgreSQL command to load data in the DB.  There are two forms of it: the long way specifies the column names in order on the command line, while the short way gives just the file name.  We will be using the short way here:

Geonames=# \copy geoname from allCountries.txt null as '';
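To make it clearer what the short form relies on, here is a rough Python sketch of how one line of the file maps onto the table: the fields are tab-separated, they must appear in the same order as the columns, and an empty field is treated as NULL because of the null as '' option.  The function name is mine, and the sketch ignores the backslash escapes that the real \copy text format also understands.

```python
# Column order taken from the create table statement above.
GEONAME_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "fclass", "fcode", "country", "cc2", "admin1", "admin2",
    "admin3", "admin4", "population", "elevation", "dem", "timezone",
    "moddate",
]


def parse_geoname_line(line):
    """Split one tab-delimited line and map empty fields to None (NULL)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GEONAME_COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(GEONAME_COLUMNS), len(fields))
        )
    return {
        name: (value if value != "" else None)
        for name, value in zip(GEONAME_COLUMNS, fields)
    }
```

A line with the wrong number of fields raises an error here, which is roughly what happens when the file layout changes and the load fails until the table definition is updated.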

The \copy command loads the data, but it is not yet usable by a GIS.  We will need to create a geometry column for the data and then use the latitude and longitude columns to fill that column with points.

Geonames=# SELECT AddGeometryColumn( 'geoname', 'the_geom', 4326, 'POINT', 2);
 public.geoname.the_geom SRID:4326 TYPE:POINT DIMS:2 
(1 row)

This command creates the geometry column, and specifies an EPSG of 4326 (WGS84).  Now we need to insert the latitude and longitudes of the points into this column:

Geonames=# update geoname SET the_geom = ST_PointFromText('POINT(' || longitude || ' ' || latitude || ')', 4326);
UPDATE 8943136
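One detail in that statement that is easy to get backwards is the coordinate order: the point text puts longitude (x) first and latitude (y) second.  A tiny Python sketch of the same string building, with a made-up function name and range checks that the SQL above does not do:

```python
def point_wkt(latitude, longitude):
    """Build the same 'POINT(lon lat)' text the update statement concatenates."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude out of range: %r" % latitude)
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude out of range: %r" % longitude)
    # Longitude is x and comes first; latitude is y and comes second.
    return "POINT(%s %s)" % (longitude, latitude)


print(point_wkt(38.5, -77.25))
```

If your points end up in the wrong hemisphere after loading, swapped latitude and longitude is the first thing I would check.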

The update will take a while as PostGIS must read each point, convert it into the proper format, and then add it into the geometry column.  Now we need to add a geospatial index on this column to make the queries faster.  Again, it may take a while to run.

Geonames=# create index geoname_the_geom_gist_idx on geoname using gist (the_geom);

Once this is done, we should optimize this table as I mentioned in a previous post.  We need to analyze the database and then cluster it on the points.

Geonames=# vacuum analyze geoname;
Geonames=# cluster geoname using geoname_the_geom_gist_idx;
Geonames=# analyze geoname;

There are several auxiliary tables we should now add to the geonames database.  These define the values used in the various columns and can be used in a JOIN statement in a GIS.  I’m going to leave out the vacuum analyze steps but you should perform it on each table below.  The first will be the alternatename table, which holds data from the  alternateNames.txt file.  This file contains a list of other names some of the points are known by and is connected to the geoname table by the geonameId column:

Geonames=# create table alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(400),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
);
Geonames=# \copy alternatename from alternateNames.txt null as '';

Next we move on to the iso-languagecodes.txt file.  This file contains the ISO 639 standard codes and names for the languages used in the database.

Geonames=# create table "isolanguage" (
 iso_639_3 char(3),
 iso_639_2 char(10),
 iso_639_1 char(3),
 language_name varchar(100)
);
Geonames=# \copy isolanguage from iso-languagecodes.txt null '' delimiter E'\t' csv header

Next we will create and load the countryInfo.txt file, which contains information about each country such as iso codes, phone number formats, and so on.  First, we need to remove the comment lines from the start of the file to make things easier.  You can either do this with a text editor and delete every line that starts with the # character, or you can run the following command from bash:

bmaddox@girls:~/Downloads/geodata$ egrep -v "^[[:blank:]]*#" countryInfo.txt > countryInfo2.txt
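If you are not on a system with egrep, the same filter is easy to sketch in Python.  The function name is mine; the pattern mirrors the one above, dropping any line whose first non-blank character is #.

```python
import re

# Same idea as ^[[:blank:]]*# : optional spaces or tabs, then a '#'.
COMMENT_LINE = re.compile(r"^[ \t]*#")


def strip_comment_lines(lines):
    """Keep every line that does not start with a comment marker."""
    return [line for line in lines if not COMMENT_LINE.match(line)]


sample = ["# ISO\tISO3\n", "  \t# another comment\n", "AD\tAND\t020\n"]
print(strip_comment_lines(sample))
```

To process the real file you would read countryInfo.txt, pass its lines through strip_comment_lines(), and write the result to countryInfo2.txt.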

With this done, we can proceed with the import as normal:

Geonames=# create table "countryinfo" ( 
 iso_alpha2 char(2),
 iso_alpha3 char(3),
 iso_numeric integer,
 fips_code varchar(3),
 name varchar(200),
 capital varchar(200),
 areainsqkm double precision,
 population integer,
 continent varchar(2),
 tld varchar(10),
 currencycode varchar(3),
 currencyname varchar(20),
 phone varchar(20),
 postalcode varchar(100),
 postalcoderegex varchar(200),
 languages varchar(200),
 geonameId int,
 neighbors varchar(50),
 equivfipscode varchar(3)
);
Geonames=# \copy countryinfo from countryInfo2.txt null as '';

Next we do the timeZones.txt file:

Geonames=# create table "timezones" (
countrycode char(2),
TimeZoneId varchar(30),
gmtoffset double precision,
dstoffset double precision,
rawoffset double precision
);
Geonames=# \copy timezones from timeZones.txt null '' delimiter E'\t' csv header

Next we do the admin1CodesASCII.txt table, which matches ascii names of administrative divisions to their codes:

Geonames=# CREATE TABLE "admin1codesascii" ( 
code CHAR(10), 
name TEXT, 
nameAscii TEXT, 
geonameid int 
);
Geonames=# \copy admin1codesascii from admin1CodesASCII.txt null as '';

Now we do the admin2Codes.txt file that maps the admin2code values to their textual entries.

Geonames=# CREATE TABLE "admin2codes" (
 code varchar(30),
 name_local text,
 name text,
 geonameid int
);
Geonames=# \copy admin2codes from admin2Codes.txt null as '';

Next is featureCodes_en.txt, which maps feature codes to their descriptions:

Geonames=# CREATE TABLE "featurecodes" ( 
code CHAR(7), 
name VARCHAR(200), 
description TEXT 
);
Geonames=# \copy featurecodes from featureCodes_en.txt null as '';

Next is the userTags.txt file that contains user-contributed tagging to the points.

Geonames=# create table "usertags" (
geonameid int,
tag varchar(40)
);
Geonames=# \copy usertags from userTags.txt null as '';

Finally we will handle the hierarchy.txt file, which contains parent-child relationships modeled from the admin1-4 codes.

Geonames=# create table "hierarchy" (
parentId int,
childId int,
type varchar(40)
);
Geonames=# \copy hierarchy from hierarchy.txt null as '';

You now should have your own complete copy of the Geonames database.  They do publish updates regularly, so you can either recreate the tables or enter in their changes files.  You may also wish to index the feature columns (fclass and fcode) of the geoname table so you can create custom views that only display things like airports, towers, and so on.
