Using Free Geospatial Tools and Data Part 11: NGA Geonames

It’s been a while since I’ve made a post, so thought I’d keep going with the data series.  This time around I’ll be talking about how to make your own local copy of the NGA Geonames database.  This database is similar to GNIS, but covers the whole globe and also has information on location such as airfields, pipelines, and so on.

First, download the following files from the Geonames website:

 -rw-rw-r-- 1 bmaddox bmaddox 139526 May 23 21:00 admin1CodesASCII.txt
 -rw-rw-r-- 1 bmaddox bmaddox 1881189 May 23 21:00 admin2Codes.txt
 -rw-rw-r-- 1 bmaddox bmaddox 1113017609 May 24 02:50 allCountries.txt
 -rw-rw-r-- 1 bmaddox bmaddox 326799799 May 24 03:00 alternateNames.txt
 -rw-rw-r-- 1 bmaddox bmaddox 31382 May 23 20:44 countryInfo.txt
 -rw-rw-r-- 1 bmaddox bmaddox 57018 May 23 20:44 featureCodes_en.txt
 -rw-rw-r-- 1 bmaddox bmaddox 5849444 May 24 03:05 hierarchy.txt
 -rw-rw-r-- 1 bmaddox bmaddox 127202 May 23 20:44 iso-languagecodes.txt
 -rw-rw-r-- 1 bmaddox bmaddox 13947 May 23 21:05 timeZones.txt
 -rw-rw-r-- 1 bmaddox bmaddox 870561 May 24 03:05 userTags.txt

Some of them are zipped, so you’ll need to unzip them into the same directory as the others for ease of use.  Next, create your geonames database by running:

bmaddox@girls:~/Downloads/geodata$ createdb -T gistemplate Geonames

Next, we will create the table for the main points file, which is called allCountries.txt.  Run the following command from the same directory where you have all of the Geonames files:

bmaddox@girls:~/Downloads/geodata$ psql -d Geonames 
psql (9.3.4)
Type "help" for help.

This will put you into the PostgreSQL command line.  Now create the table to hold the data in the allCountries.txt file:

Geonames=# create table geoname (
geonameid int,
name varchar(200),
asciiname varchar(200),
alternatenames text,
latitude float,
longitude float,
fclass char(1),
fcode varchar(10),
country varchar(2),
cc2 varchar(90),
admin1 varchar(20),
admin2 varchar(80),
admin3 varchar(20),
admin4 varchar(20),
population bigint,
elevation int,
dem int,
timezone varchar(40),
moddate date

Now we will use a built-in PostgreSQL command to load data in the DB.  There are two forms of it, the long way specifies the column names in order on the command line, the other just the file name.  We will be using the short way here:

Geonames=# \copy geoname from allCountries.txt null as '';

This loads the data, but it is not yet ready to be usable by a GIS.  We will need to create a geometry column for the data and then use the latitude and longitude columns to create a point column in the geometry.

Geonames=# SELECT AddGeometryColumn( 'geoname', 'the_geom', 4326, 'POINT', 2);
 public.geoname.the_geom SRID:4326 TYPE:POINT DIMS:2 
(1 row)

This command creates the geometry column, and specifies an EPSG of 4326 (WGS84).  Now we need to insert the latitude and longitudes of the points into this column:

Geonames=# update geoname SET the_geom = ST_PointFromText('POINT(' || longitude || ' ' || latitude || ')', 4326);
UPDATE 8943136

This will take a while as PostGIS must read each point, convert it into the proper format, and then add it into the geometry column.  Now we need to add a geospatial index on this column to make the queries faster.  Again, it may take a while to run.

Geonames=# create index geoname_the_geom_gist_idx on geoname using gist (the_geom);

Once this is done, we should optimize this table as I mentioned in a previous post.  We need to analyze the database and then cluster it on the points.

Geonames=# vacuum analyze geoname ;
Geonames=# cluster geoname using geoname_the_geom_gist_idx ;
Geonames=# analyze geoname;

There are several auxiliary tables we should now add to the geonames database.  These define the values used in the various columns and can be used in a JOIN statement in a GIS.  I’m going to leave out the vacuum analyze steps but you should perform it on each table below.  The first will be the alternatename table, which holds data from the  alternateNames.txt file.  This file contains a list of other names some of the points are known by and is connected to the geoname table by the geonameId column:

Geonames=# create table alternatename (
alternatenameId int,
geonameid int,
isoLanguage varchar(7),
alternateName varchar(200),
isPreferredName boolean,
isShortName boolean,
isColloquial boolean,
isHistoric boolean
Geonames=# \copy alternatename from alternateNames.txt null as '';

Next we move on to the iso-languagecodes.txt file.  This file contains ISO-638 standard names for all of the countries in the database.

Geonames=# create table "isolanguage" (
 iso_639_3 char(3),
 iso_639_2 char(10),
 iso_639_1 char(3),
 language_name varchar(100)
Geonames=# \copy isolanguage from iso-languagecodes.txt null '' delimiter E'\t' csv header

Next we will create and load the countryInfo.txt file, which contains information about each country such as iso codes, phone number formats, and so on.  First, we need to remove the comment lines from the start of the file to make things easier.  You can either do this with a text editor and delete every line that starts with the # character, or you can run the following command from bash:

bmaddox@girls:~/Downloads/geodata$ egrep -v "^[[:blank:]]*#" countryInfo.txt > countryInfo2.txt

With this done, we can proceed with the import as normal:

Geonames=# create table "countryinfo" ( 
 iso_alpha2 char(2),
 iso_alpha3 char(3),
 iso_numeric integer,
 fips_code varchar(3),
 name varchar(200),
 capital varchar(200),
 areainsqkm double precision,
 population integer,
 continent varchar(2),
 tld varchar(10),
 currencycode varchar(3),
 currencyname varchar(20),
 phone varchar(20),
 postalcode varchar(100),
 postalcoderegex varchar(200),
 languages varchar(200),
 geonameId int,
 neighbors varchar(50),
 equivfipscode varchar(3)
Geonames=# \copy countryinfo from countryInfo2.txt null as '';

Next we do the timeZones.txt file:

Geonames=# create table "timezones" (
countrycode char(2),
TimeZoneId varchar(30),
gmtoffset double precision,
dstoffset double precision,
rawoffset double precision
Geonames=# \copy timezones from timeZones.txt null '' delimiter E'\t' csv header

Next we do the admin1CodesASCII.txt table, which matches ascii names of administrative divisions to their codes:

Geonames=# CREATE TABLE "admin1codesascii" ( 
code CHAR(10), 
name TEXT, 
nameAscii TEXT, 
geonameid int 
Geonames=# \copy admin1codesascii from admin1CodesASCII.txt null as '';

Now we do the admin2Codes.txt file that maps the admin2code values to their textual entries.

Geonames=# CREATE TABLE "admin2codes" (
 code varchar(30),
 name_local text,
 name text,
 geonameid int
Geonames=# \copy admin2codes from admin2Codes.txt null as '';

Next is featureCodes_en.txt, which maps feature codes to their descriptions:

Geonames=# CREATE TABLE "featurecodes" ( 
code CHAR(7), 
name VARCHAR(200), 
description TEXT 
Geonames=# \copy featurecodes from featureCodes_en.txt null as '';

Next is the userTags.txt file that contains user-contributed tagging to the points.

Geonames=# create table "usertags" (
geonameid int,
tag varchar(40)
Geonames=# \copy usertags from userTags.txt null as '';

Finally we will handle the hierarchy.txt file, which contains parent-child relationships modeled from the admin1-4 codes.

Geonames=# create table "hierarchy" (
parentId int,
childId int,
type varchar(40)
Geonames=# \copy hierarchy from hierarchy.txt null as '';

You now should have your own complete copy of the Geonames database.  They do publish updates regularly, so you can either recreate the tables or enter in their changes files.  You may also wish to index the type column of allcountries so you can create custom views that only display things like airports, towers, and so on.

Posted in GIS