{"id":221,"date":"2014-03-15T10:47:54","date_gmt":"2014-03-15T14:47:54","guid":{"rendered":"http:\/\/brian.digitalmaddox.com\/blog\/?p=221"},"modified":"2014-07-08T16:45:19","modified_gmt":"2014-07-08T20:45:19","slug":"using-free-geospatial-tools-and-data-part-9-maintaining-your-geospatial-database","status":"publish","type":"post","link":"https:\/\/brian.digitalmaddox.com\/blog\/?p=221","title":{"rendered":"Using Free Geospatial Tools and Data Part 9: Maintaining your Geospatial Database"},"content":{"rendered":"<p>Now that you have a lot of data into your geospatial database, we should take a little bit of time to discuss how to manage it and keep it running smoothly. \u00a0This can make the difference between requests timing out and having data be returned almost instantaneously.<\/p>\n<h4>Database Tuning<\/h4>\n<p><span style=\"line-height: 1.5\">There are a large number of sources online that will go over how to configure PostgreSQL for maximum performance. \u00a0The PostgreSQL team themselves provide such documentation at the <\/span><a style=\"line-height: 1.5\" title=\"Tuning Your PostgreSQL Server wiki entry\" href=\"https:\/\/wiki.postgresql.org\/wiki\/Tuning_Your_PostgreSQL_Server\" target=\"_blank\">Tuning Your PostgreSQL Server<\/a><span style=\"line-height: 1.5\">\u00a0wiki page and a list of many techniques at their <\/span><a style=\"line-height: 1.5\" title=\"Performance Optimization wiki page\" href=\"http:\/\/wiki.postgresql.org\/wiki\/Performance_Optimization\" target=\"_blank\">Performance Optimization<\/a><span style=\"line-height: 1.5\"> wiki page.<\/span><\/p>\n<p>For the PostGIS side of things, Boundless has a workshop page titled <a title=\"Tuning Postgres for Spatial\" href=\"http:\/\/workshops.boundlessgeo.com\/postgis-intro\/tuning.html\" target=\"_blank\">Tuning Postgres for Spatial<\/a> that provides some information on configuring for spatial operations. \u00a0The PostGIS team also has some tips that can be found at <a title=\"Chapter 6: Performance Tips\" href=\"http:\/\/postgis.net\/docs\/manual-1.5\/ch06.html\" target=\"_blank\">this<\/a> link.<\/p>\n<p>Another tool is the <a title=\"pgtune GitHub page\" href=\"https:\/\/github.com\/gregs1104\/pgtune\" target=\"_blank\">pgtune<\/a> utility. \u00a0This is a command line tool that lets you specify what you will be using your database for and what type of machine it is running on. \u00a0It then will give you several options that you can put in the PostgreSQL configuration files.<\/p>\n<h4>Table Maintenance<\/h4>\n<p>In general, the main things you should consider are creating indices, vacuuming your database to remove unused space, keeping your database statistics up to date, and clustering your database so that data that is spatially near each other are physically near each other on disk.<\/p>\n<p>To perform the following commands, open a command prompt and run the command:<\/p>\n<pre>psql -d Census_2013<\/pre>\n<p>You should then see output similar to the following:<\/p>\n<pre>[bmaddox@girls ~]$ <strong>psql -d Census_2013<\/strong> \r\npsql (9.2.7)\r\nType \"help\" for help.\r\n\r\nCensus_2013=#<\/pre>\n<p>If you followed the commands in this series, your database tables should already have a spatial index based on the geom column that lets PostGIS only return data that you have requested. \u00a0To see this, first run the command<\/p>\n<pre>\\dt<\/pre>\n<p>This will give you a list of all of the tables in your database. \u00a0On my system, I get the output below:<\/p>\n<pre>Census_2013=# <strong>\\dt<\/strong>\r\n List of relations\r\n Schema | Name | Type | Owner \r\n--------+-----------------------------------------+-------+---------\r\n public | county_outlines | table | bmaddox\r\n public | spatial_ref_sys | table | bmaddox\r\n public | state_outlines | table | bmaddox\r\n public | us_113_congress_districts | table | bmaddox\r\n public | us_alaska_native_regional_corporations | table | bmaddox\r\n public | us_area_landmarks | table | bmaddox\r\n public | us_area_water | table | bmaddox\r\n public | us_census_block_groups | table | bmaddox\r\n public | us_coastlines | table | bmaddox\r\n public | us_combined_new_england_city_town_areas | table | bmaddox\r\n public | us_combined_statistical_areas | table | bmaddox\r\n public | us_elementary_school_districts | table | bmaddox\r\n public | us_indian_alaska_hawaii_native_areas | table | bmaddox\r\n public | us_indian_tribal_subdivisions | table | bmaddox\r\n public | us_linear_water | table | bmaddox\r\n public | us_metro_micropolitan_statistical_areas | table | bmaddox\r\n public | us_military_areas | table | bmaddox\r\n public | us_new_england_city_town_areas | table | bmaddox\r\n public | us_new_england_city_town_divisions | table | bmaddox\r\n public | us_primary_roads | table | bmaddox\r\n public | us_primary_secondary_roads | table | bmaddox\r\n public | us_rails | table | bmaddox\r\n public | us_roads | table | bmaddox\r\n public | us_secondard_school_districts | table | bmaddox\r\n public | us_state_legislative_lower | table | bmaddox\r\n public | us_state_legislative_upper | table | bmaddox\r\n public | us_tribal_block_groups | table | bmaddox\r\n public | us_unified_school_districts | table | bmaddox\r\n public | us_urban_areas_2010 | table | bmaddox\r\n public | us_zip_code_areas | table | bmaddox\r\n(30 rows)\r\n\r\nCensus_2013=#<\/pre>\n<p>To get further details on a table, including the indices, run the following:<\/p>\n<pre>Census_2013=# <strong>\\d us_zip_code_areas<\/strong>\r\n Table \"public.us_zip_code_areas\"\r\n Column | Type | Modifiers \r\n\r\n------------+-----------------------------+---------------------------------------------------------\r\n--------\r\n gid | integer | not null default nextval('us_zip_code_areas_gid_seq'::re\r\ngclass)\r\n zcta5ce10 | character varying(5) | \r\n geoid10 | character varying(5) | \r\n classfp10 | character varying(2) | \r\n mtfcc10 | character varying(5) | \r\n funcstat10 | character varying(1) | \r\n aland10 | double precision | \r\n awater10 | double precision | \r\n intptlat10 | character varying(11) | \r\n intptlon10 | character varying(12) | \r\n geom | geometry(MultiPolygon,4269) | \r\nIndexes:\r\n \"us_zip_code_areas_pkey\" PRIMARY KEY, btree (gid)\r\n \"us_zip_code_areas_geom_gist\" gist (geom)\r\n\r\nCensus_2013=#<\/pre>\n<p>The geospatial index here is the <em>us_zip_code_areas_geom_gist\u00a0<\/em> while the standard primary key index is <em>us_zip_code_areas_pkey<\/em>.<\/p>\n<p>When you create a new index, or perform a large number of inserts, updates, or deletes, you generally need to update the database&#8217;s statistics on that table as well as clear out any unused space. \u00a0Pick one of your tables and run the following command,\u00a0noting that depending on the size of your database (especially with US_Roads, for example), this command can take a while to complete:<\/p>\n<pre>Census_2013=# <strong>vacuum analyze us_zip_code_areas;<\/strong>\r\nVACUUM\r\nCensus_2013=#<\/pre>\n<p>This command tells PostgreSQL to clean up any unused space and update statistics on the <em>us_zip_code_areas<\/em> table. \u00a0These statistics are used internally by the database when it runs user queries on the data.<\/p>\n<p>The next thing you should do is cluster the data. \u00a0Clustering is an operation that is performed against a database index and it physically places data that is close to each other in the index in the same area on the hard drive. \u00a0For geospatial data, this can make a huge difference, as the database does not have to scan multiple areas on the hard drive to return data that your GIS may request. \u00a0To cluster data on the geospatial index, run the following command next, again noting that this could take a while to finish depending on the size of your database and your hardware:<\/p>\n<pre>Census_2013=# <strong>cluster us_zip_code_areas using us_zip_code_areas_geom_gist ;<\/strong>\r\nCLUSTER\r\nCensus_2013=#<\/pre>\n<p>Once finished, run the table describe command (<em>\\d<\/em>) again:<\/p>\n<pre>Census_2013=# <strong>\\d us_zip_code_areas<\/strong>\r\n Table \"public.us_zip_code_areas\"\r\n Column | Type | Modifiers \r\n\r\n------------+-----------------------------+---------------------------------------------------------\r\n--------\r\n gid | integer | not null default nextval('us_zip_code_areas_gid_seq'::re\r\ngclass)\r\n zcta5ce10 | character varying(5) | \r\n geoid10 | character varying(5) | \r\n classfp10 | character varying(2) | \r\n mtfcc10 | character varying(5) | \r\n funcstat10 | character varying(1) | \r\n aland10 | double precision | \r\n awater10 | double precision | \r\n intptlat10 | character varying(11) | \r\n intptlon10 | character varying(12) | \r\n geom | geometry(MultiPolygon,4269) | \r\nIndexes:\r\n \"us_zip_code_areas_pkey\" PRIMARY KEY, btree (gid)\r\n \"us_zip_code_areas_geom_gist\" gist (geom) CLUSTER\r\n\r\nCensus_2013=#<\/pre>\n<p>As you can see, the geom index now has the world CLUSTER behind it, denoting that this is the index that you clustered on. \u00a0It will generally make more sense for geospatial databases to cluster on the geospatial index as that will ensure data that is near each other in the real world is near each other on disk.<\/p>\n<p>However, now that you have moved data around on disk, you again need to update the database statistics. \u00a0You will not need to <em>vacuum<\/em> it again, as the <em>cluster<\/em> command does this itself as it rearranges the data.<\/p>\n<pre>Census_2013=# <strong>analyze us_zip_code_areas ;<\/strong>\r\nANALYZE\r\nCensus_2013=#<\/pre>\n<p>So there you are. \u00a0Running these commands on your data tables, in addition to tuning your database itself, can make a huge performance difference over simply accepting the default options.<\/p>\n<p>Next time we will go over backing up your database and then head on to loading data again, such as the <a title=\"US Geological Survey\" href=\"http:\/\/www.usgs.gov\/\" target=\"_blank\">USGS<\/a> <a title=\"GNIS Information\" href=\"http:\/\/nhd.usgs.gov\/gnis.html\" target=\"_blank\">Geographic Names Information System (GNIS)<\/a>, the <a title=\"NGA Homepage\" href=\"https:\/\/www.nga.mil\/Pages\/default.aspx\" target=\"_blank\">National Geospatial-Intelligence Agency&#8217;s<\/a> <a title=\"Geonames\" href=\"http:\/\/www.geonames.org\/\" target=\"_blank\">Geonames<\/a>, and <a title=\"OpenStreetMap\" href=\"http:\/\/www.openstreetmap.org\/\" target=\"_blank\">OpenStreetMap<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that you have a lot of data into your geospatial database, we should take a little bit of time to discuss how to manage it and keep it running smoothly. \u00a0This can make the difference between requests timing out &hellip; <a href=\"https:\/\/brian.digitalmaddox.com\/blog\/?p=221\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-221","post","type-post","status-publish","format-standard","hentry","category-gis"],"_links":{"self":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/221","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=221"}],"version-history":[{"count":4,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/221\/revisions"}],"predecessor-version":[{"id":225,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/221\/revisions\/225"}],"wp:attachment":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}