{"id":286,"date":"2014-06-23T18:43:31","date_gmt":"2014-06-23T22:43:31","guid":{"rendered":"http:\/\/brian.digitalmaddox.com\/blog\/?p=286"},"modified":"2014-07-08T16:45:18","modified_gmt":"2014-07-08T20:45:18","slug":"using-free-geospatial-tools-and-data-part-12-openstreetmap","status":"publish","type":"post","link":"https:\/\/brian.digitalmaddox.com\/blog\/?p=286","title":{"rendered":"Using Free Geospatial Tools and Data Part 12: OpenStreetMap"},"content":{"rendered":"<p>For this installment, we will look at importing data from <a title=\"OpenStreetMap.org\" href=\"http:\/\/www.openstreetmap.org\/\" target=\"_blank\">OpenStreetMap.org<\/a>. \u00a0As I mentioned in an earlier post, OpenStreetMap is a cloud-sourced GIS dataset with the goal of producing a global dataset that anyone can use. \u00a0There are two ways to download this data: you can either use Bittorrent and download the entire planet from\u00a0<a title=\"http:\/\/osm-torrent.torres.voyager.hr\/\" href=\"http:\/\/osm-torrent.torres.voyager.hr\/\" target=\"_blank\">http:\/\/osm-torrent.torres.voyager.hr\/<\/a> or download extracts from\u00a0<a title=\"http:\/\/download.geofabrik.de\/\" href=\"http:\/\/download.geofabrik.de\/\" target=\"_blank\">http:\/\/download.geofabrik.de\/<\/a>. \u00a0If you do not need the entire planet, I would highly recommend using geofabrik. \u00a0It has a fast downlink and they have finally added MD5 checksums so you can verify the integrity of your download.<\/p>\n<p>Go to\u00a0http:\/\/download.geofabrik.de\/ and click on North America. \u00a0We will be using the .pbf format file so click the link near the top of the page named\u00a0<em>north-america-latest.osm.pbf<\/em>. \u00a0It is about six gigabytes in size and the MD5sum is listed at the end of the paragraph. \u00a0Once the download is done in your browser, you can use the md5sum command under a Linux shell or download one of the many MD5sum clients for windows. 
\u00a0The md5sum output will look similar to the example below (it likely will not match exactly, as the MD5 value will change as the data is modified).<\/p>\n<pre>bmaddox@girls:~\/Downloads\/geodata$ <strong>md5sum north-america-latest.osm.pbf<\/strong> \r\nd2daa9c7d3ef4dead4a2b5f790523e6d north-america-latest.osm.pbf\r\nbmaddox@girls:~\/Downloads\/geodata$<\/pre>\n<p>Next, go back to the main Geofabrik site and download the Central America file. \u00a0This will give you Mexico and the other Central American countries. \u00a0As before, once the download is done in your browser, check it with md5sum. \u00a0If the values do not match, redownload the file and rerun md5sum until they do.<\/p>\n<p>There are several programs you can use to import OpenStreetMap data into PostGIS. \u00a0They mainly differ in what schema they use and how they manipulate the data before it goes in. \u00a0For the purposes of this post, we will be using the imposm program found at\u00a0<a title=\"http:\/\/imposm.org\/docs\/imposm\/latest\/\" href=\"http:\/\/imposm.org\/docs\/imposm\/latest\/\" target=\"_blank\">http:\/\/imposm.org\/docs\/imposm\/latest\/<\/a>. \u00a0If you are on Ubuntu, it should be a simple apt-get install imposm away. \u00a0For Windows or other distributions, you can download it directly from the imposm website. \u00a0The tutorial on how to import data using imposm can be found here:\u00a0<a title=\"http:\/\/imposm.org\/docs\/imposm\/latest\/tutorial.html\" href=\"http:\/\/imposm.org\/docs\/imposm\/latest\/tutorial.html\" target=\"_blank\">http:\/\/imposm.org\/docs\/imposm\/latest\/tutorial.html<\/a>.<\/p>\n<p>Using imposm is a multi-stage process. \u00a0The first stage is to have it read the data and combine the files into several intermediary files. \u00a0First, create a PostGIS database by running:<\/p>\n<pre><strong>createdb -T gistemplate OSM<\/strong><\/pre>\n<p>Now have imposm take the data and convert it into its intermediary files. 
\u00a0To do this, run a command similar to the following:<\/p>\n<pre>bmaddox@girls:\/data\/data\/geo$ <strong>imposm --read --concurrency 2 --proj EPSG:4326 ~\/Downloads\/geodata\/*.pbf<\/strong>\r\n[16:29:15] ## reading \/home\/bmaddox\/Downloads\/geodata\/central-america-latest.osm.pbf\r\n[16:29:15] coords: 500489k nodes: 10009k ways: 71498k relations: 500k (estimated)\r\n[16:31:27] coords: 21524k nodes: 92k ways: 2464k relations: 5k\r\n[16:31:28] ## reading \/home\/bmaddox\/Downloads\/geodata\/north-america-latest.osm.pbf\r\n[16:31:28] coords: 500489k nodes: 10009k ways: 71498k relations: 500k (estimated)\r\n[17:40:22] coords: 678992k nodes: 1347k ways: 44469k relations: 229k\r\n[17:40:23] reading took 1 h 11m 7 s\r\n[17:40:23] imposm took 1 h 11m 7 s\r\nbmaddox@girls:\/data\/data\/geo$<\/pre>\n<p>Here, I changed to a different drive and ran the imposm command to read from the drive where I downloaded the .pbf files (note the wild card, which picks up every .pbf file you downloaded). \u00a0I did this since reading is a disk-intensive process, and splitting it between drives helps to speed things up a bit. \u00a0Also, I differed from the tutorial, as my install of QGIS could not render OpenStreetMap data in its native EPSG:900913 projection alongside the EPSG:4326 Tiger data from earlier in this series, so I imported the data in EPSG:4326 instead. \u00a0Unless you have an extremely high-end workstation, this will take a while. \u00a0Once the process is done, you will have the following files in the output directory:<\/p>\n<pre>bmaddox@girls:~\/Downloads\/geodata\/foo$ <strong>dir<\/strong>\r\nimposm_coords.cache imposm_nodes.cache imposm_relations.cache imposm_ways.cache<\/pre>\n<p>The next step is to take the intermediary files and write them into PostGIS. 
\u00a0This stage reads the intermediary files generated above, so there is no need to specify the .pbf files again.<\/p>\n<pre>bmaddox@girls:~\/Downloads\/geodata\/foo$ <strong>imposm --write --database OSM --host localhost --user bmaddox --port 5432 --proj EPSG:4326\r\n<\/strong>password for bmaddox at localhost:\r\n[18:20:21] ## dropping\/creating tables\r\n[18:20:22] ## writing data\r\n[2014-06-15 18:52:46,074] imposm.multipolygon - WARNING - building relation 1834172 with 8971 ways (10854.8ms) and 8843 rings (2293.0ms) took 426854.5ms\r\n[2014-06-15 19:00:47,635] imposm.multipolygon - WARNING - building relation 2566179 with 4026 ways (4717.3ms) and 3828 rings (1115.6ms) took 89522.6ms\r\n[19:15:20] relations: 244k\/244k\r\n[19:15:41] relations: total time 55m 18s for 244095 (73\/s)\r\n[00:35:28] ways: 46907k\/46907k\r\n[00:35:30] ways: total time 5 h 19m 49s for 46907462 (2444\/s)\r\n[00:40:21] nodes: 1437k\/1437k\r\n[00:40:22] nodes: total time 4 m 51s for 1437951 (4933\/s)\r\n[00:40:22] ## creating generalized tables\r\n[01:44:47] generalizing tables took 1 h 4 m 24s\r\n[01:44:47] ## creating union views\r\n[01:44:48] creating views took 0 s\r\n[01:44:48] ## creating geometry indexes\r\n[02:15:02] creating indexes took 30m 14s\r\n[02:15:02] writing took 7 h 54m 41s\r\n[02:15:02] imposm took 7 h 54m 42s\r\nbmaddox@girls:~\/Downloads\/geodata\/foo$<\/pre>\n<p>As you can see from the above output, this took almost eight hours on my home server (a quad-core AMD with eight gigabytes of RAM). \u00a0This command loads all of the data from the intermediate files into PostGIS. \u00a0However, we are not done yet. \u00a0Looking at the output, all it did was load the data and create indices. \u00a0It did not cluster the data or perform any other optimizations. 
\u00a0To do this, run the following imposm command:<\/p>\n<pre>bmaddox@girls:~\/Downloads\/geodata\/foo$ <strong>imposm --optimize -d OSM --user bmaddox<\/strong>\r\npassword for bmaddox at localhost:\r\n[17:18:12] ## optimizing tables\r\nClustering table osm_new_transport_areas\r\nClustering table osm_new_mainroads\r\nClustering table osm_new_buildings\r\nClustering table osm_new_mainroads_gen1\r\nClustering table osm_new_mainroads_gen0\r\nClustering table osm_new_amenities\r\nClustering table osm_new_waterareas_gen1\r\nClustering table osm_new_waterareas_gen0\r\nClustering table osm_new_motorways_gen0\r\nClustering table osm_new_aeroways\r\nClustering table osm_new_motorways\r\nClustering table osm_new_transport_points\r\nClustering table osm_new_railways_gen0\r\nClustering table osm_new_railways_gen1\r\nClustering table osm_new_landusages\r\nClustering table osm_new_waterways\r\nClustering table osm_new_railways\r\nClustering table osm_new_motorways_gen1\r\nClustering table osm_new_waterareas\r\nClustering table osm_new_places\r\nClustering table osm_new_admin\r\nClustering table osm_new_minorroads\r\nClustering table osm_new_landusages_gen1\r\nClustering table osm_new_landusages_gen0\r\nVacuum analyze\r\n[19:24:38] optimizing took 2 h 6 m 25s\r\n[19:24:38] imposm took 2 h 6 m 26s\r\nbmaddox@girls:~\/Downloads\/geodata\/foo$<\/pre>\n<p>On my system it took a couple of hours and clustered all of the tables and then did a vacuum analyze to update the database statistics.<\/p>\n<p>The final step is to have imposm rename the tables to what they will be in &#8220;production mode&#8221;. \u00a0Run the following:<\/p>\n<pre>bmaddox@girls:~\/Downloads\/geodata\/foo$ <strong>imposm -d OSM --user bmaddox --deploy-production-tables<\/strong>\r\npassword for bmaddox at localhost:\r\n[11:00:06] imposm took 1 s\r\nbmaddox@girls:~\/Downloads\/geodata\/foo$<\/pre>\n<p>Your data should now be optimized and ready for use. 
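<\/p>\n<p>As a quick sanity check from the command line, you can count rows in one of the new tables. \u00a0This is just a sketch; the count you get will differ, since the OpenStreetMap data changes over time:<\/p>\n<pre>bmaddox@girls:~$ <strong>psql -d OSM -c \"SELECT COUNT(*) FROM osm_places;\"<\/strong><\/pre>\n<p>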
\u00a0To test it, refer to an earlier post in this series where I discussed using QGIS and load some of the OSM data into it.<\/p>\n<p>Your OSM database will have the following tables in it:<\/p>\n<pre> List of relations\r\n Schema | Name | Type | Owner \r\n--------+----------------------+-------+---------\r\n public | osm_admin | table | bmaddox\r\n public | osm_aeroways | table | bmaddox\r\n public | osm_amenities | table | bmaddox\r\n public | osm_buildings | table | bmaddox\r\n public | osm_landusages | table | bmaddox\r\n public | osm_landusages_gen0 | table | bmaddox\r\n public | osm_landusages_gen1 | table | bmaddox\r\n public | osm_mainroads | table | bmaddox\r\n public | osm_mainroads_gen0 | table | bmaddox\r\n public | osm_mainroads_gen1 | table | bmaddox\r\n public | osm_minorroads | table | bmaddox\r\n public | osm_motorways | table | bmaddox\r\n public | osm_motorways_gen0 | table | bmaddox\r\n public | osm_motorways_gen1 | table | bmaddox\r\n public | osm_places | table | bmaddox\r\n public | osm_railways | table | bmaddox\r\n public | osm_railways_gen0 | table | bmaddox\r\n public | osm_railways_gen1 | table | bmaddox\r\n public | osm_transport_areas | table | bmaddox\r\n public | osm_transport_points | table | bmaddox\r\n public | osm_waterareas | table | bmaddox\r\n public | osm_waterareas_gen0 | table | bmaddox\r\n public | osm_waterareas_gen1 | table | bmaddox\r\n public | osm_waterways | table | bmaddox\r\n public | spatial_ref_sys | table | bmaddox\r\n(25 rows)<\/pre>\n<p>The _gen0 and _gen1 tables are generalized and not as highly detailed as the other tables. \u00a0They are good for viewing data over large geographic areas (think nation scale). \u00a0With areas that large, it would take a lot of time to render the high resolution data. \u00a0Thus the _gen0 and _gen1 tables are simplified versions of the data for use at these resolutions. 
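<\/p>\n<p>You can see how much the generalization reduces the data by comparing vertex counts between a full-resolution table and its _gen0 counterpart. \u00a0This is a sketch that assumes imposm&#8217;s default geometry column name of <em>geometry<\/em>; adjust it if your schema differs:<\/p>\n<pre>bmaddox@girls:~$ <strong>psql -d OSM -c \"SELECT SUM(ST_NPoints(geometry)) FROM osm_waterareas;\"<\/strong>\r\nbmaddox@girls:~$ <strong>psql -d OSM -c \"SELECT SUM(ST_NPoints(geometry)) FROM osm_waterareas_gen0;\"<\/strong><\/pre>\n<p>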
\u00a0You can use QGIS&#8217;s scale-dependent rendering to display these generalized tables when zoomed out and switch to the high-resolution tables as you zoom in.<\/p>\n<p>Go forth and play with the additional free geospatial data you now have in your database \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For this installment, we will look at importing data from OpenStreetMap.org. \u00a0As I mentioned in an earlier post, OpenStreetMap is a crowd-sourced GIS project with the goal of producing a global dataset that anyone can use. \u00a0There are two ways &hellip; <a href=\"https:\/\/brian.digitalmaddox.com\/blog\/?p=286\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-286","post","type-post","status-publish","format-standard","hentry","category-gis"],"_links":{"self":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/286","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=286"}],"version-history":[{"count":2,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/286\/revisions"}],"predecessor-version":[{"id":288,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/286\/revisions\/288"}],"wp:attachment":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=
286"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=286"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}