We have finally gotten past the preliminaries, and this series now takes a turn towards what free geospatial data is available and how you can make use of it in free tools. The rest of this series will focus heavily on putting data into a geospatial database, PostGIS in this case. I will also be posting various bash scripts that I have written to make things easier when staging data for import. Many of the datasets range from megabytes to gigabytes in size. Trying to use them as a series of files would be slow and very inefficient.
The whole reason people started putting data into geospatial databases is that they wanted to localize data and use relational query syntax to speed up fetching it. Geospatial databases can physically store data that is spatially near each other in the same locations on disk, commonly called clustering. Spatial indexes can be created that make it easier for the database to locate information. Combine this with a query that only requests a subset of the data and suddenly you can manipulate large datasets with ease.
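To make the idea of "index plus subset query" a little more concrete, here is a minimal sketch in plain Python. It is a toy grid index, not how PostGIS works internally (PostGIS builds its spatial indexes on GiST), and the names `CELL_SIZE`, `build_index`, and `query_bbox` as well as the sample coordinates are my own assumptions for illustration. The point is only that a bounding-box query has to look at a few nearby cells rather than every record.

```python
from collections import defaultdict
from math import floor

# Hypothetical cell size in degrees; chosen only for this illustration.
CELL_SIZE = 1.0


def cell_of(lon, lat):
    """Return the grid cell (column, row) that contains a coordinate."""
    return (floor(lon / CELL_SIZE), floor(lat / CELL_SIZE))


def build_index(points):
    """Group (name, lon, lat) records by the grid cell they fall in."""
    index = defaultdict(list)
    for name, lon, lat in points:
        index[cell_of(lon, lat)].append((name, lon, lat))
    return index


def query_bbox(index, min_lon, min_lat, max_lon, max_lat):
    """Return sorted names of points inside the bounding box.

    Only cells overlapping the box are visited, so the cost depends on
    the size of the box rather than the size of the whole dataset.
    """
    min_cx, min_cy = cell_of(min_lon, min_lat)
    max_cx, max_cy = cell_of(max_lon, max_lat)
    hits = []
    for cx in range(min_cx, max_cx + 1):
        for cy in range(min_cy, max_cy + 1):
            for name, lon, lat in index.get((cx, cy), []):
                # Cells on the edge may hold points outside the box.
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                    hits.append(name)
    return sorted(hits)


# Approximate sample coordinates, for illustration only.
points = [
    ("Denver", -104.99, 39.74),
    ("Boulder", -105.27, 40.01),
    ("Miami", -80.19, 25.76),
    ("Seattle", -122.33, 47.61),
]
index = build_index(points)
nearby = query_bbox(index, -106.0, 39.0, -104.0, 41.0)
print(nearby)
```

A real geospatial database does the same kind of pruning with far more sophisticated structures, and clustering then keeps the rows that an index lookup returns close together on disk.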
“Why would I want to make my own databases when I have Google?” you might ask yourself. You can only get so much out of Google Maps for free. If you make a lot of use of their servers, or if you use them for commercial purposes, you will have to pay. The map data you get back comes as pre-rendered raster tiles in their styling. Even OpenStreetMap serves up raster tiles already styled. Fetching tiles also generates a lot of traffic going back and forth to their servers, and if your bandwidth is metered, this could increase your out-of-pocket costs.
If you store the vector data yourself, you have access to the original data that was used to make up the raster tiles from Internet sources. You can do a lot with this type of information, from accurately measuring line segment distances to geolocating street addresses. More importantly, you have access to all the metadata that is in the vector data. You can get more than just a name about a point or area: you can find out who collected that point, when it was collected, and in some cases even who owns a point or area. You can style the data however you want. You can impress all your GIS geek friends. And best of all, you can use it as much as you want without having to pay anyone else.
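As a small illustration of the kind of measurement that raw vector data makes possible, the sketch below sums great-circle distances along a line's vertices using the haversine formula. This is only an approximation under my own assumptions: it treats the Earth as a sphere with a mean radius of 6371.0088 km, and the function names are hypothetical. A geospatial database such as PostGIS has its own length functions that you would normally use instead.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius in kilometers (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in degrees."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def line_length_km(vertices):
    """Sum the segment lengths of a line given as (lon, lat) vertices."""
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(vertices, vertices[1:]):
        total += haversine_km(lon1, lat1, lon2, lat2)
    return total


# One degree of longitude along the equator, split into two segments.
equator_line = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
length = line_length_km(equator_line)
print(round(length, 3))
```

You cannot do this with a pre-rendered tile, because the tile only holds pixels; the vertices themselves are what make the calculation possible.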
This post is a bit of a foreshadowing of what is to come. I will provide pointers to the various datasets that will be covered. The idea is that you can read ahead and get a feel for what kinds of information each dataset contains. You might even be amazed at how much is out there.
The first dataset we will cover is the US Census Bureau’s Topologically Integrated Geographic Encoding and Referencing (TIGER) dataset, which for users in the US is the granddaddy of them all. This dataset has been released since the 1990s and is one of the original base datasets for OpenStreetMap in the USA. TIGER currently contains a large amount of metadata for each road segment, such as left and right street address ranges, what ZIP codes they are in, and so on.
Next is the data from the OpenStreetMap project. This project was created in 2004 in the UK and has grown to include crowd-sourced data worldwide. It is maintained by volunteers and in many cases is more up to date than many traditional data sources. Volunteers can contribute GPS traces or trace data from aerial photography. In addition to roads, this dataset contains a huge number of points of interest and trails.
The Natural Earth dataset is a public domain collection that is available at scales of 1:10m, 1:50m, and 1:110m. While not as comprehensive as OpenStreetMap, this dataset contains raster and vector data along with associated metadata. It also contains shaded relief maps that are combined with color-ramped elevation data derived from satellite imagery. It is maintained by the North American Cartographic Information Society (NACIS).
Combined, these three datasets will take up quite a bit of space on your system. You should expect to need at least 30 gigabytes, and more depending on whether you import all of OpenStreetMap. We’ll cover that in later installments of this series.
In the meantime, follow the links I’ve posted, play around with tools such as QGIS, and get a feel for how things work. Hit up the project web pages to learn more about the software and Google for more background if you’re coming into the GIS world fresh.