Tutorial #28
Shapefiles, GeoJSON and KML Background   2013-12-02

Introduction

Custom maps on the web typically use a base map from Google Maps, Bing Maps or a similar service and then overlay graphic objects on top, like this:

Demo 1 screenshot for this tutorial

There are three main types of object:

Points indicate specific locations, such as an address.
Lines indicate linear features such as a road or a route between two points.
Polygons delimit areas, such as countries, states or postcodes.

Each of these can have additional information, such as the street address or business name, that might be displayed on the map. In addition each object can have style information that determines how it is rendered on the map.

Representing these objects and their associated data can become quite complex, and the field of GIS (Geographic Information Systems) has developed several competing data formats for this.

The three main formats that you are likely to encounter, in the context of maps displayed on the web, are Shapefiles, GeoJSON and KML.


Shapefiles

The ESRI ARC Shapefile format was developed by ESRI, a company that creates and sells a range of GIS software, including the ArcGIS suite.

Their products are widely used in the professional GIS community and the Shapefile format is the standard way to distribute many geographic datasets. Examples include datasets from the US Census and the real estate company Zillow.

Although we refer to Shapefiles, the format actually involves multiple files that hold the data required for a single map. The main file types have the suffixes .shp, .shx and .dbf, but there are several more. The .shp file, for example, contains packed variable length data records with some fixed format header fields. You need to read these files sequentially and unpack the data as you go.
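To give a feel for what "packed data with fixed format header fields" means, here is a minimal Python sketch that unpacks the 100-byte header at the start of a .shp file. The function name parse_shp_header and the synthetic demo header are my own inventions for illustration; the byte offsets follow the published ESRI Shapefile description as I understand it. Treat it as a sketch, not a full reader.

```python
import struct

def parse_shp_header(header: bytes) -> dict:
    """Unpack the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are stored as big-endian 32-bit integers.
    file_code = struct.unpack(">i", header[0:4])[0]
    file_length_words = struct.unpack(">i", header[24:28])[0]
    # Version, shape type and bounding box are stored little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    return {
        "version": version,
        "shape_type": shape_type,  # 1 = Point, 3 = PolyLine, 5 = Polygon
        "length_bytes": file_length_words * 2,  # length is in 16-bit words
        "bbox": (xmin, ymin, xmax, ymax),
    }

# Build a synthetic header in memory so the sketch runs without a real file.
demo_header = (
    struct.pack(">i", 9994) + bytes(20)  # file code, then five unused integers
    + struct.pack(">i", 50)              # file length in 16-bit words
    + struct.pack("<ii", 1000, 5)        # version, shape type (Polygon)
    + struct.pack("<4d", -122.351, 47.6205, -122.35, 47.6215)  # bounding box
    + bytes(32)                          # Z and M ranges, unused here
)
info = parse_shp_header(demo_header)
print(info)
```

With a real dataset you would read the first 100 bytes of the .shp file in binary mode and pass them to the function. The variable length records that follow the header need the same kind of byte-level unpacking, which is why a conversion tool is usually the better option.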

Formats like these are a pain to work with. You can't just open them up in an editor and make changes. In addition, the requirement for multiple files is a problem for web applications where you want to download a single file into the browser, access its data and render it in JavaScript.

The bottom line with Shapefiles is that we need to handle them as they are so common, but we need to convert them into something more friendly in order to work with them in web mapping applications.


GeoJSON

GeoJSON is a standard for representing the core geographic features that we are interested in. It doesn't contain everything that a shapefile can contain and, importantly, it does not contain any styling information.

The GeoJSON file format is JSON (JavaScript Object Notation) - a standard way to represent data that is human-readable while being easy to parse in software. It is an alternative to XML, which tends to be verbose because of all its tags.

JSON is particularly useful for JavaScript applications as it is easily parsed by the language and, as a result, it is a standard way to transmit data to a browser in response to an AJAX request. As text, a block of JSON, and therefore GeoJSON, is valid JavaScript, so you can copy and paste it directly into your code and assign it to a variable. This site will cover, and use, JSON in many of its tutorials.

Here is a sample of GeoJSON that represents a point, a set of lines and a simple polygon:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-122.3491667, 47.6202333]
      },
      "properties": {
        "name": "Seattle Space Needle"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [-122.34870, 47.6200], [-122.350, 47.6190], [-122.351, 47.6195], [-122.353, 47.6195]
        ]
      },
      "properties": {
        "name": "Set of Lines"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-122.3510, 47.6215], [-122.3510, 47.6205], [-122.3500, 47.6205], [-122.3500, 47.6215], [-122.3510, 47.6215]
          ]
        ]
      },
      "properties": {
        "name": "Polygon",
        "title": "Example of a Rectangular feature" 
      }
    }  
  ]
}

This represents the blue features shown on this map, the Point, the Lines and the Rectangle.

Image 1 for this tutorial
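Because GeoJSON is plain JSON, any language with a JSON parser can read it. As a rough illustration, here is a Python sketch that loads a cut-down copy of the sample (the Point and a shortened LineString) and lists each feature's name and geometry type. The summarize helper is a hypothetical name of my own, not part of any library.

```python
import json

geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-122.3491667, 47.6202333]},
     "properties": {"name": "Seattle Space Needle"}},
    {"type": "Feature",
     "geometry": {"type": "LineString",
                  "coordinates": [[-122.3487, 47.62], [-122.35, 47.619]]},
     "properties": {"name": "Set of Lines"}}
  ]
}
"""

def summarize(collection: dict) -> list:
    """Return a (name, geometry type) pair for each feature in the collection."""
    return [
        (feature["properties"].get("name"), feature["geometry"]["type"])
        for feature in collection["features"]
    ]

collection = json.loads(geojson_text)
summary = summarize(collection)
for name, kind in summary:
    print(f"{name}: {kind}")

# GeoJSON positions are [longitude, latitude], not [latitude, longitude].
lon, lat = collection["features"][0]["geometry"]["coordinates"]
```

The coordinate order is worth noting. GeoJSON puts longitude first, which is the reverse of the latitude, longitude order that many mapping APIs expect.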

GeoJSON was developed as an open standard and has been widely adopted. There are several related projects that build upon the format to enable more complex datasets to be represented, for example TopoJSON.

The typical file suffix for a GeoJSON file is .geojson.

 

An interesting side note here is that GitHub will attempt to render any file with the .geojson suffix as a map. The raw data is available, of course, but seeing the map can be a very useful way to preview the data that the file contains.

Here is an example of this in an embedded Gist. It shows the boundary of a Seattle neighborhood, superimposed on the base map. This is a live map that you can zoom and pan around in. You can get the raw data from the original Gist on GitHub. This feature makes GitHub a great place to store GeoJSON maps.

There are some very nice tools available for working with GeoJSON. In particular, take a look at geojsonlint, from Jason Sanford, which checks that your GeoJSON code is valid and shows how the map will appear - very useful if you generate or edit GeoJSON.

geojson.io allows you to draw features on a map and then save them as GeoJSON - incredibly useful for 'hand drawn' maps.


KML - Keyhole Markup Language

The company that originally developed Google Earth was called Keyhole before Google bought them. They needed a file format that contained both the geographic data and style directives that instruct the client how to draw the feature. So they developed Keyhole Markup Language (KML), which has become the standard way to represent custom map data in Google Maps.

The KML format is now an open standard. It uses XML as the underlying file format so it is, in principle, human-readable, albeit less so than GeoJSON, and it is machine-parseable.

Here is the same dataset as shown above, but in KML format. This example has no style information included. You can see that it is a lot more verbose.
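As a rough idea of the shape of the KML markup, here is a Python sketch that builds a KML document for just the Point feature using the standard library's ElementTree module. The point_to_kml function is a hypothetical helper written for this illustration. It uses the element names and namespace from KML 2.2 and leaves out all styling.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def point_to_kml(name: str, lon: float, lat: float) -> str:
    """Build a minimal, unstyled KML document holding one Point placemark."""
    # Register KML as the default namespace so the output has no prefixes.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as "longitude,latitude" text.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

kml_text = point_to_kml("Seattle Space Needle", -122.3491667, 47.6202333)
print(kml_text)
```

Even for a single point the output needs five nested elements, which gives a sense of how quickly KML grows compared to the equivalent GeoJSON.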

Complex datasets may result in more than one KML file and in this case they are typically stored as a single, compressed KMZ format file, which is really just a zip file.
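Since a KMZ file is just a zip archive, you can create and read one with any zip library. This Python sketch writes a small KML document into an in-memory KMZ and reads it back. The entry name doc.kml is the conventional name for the main file in a KMZ, as far as I know, and the KML content here is an empty placeholder document.

```python
import io
import zipfile

kml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document/></kml>'
)

# Write the KML into a zip archive held in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", kml_text)

# Read the archive back, exactly as you would any other zip file.
buffer.seek(0)
with zipfile.ZipFile(buffer) as kmz:
    names = kmz.namelist()
    recovered = kmz.read("doc.kml").decode("utf-8")

print(names)
```

To work with a real file you would pass a path ending in .kmz to zipfile.ZipFile instead of the in-memory buffer.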


GeoJSON or KML?

Google Maps can read and display GeoJSON datasets, but if you want to add custom styling to the objects then you need to do that yourself in JavaScript.

KML format allows you to add styling information to the dataset itself. Which format you choose really depends on the source dataset that you have to work with and how much customization you want to do.

I will add tutorials on how to load KML and GeoJSON files into a Google Map in due course.


Converting between File Formats

If you want to use third-party mapping datasets in your maps then at some stage you will need to convert from one format into another. There are various online conversion sites that will handle some conversions. For example, Ogre will convert anything to GeoJSON and GeoJSON to Shapefile.

But the best tool for general conversion appears to be GDAL ogr2ogr, a UNIX command line tool which is part of the Geospatial Data Abstraction Library (GDAL) suite of tools.

If you are on a Mac then you can install this suite using Homebrew with brew install gdal.

You run the tool by specifying the output and input files, in that order, along with the desired output format. It supports many formats and should be able to handle any conversion that you need.

This example converts from GeoJSON to KML:
$ ogr2ogr -f KML test.kml test.geojson
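If you need to run conversions from a script, one option is to call ogr2ogr from Python. The sketch below assumes ogr2ogr is installed and on your PATH. The helper names ogr2ogr_command and convert are my own, and the only option used is the -f flag shown above.

```python
import shutil
import subprocess

def ogr2ogr_command(src: str, dst: str, driver: str) -> list:
    """Build the argument list. Note that the output file precedes the input."""
    return ["ogr2ogr", "-f", driver, dst, src]

def convert(src: str, dst: str, driver: str) -> None:
    """Run the conversion, failing early if ogr2ogr is not installed."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL first")
    subprocess.run(ogr2ogr_command(src, dst, driver), check=True)

# Show the command without running it, so the sketch works without GDAL.
cmd = ogr2ogr_command("test.geojson", "test.kml", "KML")
print(" ".join(cmd))
```

Calling convert("test.geojson", "test.kml", "KML") would run the same command as the example above. Other driver names that should work in place of "KML" include "GeoJSON" and "ESRI Shapefile".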



Related Tutorials

40 : Display KML in a Google Map   (Intermediate)

41 : Display GeoJSON in a Google Map   (Intermediate)
