1.0 Introduction
This paper provides a brief introduction to Geography Markup Language
(GML). The paper is the first in a series of papers to get you acquainted
with this exciting way to represent and manipulate geographic information.
Following articles on this site will introduce you to a variety of GML
topics including GML map making, GML data transformations, spatial queries
and geographic analysis, GML-based spatial databases, and a variety of
GML applications including applications to mobile computing systems. We
expect GML to revolutionize the treatment of spatial information.
GML is web friendly. For the first time spatial information will
have a truly public encoding standard.
2.0 What is GML ?
2.1 Status
GML, or Geography Markup Language, is an XML based encoding standard for
geographic information developed by the OpenGIS Consortium (OGC).
Its current status is that of an RFC under review within the OpenGIS Consortium.
The RFC is supported by a variety of vendors including Oracle Corporation,
Galdos Systems Inc, MapInfo, CubeWerx and Compusult Ltd. GML was implemented
and tested through a series of demonstrations which formed part of the
OpenGIS Consortium's Web Mapping Test Bed (WMT) conducted in September
1999. These tests involved GML mapping clients interacting with GML
data servers and service providers.
2.2 Geography, Graphics and Maps
Before we look at GML itself, it
is important that we draw some clear distinctions between geographic data
(which is encoded in GML) and graphic interpretations of that data as might
appear on a map or other form of visualization. Geographic data is
concerned with a representation of the world in spatial terms that is independent
of any particular visualization of that data. When we talk about geographic
data we are trying to capture information about the properties and geometry
of the objects which populate the world about us. How we symbolize
these on a map, and what colors or line weights we use, is something quite different.
Just as XML is now helping the Web to clearly separate content from presentation,
GML will do the same in the world of geography.
GML is concerned with the representation
of the geographic data content. Of course we can also use GML to
make maps. This might be accomplished by developing a rendering tool
to interpret GML data. However, this would go against the GML approach
to standardization, and to the separation of content and presentation.
To make a map from GML we need only to style the GML elements into a form
which can be interpreted for graphical display in a web browser.
Potential graphical display formats include W3C Scalable Vector Graphics
(SVG), the Microsoft Vector Markup Language (VML), and X3D. A
map styler is thus used to locate GML elements and interpret them using
particular graphical styles. The next article in this series will
deal with generating a map from GML using SVG and X3D.
2.3 GML is Text
Like any XML encoding, GML represents
geographic information in the form of text. While a short while ago
this might have been considered verboten in the world of spatial information
systems, the idea is now gaining a lot of momentum. Text has a certain
simplicity and visibility on its side. It is easy to inspect and easy to
change. Add XML and it can also be controlled.
Text formats for geometry and geography
have been employed before. The pioneering work of the Province of
British Columbia with its SAIF format is just one such example.
In the Province of British Columbia, more than 7000 files of 1:20,000 scale
data including topography, planimetry (hydrography, buildings, roads etc.)
and toponymy are available in the SAIF format. The Province has shown
that text formats are practical and easy to use. Another example
of the use of text for complex geometric data sets is that of VRML (Virtual
Reality Modeling Language). Large and complex VRML models have been built and
navigated over the Web, all using text based encoding. Interestingly
enough, the VRML geometry and behaviour are themselves now being recast
in XML through the efforts of the X3D Working Group.
2.4 GML Encodes Feature Geometry and Properties
GML is based on the abstract model
of geography developed by the OGC. This describes the world in terms
of geographic entities called features. Essentially a feature is nothing
more than a list of properties and geometries. Each property has the
usual name, type and value. Geometries are composed of basic geometry
building blocks such as points, lines, curves, surfaces and polygons.
For simplicity, the initial GML specification is restricted to 2D geometry.
Extensions will appear shortly, however, which will handle 2.5D and 3D
geometry, as well as topological relationships between features.
GML encoding already allows for quite
complex features. A feature can for example be composed of other
features. A single feature like an airport might thus be composed
of other features such as taxiways, runways, hangars and air terminals.
The geometry of a geographic feature can also be composed of many geometry
elements. A geometrically complex feature can thus consist of a mix
of geometry types including points, line strings and polygons.
To encode the geometry of a
feature like a building we simply write:
<Feature
fid="142" featureType="school" Description="A middle school">
<Polygon name="extent" srsName="epsg:27354">
<LineString name="extent" srsName="epsg:27354">
<CData>
491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
491953.999999466,5458017.99963357 </CData>
</LineString>
</Polygon>
</Feature>
Note that this has no properties (other
than the geometry). These we can readily add and the building would
look something like:
<Feature
fid="142" featureType="school" >
<Description>Balmoral Middle School</Description>
<Property Name="NumFloors"
type="Integer" value="3"/>
<Property Name="NumStudents"
type="Integer" value="987"/>
<Polygon name="extent" srsName="epsg:27354">
<LineString name="extent" srsName="epsg:27354">
<CData>
491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
491953.999999466,5458017.99963357 </CData>
</LineString>
</Polygon>
</Feature>
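Because a feature is ordinary XML, any language with an XML parser can read it back. The following is a minimal sketch in Python using only the standard library. The function name read_feature and the table mapping the type attribute to a converter are illustrative assumptions, not part of the GML RFC; only the element and attribute names are taken from the sample above.

```python
import xml.etree.ElementTree as ET

# The school feature from the text, reproduced as a string.
GML_FEATURE = """
<Feature fid="142" featureType="school">
  <Description>Balmoral Middle School</Description>
  <Property Name="NumFloors" type="Integer" value="3"/>
  <Property Name="NumStudents" type="Integer" value="987"/>
  <Polygon name="extent" srsName="epsg:27354">
    <LineString name="extent" srsName="epsg:27354">
      <CData>
      491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
      491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
      491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
      491953.999999466,5458017.99963357 </CData>
    </LineString>
  </Polygon>
</Feature>
"""

# Assumption: only "Integer" appears in the text; the other entries are guesses.
CONVERTERS = {"Integer": int, "Real": float, "String": str}


def read_feature(xml_text):
    """Return the feature's attributes, properties and coordinates as a dict."""
    root = ET.fromstring(xml_text.strip())

    # Each Property element carries its name, type and value as attributes.
    properties = {}
    for prop in root.findall("Property"):
        convert = CONVERTERS.get(prop.get("type"), str)
        properties[prop.get("Name")] = convert(prop.get("value"))

    # Coordinates are whitespace-separated "x,y" pairs inside CData elements.
    coordinates = []
    for cdata in root.iter("CData"):
        for pair in cdata.text.split():
            x, y = pair.split(",")
            coordinates.append((float(x), float(y)))

    return {
        "fid": root.get("fid"),
        "featureType": root.get("featureType"),
        "description": root.findtext("Description"),
        "properties": properties,
        "coordinates": coordinates,
    }


feature = read_feature(GML_FEATURE)
print(feature["description"], feature["properties"], len(feature["coordinates"]))
```

Keeping the properties in a plain dictionary mirrors the "list of properties and geometries" view of a feature described above.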
2.5 GML Encodes Spatial Reference Systems
An essential component of a geographic
system is a means of referencing the geographic features to the earth's
surface or to some structure related to the earth's surface. The
current version of GML incorporates an earth based spatial reference system
which is extensible and which incorporates the main projection and geocentric
reference frames in use today. This is capable of encoding all of the reference
systems which can be found at the European Petroleum Survey Group (EPSG)
web site. In addition the encoding scheme allows for user defined units
and reference system parameters. Future versions of GML will likely
provide even more flexible encodings in order to handle local coordinate
systems such as those used for mile logging.
Why encode a spatial reference system ? Why not just provide a unique
name and be done with it ?
In many cases such an approach does suffice, and GML does not require that
the sender of geographic data also send an encoding of the reference system
to which the data's coordinate values are referenced. There are cases,
however, where such information is very valuable. These include:
- Client validation of a server specified Spatial Reference System. The
  client can request the SRS description (an XML document) and compare it
  to its own specifications or show it to a user for verification.
- Client display of a server specified Spatial Reference System.
- Use by a Coordinate Transformation Service to validate an input data
  source's Spatial Reference System.
- A Coordinate Transformation Service can compare the SRS description with
  its own specifications to see if the SRS is consistent with the selected
  transformation.
- Control of automated coordinate transformation by supplying input and
  output reference system names and argument values.
Watch the GeoJava site for future GML
services that transform GML data from one spatial reference system to another.
With the GML encoding for spatial
references, it is possible to create a web site which stores any number
of spatial reference system definitions. Stay tuned to the GeoJava
site for standard encodings of common spatial reference systems.
2.6 GML Feature Collections
The XML 1.0 Recommendation from the W3C is based on the notion of a
document. The current version of GML is based on XML 1.0, and uses
a FeatureCollection as the basis of its document. A FeatureCollection
is a collection of GML Features together with an Envelope (which bounds
the set of Features), a collection of Properties that apply to the FeatureCollection
and an optional list of Spatial Reference System Definitions. A FeatureCollection
can also contain other FeatureCollections, provided that the Envelope of
the bounding FeatureCollection bounds the Envelopes of all of the contained
FeatureCollections.
When a request is made for GML data from a GML server, data is always
returned in FeatureCollections. There is no limit in the GML RFC
on the number of features which can be contained in a FeatureCollection.
Because FeatureCollections can contain other FeatureCollections it is a
relatively simple procedure to "glue together" FeatureCollections received
from a server into still larger collections.
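The containment rule and the "glue together" step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the Envelope is modelled as a min/max corner pair, features are reduced to identifier strings, and the names glue, contains and union are hypothetical rather than taken from the GML RFC.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding box (the four-field layout is an assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, other: "Envelope") -> bool:
        # True when `other` lies entirely inside (or on the edge of) this box.
        return (self.min_x <= other.min_x and self.min_y <= other.min_y
                and self.max_x >= other.max_x and self.max_y >= other.max_y)

    def union(self, other: "Envelope") -> "Envelope":
        # Smallest box that bounds both envelopes.
        return Envelope(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                        max(self.max_x, other.max_x), max(self.max_y, other.max_y))


@dataclass
class FeatureCollection:
    envelope: Envelope
    features: List[str] = field(default_factory=list)  # ids stand in for full features
    collections: List["FeatureCollection"] = field(default_factory=list)


def glue(parts: List[FeatureCollection]) -> FeatureCollection:
    """Wrap several collections in one whose Envelope bounds all of theirs."""
    if not parts:
        raise ValueError("need at least one FeatureCollection")
    envelope = parts[0].envelope
    for part in parts[1:]:
        envelope = envelope.union(part.envelope)
    return FeatureCollection(envelope=envelope, collections=list(parts))


west = FeatureCollection(Envelope(0, 0, 10, 10), features=["f1", "f2"])
east = FeatureCollection(Envelope(10, 2, 25, 8), features=["f3"])
region = glue([west, east])
print(region.envelope)
```

Computing the outer Envelope as the union of the inner ones guarantees the bounding condition by construction, so no separate check is needed after gluing.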
2.7 GML - More than a Data Transport
While GML is an effective means for
transporting geographic information from one place to another, we expect
that it will also become an important means of storing geographic information.
The key elements here are XLink and XPointer. While these
two specifications lag in development and implementation, they
hold great promise for building complex and distributed geographic data
sets. Geographic data is, well, geographic. It is naturally
distributed over the face of the earth. Interest in data about Flin Flon,
Saskatchewan is much much higher near Flin Flon than it would be in Pasadena,
California. At the same time there are applications which need to
reach out and obtain data on a global basis for large scale analysis or
because of interest in a narrow vertical domain. Applications of
the latter sort also abound in a diverse collection of fields from environmental
protection to mining, highway construction, and disaster management.
How nice it would be if data could be developed on the local scale and
readily integrated to the regional and the global scale!
In most jurisdictions, geographic
data is collected by particular agencies for a particular purpose.
Forest bureaus collect information on the disposition of trees (tree diameters,
site conditions, growth rates) for the effective management of commercial
forests. Environmental departments collect information on the distribution
of animals and animal habitat. Development interests maintain information
on demographics and existing features in the built environment. Real
world problems, however, seldom respect the parochial boundaries of departments,
ministries and bureaus. How nice it would be if data developed for one
purpose could be readily integrated with data developed for another!
We believe that GML as a storage
format, combined with XLink and XPointer will provide some useful contributions
to these problems. Watch the GeoJava site for our article on GML
Spatial Databases.
2.8 On What Technologies Does it Depend ?
GML is based on XML. XML, while sometimes talked about as a replacement
for HTML, is best thought of as a language for data description.
More correctly, XML is a language for expressing data description languages.
XML is, however, not a programming language. There are no mechanisms in
XML to express behaviour or to perform computations. That is left for other
languages such as Java and C++.
2.8.1 XML Version 1.0
XML 1.0 provides a means of describing (marking up) data using user
defined tags. Each segment of an XML document is bounded by starting
and end tags. This looks as follows:
<Feature>
.... more XML descriptions ...
....
</Feature>
The valid tag names are determined by the Document Type Definition.
Which tags can appear enclosed within an opening and closing tag pair is
also determined by the DTD.
XML tags can also have attributes associated with them. These
are also constrained by the DTD in name and in some cases in terms of the
values that the attributes can assume.
XML is typically read by an XML parser. All XML parsers check
that the data is well formed so that data corruption (e.g. missing closing
tag) cannot pass undetected. Many XML parsers are also validating,
meaning that they check that the document conforms to the associated DTD.
Using XML it is comparatively easy to generate and validate complex
hierarchical data structures. Such structures are common in geographic
applications.
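The well-formedness check is easy to see in practice. The sketch below uses Python's standard xml.etree.ElementTree parser, which is non-validating: it rejects documents that are not well formed but does not check them against a DTD. The helper name is_well_formed is an illustrative assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<Feature><Property Name='NumFloors' value='3'/></Feature>"
truncated = "<Feature><Property Name='NumFloors' value='3'/>"   # closing tag lost
mismatched = "<Feature><Polygon></Feature></Polygon>"           # tags overlap

for sample in (good, truncated, mismatched):
    print(is_well_formed(sample))
```

Checking a document against its DTD requires a validating parser, which is outside the scope of this sketch.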
2.8.2 XSL and XSLT (Transforming the WWW)
The original focus of XML was to provide a means of describing data
separate from its presentation, especially in the context of the world
wide web. XML Version 1.0 deals with the description of data.
A companion technology, called XSL was to deal with the presentation side.
Over time it has become apparent that XSL is actually two different technologies.
One, now called XSLT (the T stands for Transformation), is focused on the
transformation of XML. The other technology is concerned with the actual
formatting of text or images and is referred to in terms of format objects
or flow objects. In our discussions we are only concerned with XSLT. Since
many tools (e.g. MS IE 5.0) were developed before the XSLT label had stuck,
XSL is still often used when only XSLT is intended. We will follow
that practice.
If you follow xml.com, you may recall a great deal of discussion about
the merits of XSL. The XSLT clarification has helped to dampen this
discussion somewhat. There is still, however, a great deal of skepticism
regarding the utility and the need for XSL in some sectors of the XML community.
We stand on the opposite side of the issue. We believe that it is
the transformational character of XML that is most important, and XSL (XSLT)
provides a clean declarative means for expressing these transformations.
In our view XSLT is as essential to GML as XML itself.
XSL is a fairly simple language. It provides a powerful syntax for expressing
pattern matching and replacement. It is declarative. You can easily
read what the XSLT says to do. You do not get to see how it is accomplished.
Using its companion specifications (XPath and XQL) you can specify some
very powerful queries on an XML document. Furthermore XSLT incorporates
the ability to call functions in another programming language such as VBScript
or Java through the use of Extension Functions. This means that XSL
can be used to do the querying and selection, and then call out to Java
or another language to perform needed computation or string manipulation.
For simple tasks, XSLT provides built in string handling and arithmetic
capabilities.
2.8.3 SVG, VML and X3D - Vector Graphics for the Web
XML has made its presence felt in many different quarters, not the
least of which is vector graphics. Several XML based specifications
for describing vector graphic elements have been developed, including Scalable
Vector Graphics (SVG), Microsoft's Vector Markup Language (VML), and X3D,
the XML incarnation of the syntax and behaviour of VRML (Virtual Reality
Modeling Language). These specifications are in many ways similar to
GML, but have a very different objective. Each has a means of describing
geometry. The graphical specifications, however, are focused on appearance
and hence include properties and elements for colors, line weights and
transparency to name but a few aspects. To view an SVG, VML or X3D
data file, it is necessary to have a suitable graphical data viewer.
In the case of VML this is built into IE 5.0 (and nowhere else).
In the case of SVG, Adobe is developing a series of plug-ins for Internet
Explorer and Netscape Communicator as well as Adobe Illustrator, while
IBM and several other companies are developing, or have already developed, SVG viewers
or supporting graphics libraries. Several all-Java SVG viewers are
available or under development.
To draw a map from GML data you need to transform the GML into one of
the graphical vector data formats such as SVG, VML or VRML. This
means to associate a graphical "style" (e.g. symbol, colour, texture) with
each type of GML feature or feature instance. We will have more to
say on this in the GeoJava article, Making Maps from GML.
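The styling step can be sketched as follows. The article's approach uses an XSLT style sheet; here plain Python stands in for the style sheet so that the example runs with the standard library alone. The style table, the sample feature and the function names are hypothetical. Note that SVG's y axis points down, so the sketch flips y while shifting coordinates to the origin.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical style table: one graphical style per GML feature type.
STYLES = {
    "school": {"fill": "orange", "stroke": "black"},
    "park": {"fill": "green", "stroke": "darkgreen"},
}
DEFAULT_STYLE = {"fill": "none", "stroke": "grey"}

# Small made-up feature in the same shape as the samples in Section 2.4.
SAMPLE = """<Feature fid="1" featureType="school">
  <Polygon name="extent">
    <LineString name="extent">
      <CData>100,200 110,200 110,205 100,205</CData>
    </LineString>
  </Polygon>
</Feature>"""


def ring(feature):
    """Collect the (x, y) pairs found in the feature's CData elements."""
    coords = []
    for cdata in feature.iter("CData"):
        for pair in cdata.text.split():
            x, y = pair.split(",")
            coords.append((float(x), float(y)))
    return coords


def feature_to_svg(feature_xml):
    """Style one GML feature as an SVG document containing a polygon."""
    feature = ET.fromstring(feature_xml)
    coords = ring(feature)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)

    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "viewBox": f"0 0 {max_x - min_x:.2f} {max_y - min_y:.2f}",
    })
    # Shift to the origin and flip y so that north is up on screen.
    points = " ".join(f"{x - min_x:.2f},{max_y - y:.2f}" for x, y in coords)
    style = STYLES.get(feature.get("featureType"), DEFAULT_STYLE)
    ET.SubElement(svg, "polygon", {"points": points, **style})
    return ET.tostring(svg, encoding="unicode")


svg_text = feature_to_svg(SAMPLE)
print(svg_text)
```

Swapping the STYLES table changes the map's appearance without touching the GML, which is the separation of content and presentation argued for in Section 2.2.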
Figure 1 illustrates the drawing of a map using an XSLT style sheet on
a suitable mapping client.
Figure 1. Making a Map with XSLT and SVG
2.8.4 XLink and XPointer - Linking one place to another
With current HTML technology it is possible to build linked geographic
data sets. One can readily build image maps which are linked to other
image maps. The HTML linking mechanism has, however, many limitations,
and as a result it is not practical to build large complex distributed
data sets as occur in real world systems. The most significant limitation
is that an HTML link is effectively hard coded in both the source (<a
href = ... >) and target (anchor) documents, a fact which would make any significant
system both fragile and impossible to scale. XLink gets around these
problems by allowing "out of line" links. In an out of line link,
the source points only to a link database and it is the link database that
provides the pointer to specific XML elements in the target document.
The link is thus not hard coded in either document. This is of great
importance in relation to GML as it makes it possible to build scalable,
distributed geographic data sets. Even more importantly, XLink and
XPointer make it possible to build application specific indexes for a data set.
Need to have a group of buildings organized by street address ?
Want to create a farm plot index based on crop type ? With XLink
and XPointer, these and many other indexing schemes can be readily constructed,
and all without altering the source data itself. We will have much
more to say about this in coming articles.
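The idea of an index that lives outside the source data can be sketched without any XLink machinery. In the Python below, the index is a separate dictionary of pointers into the source document, and the source itself is never modified. Everything here is an assumption made for illustration: the document URI, the sample data, the function name build_index, and the use of a plain "uri#fid" string as a stand-in for a real XPointer expression.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SOURCE_URI = "http://example.org/plots.xml"   # hypothetical location of the data

# Made-up farm plot data in the style of the Section 2.4 samples.
PLOTS = """<FeatureCollection>
  <Feature fid="p1" featureType="farmPlot">
    <Property Name="Crop" type="String" value="wheat"/>
  </Feature>
  <Feature fid="p2" featureType="farmPlot">
    <Property Name="Crop" type="String" value="canola"/>
  </Feature>
  <Feature fid="p3" featureType="farmPlot">
    <Property Name="Crop" type="String" value="wheat"/>
  </Feature>
</FeatureCollection>"""


def build_index(source_uri, xml_text, property_name):
    """Map each value of `property_name` to pointers at the matching features."""
    index = defaultdict(list)
    for feature in ET.fromstring(xml_text).iter("Feature"):
        for prop in feature.findall("Property"):
            if prop.get("Name") == property_name:
                pointer = f"{source_uri}#{feature.get('fid')}"
                index[prop.get("value")].append(pointer)
    return dict(index)


crop_index = build_index(SOURCE_URI, PLOTS, "Crop")
for crop, pointers in crop_index.items():
    print(crop, pointers)
```

A second index, say by street address, could be built over the same source in the same way, and either index can be discarded or rebuilt without touching the data it points to.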
3.0 Why GML ?
Why introduce GML at all ?
There are already a host of encoding standards for geographic information
including COGIF, MDIFF, SAIF, DLG, SDTS to name only a few. What
is so different about GML ? In some ways nothing. GML is a simple
text based encoding of geographic features. Some of these other formats
are not text based, but some of them (e.g. SAIF) certainly are.
GML is based on a common model of geography (OGC Abstract Specification)
which has been developed and agreed to by the vast majority of all GIS
vendors in the world. More importantly, however, GML is based on
XML. Why should this matter ? There are several reasons why
XML is important. To begin with XML provides a method to verify data integrity.
Secondly, any XML document can be read and edited using a simple text editor.
Nothing more than MS Notepad is required to view or change an XML document.
Thirdly, since there are an increasing number of XML languages, it will
become increasingly easy to integrate GML data with non spatial data.
This holds even for non-XML non-spatial data. Perhaps
most importantly, XML is easy to transform. Using XSLT or almost
any other programming language (VB, VBScript, Java, C++, Javascript) we
can readily transform XML from one form to another. A single mechanism
can thus be employed for a host of transformations from data visualization
to coordinate transforms, spatial queries, and geo-spatial generalization.
GML rests securely on a widely adopted
public standard, that of XML. This ensures that GML data can be viewed,
edited and transformed by a wide variety of commercial and freeware tools.
For the first time we can truly talk about open geographic information.
3.1 Automated Verification of Data Integrity
One of the important features of
XML is the ability to verify data integrity. In the XML 1.0 Recommendation
this is achieved through the Document Type Definition (DTD). The
DTD specifies the structure of an XML document in such a way that a validating
parser can verify that a given document instance complies with this DTD.
GML is specified by such a DTD. Future versions of GML will also
be supported by XML Schema, a more flexible integrity mechanism than the
DTD that should become a W3C Recommendation early in 2000.
Using the GML DTD, servers and clients
can readily verify that the data they are to send or receive complies with
the specification. Furthermore this can be accomplished with a variety
of parsing tools from at least half a dozen different vendors on a wide
variety of operating systems, databases, application servers and browsers.
3.2 GML can be Read by Public Tools
As we have already noted, GML is
text and one need have nothing more than a simple text editor to read it.
GML, however, is structured, and any of a variety of XML editors can be
employed to display that structure. This makes viewing and navigating GML
data very easy as shown in Figure 2.
Figure 2. Sample GML File Viewed in XML Spy
3.3 GML can be Easily Edited
Using the many XML editors described
in Section 3.2 it is also very easy to edit GML data. Want to add a new
feature property or change a property value ? Need to adjust a feature's
geometry ? These are easily accomplished with a standard XML editor.
Unlike many other text based formats, however, GML is hard to corrupt
when edited this way. The editor can be made to ensure that any
data which is created or modified complies with the DTD.
It is also not difficult to create
a graphical editor for GML and such products are expected to appear on
the market within the coming year. Again the GML DTD can be used
to ensure data integrity. Note that when one edits GML graphically
an intermediate graphic representation is required (perhaps SVG) which
is then used to define the geometry of the associated GML feature.
We will have more to say on this subject in our upcoming article on Making
Maps from GML to appear on the GeoJava site.
3.4 GML can Readily Integrate with Non-Spatial Data
Binary data structures are typically
very difficult to integrate with one another. A classic example is
that of associating a text document, or a parameter list, with a separately
developed and maintained spatial database of parcels or land tenure boundaries.
With a binary data structure one must understand the file structure or
database schema and be able to modify it. In many legacy systems
using flat files the data structure cannot be modified without breaking
the applications which rely on the existing data structure. With GML it
is comparatively easy to provide links to other XML data elements and this
will dramatically improve with the introduction of XLink and XPointer.
Even links to non-XML elements can be readily handled using the well established
URI syntax.
3.5 GML is Transformable
The most important aspect of XML
in our view is its transformability. It is quite easy to write a
transformation which carries XML data relative to one DTD to XML relative
to another. This is exactly what we do when we generate an SVG graphical
element stream from a GML data file. Such transformations can be
accomplished using a variety of mechanisms including XSLT, Java, Javascript
and C++ to name only a few. XSLT in our view is of particular interest.
With XSLT it is very easy to write a style sheet which locates and transforms
GML elements into other XML elements. Where XSLT is not up to the
task, one can readily incorporate XSLT extension functions written in Java
or VB (the exact languages supported depend on the implementation) to
perform tasks such as string manipulation or mathematical computation.
XSLT can also make use of powerful searching syntax (XPath/XQL) so as to
retrieve elements that satisfy complex boolean expressions on the elements
and their attributes. Using these techniques an XSLT style sheet
can perform a wide variety of querying, analysis and transformation functions.
Consider the following examples:
Using XSLT with suitable extension
functions we can extract spatial elements which satisfy various spatial
and attribute queries. Galdos Systems Inc will be providing just
such a set of spatial extension functions in the near future on the GeoJava
site. Using these functions it will be straightforward to write a
spatial query that extracts features of a given type which lie within a
specified region or which intersect a particular feature.
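The kind of query described here can be sketched as follows. Python stands in for the XSLT style sheet and its extension functions, which were not yet published at the time of writing. The sample collection, the rectangular region and the function names are all hypothetical. The test "every vertex lies inside the rectangle" is sufficient for "the polygon lies inside" only because a rectangle is convex.

```python
import xml.etree.ElementTree as ET

# Made-up collection: two schools and one hospital, in arbitrary units.
COLLECTION = """<FeatureCollection>
  <Feature fid="1" featureType="school">
    <Polygon name="extent"><LineString name="extent">
      <CData>10,10 20,10 20,20 10,20</CData>
    </LineString></Polygon>
  </Feature>
  <Feature fid="2" featureType="school">
    <Polygon name="extent"><LineString name="extent">
      <CData>90,90 120,90 120,120 90,120</CData>
    </LineString></Polygon>
  </Feature>
  <Feature fid="3" featureType="hospital">
    <Polygon name="extent"><LineString name="extent">
      <CData>30,30 40,30 40,40 30,40</CData>
    </LineString></Polygon>
  </Feature>
</FeatureCollection>"""


def vertices(feature):
    """Return every (x, y) pair found in the feature's geometry."""
    points = []
    for cdata in feature.iter("CData"):
        for pair in cdata.text.split():
            x, y = pair.split(",")
            points.append((float(x), float(y)))
    return points


def features_within(collection_xml, feature_type, region):
    """Ids of features of `feature_type` lying wholly inside `region`.

    `region` is (min_x, min_y, max_x, max_y), an axis-aligned rectangle.
    """
    min_x, min_y, max_x, max_y = region
    hits = []
    for feature in ET.fromstring(collection_xml).iter("Feature"):
        if feature.get("featureType") != feature_type:
            continue  # attribute part of the query
        points = vertices(feature)
        inside = all(min_x <= x <= max_x and min_y <= y <= max_y
                     for x, y in points)
        if points and inside:  # spatial part of the query
            hits.append(feature.get("fid"))
    return hits


schools_in_region = features_within(COLLECTION, "school", (0, 0, 100, 100))
print(schools_in_region)
```

An intersection query against an arbitrary feature needs real computational geometry and is beyond this sketch.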
Change the XSLT style sheet and we
can accomplish a totally different function. We can for example write
a style sheet that performs coordinate transformation as was demonstrated
in the OGC WMT IOC in Washington, September 10, 1999. This immediately
provides us with a coordinate transformation service. Locate GML
data in one part of the world in reference system X and simply pass its
URI to the service and specify the target reference system, and presto
you will have GML in the new frame of reference. Look on the GeoJava
site for an upcoming coordinate transformation service for GML data.
Change the XSLT style sheet and we
can accomplish yet another function. We can for example generate an SVG,
VML or X3D map on the server. Select different style sheets for different
viewing devices or different types of maps.
The transformability of GML also
means that we can readily construct application specific indexes or at
least we will be able to once XLink and XPointer implementations start
to move toward reality. Look for this to have a huge impact on the utility
of GML data sets.
3.6 GML can Transport Behaviour
XML is a language for describing data description languages. GML
does not itself encode behaviour. GML can, however, be used in conjunction
with languages like Java or C++ to in effect transport geographic behaviour
from one place to another. This can be done using a simple object
factory which instantiates objects based on received GML data, mapping
the GML element names into object classes. In the Java case this
would mean mapping the GML elements into Java classes as listed in the
OGC Java Simple Features RFC. This "re-hydration" of the GML data then
creates Java objects which have the OGC interfaces for Simple Features
(of course we did not transport the interfaces). GML and Java (or
COM or CORBA) Simple Features can thus get along very well with one another.
In many applications one only needs the behaviour for a small number of
the elements. With this approach one might receive 10,000 GML elements
but only need to construct a hundred or so Java objects on an as needed
basis.
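The object factory idea can be sketched in a few lines. The text describes a Java factory producing objects with the OGC Simple Features interfaces; the Python below is only an analogy. The classes LineString and Polygon, their methods, and the BUILDERS registry are hypothetical stand-ins and do not reproduce the OGC interfaces.

```python
import math
import xml.etree.ElementTree as ET


def parse_coords(text):
    """Turn 'x,y x,y ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


class LineString:
    def __init__(self, coords):
        self.coords = coords

    def length(self):
        # Sum of the straight-line distances between consecutive vertices.
        return sum(math.hypot(bx - ax, by - ay)
                   for (ax, ay), (bx, by) in zip(self.coords, self.coords[1:]))


class Polygon:
    def __init__(self, ring):
        self.ring = ring

    def area(self):
        # Shoelace formula; the ring is closed implicitly (last vertex to first).
        pts = self.ring.coords
        total = 0.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# The factory: GML element names mapped to functions that build objects.
BUILDERS = {
    "LineString": lambda el: LineString(parse_coords(el.findtext("CData"))),
    "Polygon": lambda el: Polygon(rehydrate(el.find("LineString"))),
}


def rehydrate(element):
    """Instantiate the object class registered for this GML element."""
    builder = BUILDERS.get(element.tag)
    if builder is None:
        raise ValueError(f"no class registered for element {element.tag!r}")
    return builder(element)


polygon_xml = ('<Polygon name="extent"><LineString name="extent">'
               '<CData>0,0 4,0 4,3 0,3</CData></LineString></Polygon>')
extent = rehydrate(ET.fromstring(polygon_xml))
print(extent.area(), extent.ring.length())
```

Because rehydrate is called per element, an application can parse a large document once and build objects only for the few elements whose behaviour it needs.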
4.0 What's Coming Down the Road ?
We have made it pretty clear that we think GML is pretty cool.
Once you have had the opportunity to play with it you will think it is
pretty cool as well. Over the next 6 months a series of articles and services
extending your understanding of GML, and how to apply it in real world
problems, will appear on the GeoJava website. Look for articles on
Map Making, Making maps in SVG, Geographic Transformations, GML Spatial
databases, Mobile applications and much more.
What will happen to GML itself ? We expect quite a lot.
The current version of GML is based on linear geometry and provides no notion
of topology. Over the next several months, new versions of GML will
be introduced adding topology, non-linear feature geometries, 2.5D and
3D geometry, support for OGC Coverages, XSLT spatial query extension functions,
XLink/XPointer support, and an XML Schema implementation.
5.0 Conclusion
GML is a powerful new way to look at spatial information using XML encoding.
It promises, however, much more than a mere encoding standard. The
inherent transformability and accessibility of GML will open a whole new
domain in geo-spatial information management.