It’s great when the stars align and a lot of people are talking about, and interested in, the things you like. This happened to Astun recently, when the Open Data Institute (ODI) announced a grant, funded by Innovate UK, for developing open source tools for data institutions. Data institutions are roughly defined as organisations with a remit to collect, maintain, and share data on behalf of others (you can see the ODI’s own definition here), so a partnership with open source software is a natural fit.
There has been a lot of guidance issued on data sharing for both spatial and non-spatial datasets, for creating metadata, and for reporting on data quality, but users also need documentation on how to do this as well as what they should be doing, and why. Astun saw the ODI grant as an opportunity to showcase the GeoNetwork metadata catalog as a one-stop shop for sharing spatial and non-spatial metadata, and fortunately the ODI agreed!
We started by doing some basic user research to understand current practices, and to establish any blockers around sharing spatial and non-spatial data and recording data quality. The aggregated results of the survey were:
General use of metadata
- All respondents had a requirement to share spatial data; approximately 90% also had a requirement to share non-spatial data.
- All respondents felt it was important to receive metadata when data is shared; 80% cited it as very or extremely important.
- However, when sharing data only 55% of respondents included metadata “often” or “always”, with the remainder including it only sometimes and, in 10% of cases, never.
- 75% of respondents would create metadata if it was easier.
- Blockers for creating spatial and non-spatial metadata include lack of knowledge, technical difficulty, lack of time, uncertainty and other reasons. Of these, lack of knowledge was marginally more common than the others.
- 95% are aware of the recent Geospatial Commission guidance on data sharing.
Formats for sharing data
- Spatial: Shapefile and WMS/WFS account for 50% of responses.
- Non-spatial: CSV and Excel account for 60% of responses, followed by JSON.
Non-spatial metadata
- 85% feel it’s at least somewhat important to store spatial and non-spatial metadata in the same catalog, or at least to have a single aggregated search across both.
- The elements for non-spatial metadata should basically match UK Gemini but without the “where” component.
- It’s important to additionally include metadata on the attributes of the dataset.
- There’s no real consistency in the metadata schema for non-spatial metadata.
- There’s no favoured storage implementation for non-spatial metadata.
Data quality
- Blockers for including data quality in metadata were a lack of knowledge, technical difficulty, lack of time, uncertainty and other reasons. Of these, lack of knowledge was marginally more common than the others.
- Qualitative reporting is favoured over quantitative.
- Only 43% of respondents use a recognised schema (generally ISO 19157).
How we started
Taking these results into account, we decided the best approach for minimising the blockers would be to concentrate on good guidance for creating and maintaining metadata, with specific focus on data quality and the workflow for sharing non-spatial data. We also decided to explore the existing options within GeoNetwork for making the whole workflow as easy as possible, and to look at ways in which we could enhance the capabilities of the software ourselves.
How it’s going
The project timescale was enough for us to get the building-blocks in place, but not to complete all of our intended tasks, so what follows is a work in progress…
We’ve created a metadata profile for GeoNetwork for working with non-spatial metadata and published it on GitHub. This profile strips out all the spatial elements from the core ISO 19139 metadata standard, and includes a streamlined editing view so you only see the elements you really need. We’re working on refining the output formats so that you can easily produce an RDF endpoint that aggregators such as data.gov.uk can consume, and we’re exploring our options for producing the “metadata.json” download advocated by gov.uk for sharing tabular data.
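To give a flavour of what a “metadata.json” for tabular data might contain, here is a minimal sketch in Python. It assumes the file follows the W3C CSV on the Web (CSVW) vocabulary, and it is not output from our GeoNetwork profile. The function name, the example URLs and the column definitions are all made up for illustration.

```python
import json


def build_tabular_metadata(csv_url, title, description, license_url, columns):
    """Return a CSVW-style description of a single CSV file as a dict.

    `columns` is a list of (name, human-readable title, datatype) tuples,
    which is where the attribute-level metadata lives.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "dc:title": title,
        "dc:description": description,
        "dc:license": {"@id": license_url},
        "tableSchema": {
            "columns": [
                {"name": name, "titles": label, "datatype": datatype}
                for name, label, datatype in columns
            ]
        },
    }


# Hypothetical dataset, used only to show the shape of the output.
example = build_tabular_metadata(
    csv_url="https://example.org/data/library-visits.csv",
    title="Library visits",
    description="Monthly visitor counts per library.",
    license_url="https://example.org/licenses/open",
    columns=[
        ("library", "Library name", "string"),
        ("month", "Month", "date"),
        ("visits", "Number of visits", "integer"),
    ],
)

if __name__ == "__main__":
    print(json.dumps(example, indent=2))
```

The column list is the part that addresses the survey finding about attribute-level metadata: each attribute of the dataset gets its own name, label and datatype.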
We’ve started a guidance site on data sharing using GeoNetwork on readthedocs. This is designed to augment the existing GeoNetwork manual, explaining specific configuration changes that help with data sharing. We’re also outlining a workflow for sharing non-spatial data.
We’ve started a “snippets” repository on GitHub containing pre-filled XML for common license types and data quality themes. These can be easily incorporated into GeoNetwork, so that people creating Gemini 2.3 or non-spatial records can choose from a list rather than having to create the XML by hand.
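As an illustration of the kind of XML a license snippet saves you from typing, the sketch below builds a legal constraints fragment using Python’s standard library. It assumes the usual ISO 19139 structure (gmd:resourceConstraints containing gmd:MD_LegalConstraints) and is not copied from the snippets repository, so the actual snippets may well differ. The function name and the license text are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 records.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Commonly used code list location for restriction codes (an assumption here).
CODELIST = (
    "http://standards.iso.org/iso/19139/resources/"
    "gmxCodelists.xml#MD_RestrictionCode"
)


def license_snippet(license_text):
    """Return a resourceConstraints fragment carrying free-text license terms."""
    constraints = ET.Element(f"{{{GMD}}}resourceConstraints")
    legal = ET.SubElement(constraints, f"{{{GMD}}}MD_LegalConstraints")

    # Flag that the terms are spelled out in otherConstraints.
    use = ET.SubElement(legal, f"{{{GMD}}}useConstraints")
    ET.SubElement(
        use,
        f"{{{GMD}}}MD_RestrictionCode",
        {"codeList": CODELIST, "codeListValue": "otherRestrictions"},
    )

    other = ET.SubElement(legal, f"{{{GMD}}}otherConstraints")
    text = ET.SubElement(other, f"{{{GCO}}}CharacterString")
    text.text = license_text

    return ET.tostring(constraints, encoding="unicode")


if __name__ == "__main__":
    # Placeholder wording; a real snippet would carry the exact license terms.
    print(license_snippet("Example open license: reuse permitted with attribution."))
```

Even a fragment this small has four levels of nesting and two namespaces, which is why picking it from a list is so much less error-prone than writing it out each time.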
Watch this space!
We’ll be enhancing and refining all of these outputs over the next few months, so stay in touch if you’d like to know more!