Technical

How to download the Australian BioNet Database

Did you know that there is a nest of endangered long-nosed bandicoots living just beside the popular Manly beach in Sydney, Australia? Well, I didn’t, until I looked at BioNet. The NSW government created BioNet as a database of all flora and fauna species sightings in NSW. It’s absolutely fantastic. If you’re an architect and want to see how you might impact the urban ecosystem in NSW, look at BioNet. If you’re an ecologist of some kind, you probably already use it. If you’re just a good citizen who wants to remodel your backyard to improve urban ecology, BioNet is there for you.

BioNet comes with an online search system called Atlas. It’s simple to use, but it has limits on the data it produces: it won’t show you all the fields associated with a species, it won’t show meta fields, and it caps the number of records it returns. Thankfully, BioNet also comes with an API which can be queried programmatically. I’ve written a bit of Python which will allow you to download regions of data; but before we get to that, let’s see a graphic!

Sydney BioNet species map

I’ve plotted every species in the database close to Sydney in the map above. Marker size is relative to the number of species sighted (on a logarithmic scale). I haven’t done any real filtering beyond this, so it’s not very meaningful, but it shows the data and shows that it can be geolocated. It also looks like someone murdered the country, but I’ll post the more interesting visualisations in a future post.
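
I won’t go through the plotting code in this post, but for the curious, a map along these lines can be sketched in a few lines of pandas and matplotlib once you have the bionet.csv file produced at the end of this post. This is only an illustrative sketch, not the exact code behind the image above; the grid cell size and the log scaling are assumptions.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('bionet.csv')

# Count distinct species per rough grid cell (a ~0.01 degree cell is an assumption)
cells = df.groupby(
    [df['decimalLatitude'].round(2), df['decimalLongitude'].round(2)]
)['scientificName'].nunique().reset_index(name='species')

# Marker size grows logarithmically with the number of species in each cell
plt.scatter(cells['decimalLongitude'], cells['decimalLatitude'],
            s=10 * np.log1p(cells['species']), c='black', alpha=0.5)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.savefig('species_map.png', dpi=150)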

The Python code works in two parts. The first part queries the API for JSON results, divided into square tiles covering a region defined by a top-left and a bottom-right latitude/longitude coordinate. This’ll give you a bunch of *.json files in the current working directory. Edit the coordinates and resolution as necessary, and off you go. I’ve put in a series of fields that should be good for general use, but you can check the BioNet Data API documentation for all available fields.

import os

# Region to download: top-left and bottom-right corners as (latitude, longitude)
start = (-33.408554, 150.326152)
end = (-34.207799, 151.408916)

lat = start[0]
lon = start[1]

# Build the OData query for one tile: $select lists the fields to return,
# $filter restricts records to the tile's bounding box
def create_url(lat, lon, lat_next, lon_next):
    return 'https://data.bionet.nsw.gov.au/biosvcapp/odata/SpeciesSightings_CoreData?$select=kingdom,catalogNumber,basisOfRecord,dcterms_bibliographicCitation,dataGeneralizations,informationWithheld,dcterms_modified,dcterms_available,dcterms_rightsHolder,IBRASubregion,scientificName,vernacularName,countryConservation,stateConservation,protectedInNSW,sensitivityClass,eventDate,individualCount,observationType,status,coordinateUncertaintyInMeters,decimalLatitude,decimalLongitude,geodeticDatum&$filter=((decimalLongitude ge ' + str(lon) + ') and (decimalLongitude le ' + str(lon_next) + ')) and ((decimalLatitude le ' + str(lat) + ') and (decimalLatitude ge ' + str(lat_next) + '))'

i = 0
resolution = 0.05  # tile size in degrees

# Sweep the region tile by tile (west to east, then north to south),
# saving each tile's response as <i>.json in the working directory
while lat > end[0]:
    while lon < end[1]:
        lat_next = round(lat - resolution, 6)
        lon_next = round(lon + resolution, 6)
        # Percent-encode spaces and quotes so the URL survives the shell command below
        url = create_url(lat, lon, lat_next, lon_next).replace(' ', '%20').replace('\'', '%27')
        # Fetch the tile with curl (headers and cookie copied from a browser session)
        os.system('curl \'' + url + "\' -H 'Host: data.bionet.nsw.gov.au' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Cookie: NSC_EBUB_CJPOFU_443_mcwjq=ffffffff8efb154f45525d5f4f58455e445a4a423660' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Cache-Control: max-age=0' > " + str(i) + '.json')
        i += 1

        lon = round(lon + resolution, 6)
    # Next row of tiles: reset longitude and step latitude south
    lon = start[1]
    lat = round(lat - resolution, 6)
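
Each tile that comes back is a plain OData response: a JSON object whose value key holds the array of sighting records, with one flat dictionary per record keyed by the fields in the $select list above. A quick way to peek at a tile before converting everything (0.json is simply the first tile written by the loop):

import json

# Inspect the first downloaded tile to confirm its shape
with open('0.json') as fh:
    tile = json.load(fh)

print(len(tile['value']), 'records in this tile')
if tile['value']:
    print(json.dumps(tile['value'][0], indent=2))  # first record, pretty-printed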

Now we’ll run another small script to convert all the JSON files in the directory into a single CSV file. You can then read this CSV file in programs like Excel or QGIS for further analysis.

# unicodecsv handles the unicode species names when writing under Python 2
import unicodecsv as csv
import json

f = csv.writer(open('bionet.csv', 'wb+'), encoding='utf-8')
number_of_json_files = 352  # set this to the number of tiles the download script produced

# Write the header row: one column per field requested from the API
f.writerow([
    'IBRASubregion',
    'basisOfRecord',
    'catalogNumber',
    'coordinateUncertaintyInMeters',
    'countryConservation',
    'dataGeneralizations',
    'dcterms_available',
    'dcterms_bibliographicCitation',
    'dcterms_modified',
    'dcterms_rightsHolder',
    'decimalLatitude',
    'decimalLongitude',
    'eventDate',
    'geodeticDatum',
    'individualCount',
    'informationWithheld',
    'observationType',
    'protectedInNSW',
    'scientificName',
    'sensitivityClass',
    'stateConservation',
    'status',
    'kingdom',
    'vernacularName',
    ])
# Read each tile in turn and append its records to the CSV
i = 0
while i < number_of_json_files:
    data = json.load(open(str(i) + '.json'))
    print(i)
    # Records live in the OData response's "value" array
    for x in data['value']:
        f.writerow([
            x['IBRASubregion'],
            x['basisOfRecord'],
            x['catalogNumber'],
            x['coordinateUncertaintyInMeters'],
            x['countryConservation'],
            x['dataGeneralizations'],
            x['dcterms_available'],
            x['dcterms_bibliographicCitation'],
            x['dcterms_modified'],
            x['dcterms_rightsHolder'],
            x['decimalLatitude'],
            x['decimalLongitude'],
            x['eventDate'],
            x['geodeticDatum'],
            x['individualCount'],
            x['informationWithheld'],
            x['observationType'],
            x['protectedInNSW'],
            x['scientificName'],
            x['sensitivityClass'],
            x['stateConservation'],
            x['status'],
            x['kingdom'],
            x['vernacularName'],
            ])
    i += 1
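
If you’d like a quick sanity check before moving over to Excel or QGIS, something like pandas will do the job. This is purely optional and not part of the scripts above:

import pandas as pd

df = pd.read_csv('bionet.csv')
print(len(df), 'records,', df['scientificName'].nunique(), 'distinct species')

# Break the records down by state conservation status (labels are whatever BioNet returns)
print(df['stateConservation'].value_counts(dropna=False))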

That’s it! Have fun and don’t forget to check for frogs in your backyard. If you don’t have any, build a pond. Or at least a water bath for the birds.

Technical

Building REST APIs with auto-discoverable auto-tested code

For the past few months, one of the projects I’ve been working on with SevenStrokes involves building a REST API for a service. REST APIs are tricky to get right: they’re deceptively simple to describe, yet they play host to plenty of interesting topics to delve into, such as statelessness, resource scope, authentication and hypermedia representation.

However, I’m only going to talk about the very basics (which many people overlook), and demonstrate how the Richardson Maturity Model (RMM) can help with automated testing and documentation. If you haven’t heard of the RMM yet, I recommend you stop reading and go through it now (especially if you’ve built a REST-like API before).

Let’s say our REST API conforms to level 3 of the RMM: we use a set of standardised verbs, query logical resources, receive standardised status codes, and can navigate the entire system via links. We’ve got a pretty good setup so far. All these items in the RMM help our REST API scale better. However, what it doesn’t yet help with is keeping our documentation up to date. This is vital, because we know that the holy grail for a REST API is auto-generated, always up-to-date, stylish documentation that promotes your site/product API. There are a bunch of tools that help you do this right now, but I think they’re all rather half-baked and treated as a bolt-on rather than a core part of your application.

To solve this, I’m going to recommend one more addition: every resource must have the OPTIONS verb implemented. When invoked, it will respond with the following:

  1. An Allow header, specifying all the other verbs available on the invoked resource.
  2. A response body containing those verbs, and under each verb (in whatever format) a description of:
    • Its input parameters, including their type and whether they are required
    • A list of example requests and responses, detailing which headers, parameters and body are included in the request, and which headers, status code and body are included in the response
  3. A list of assumptions being made for each example scenario (if applicable)
  4. A list of effects on the system for each example scenario (if applicable)
  5. A list of links to any subresources, with descriptions

Let’s see a brief example:

# OPTIONS /user/

{
    "GET": {
        "title": "Get information about your user",
        "parameters": {
            "foobar": {
                "title": "A description of what foobar does",
                "type": "string",
                "required": false
            },
            [ ... snip ... ]
        },
        "examples": [
            {
                "title": "View profile information successfully",
                "request": { "headers": { "Authentication": "{usersignature}" } },
                "response": {
                    "status": 200,
                    "data": {
                        "id": "1",
                        "username": "username1",
                        [ ... snip ... ]
                    }
                }
            },
            [ ... snip ... ]
        ]
    },
    [ ... snip ... ]
    "_links": {
        "self": {
            "href": "\/makkoto-api\/user"
        },
        [ ... snip ... ]
    }
}

Sound familiar? That’s right: it’s documentation. Better than that, it’s embedded documentation. Oh, and better still, it’s auto-discoverable documentation. And if that isn’t great enough, it’s documentation in exactly the same format as the requests and responses that API clients will be working with.

Sure, it’s pretty nifty. But that’s not all! Let’s combine this with TDD/BDD. I’ve written a quick test here:

Feature: Discover
    In order to learn how the REST API works
    As an automated, standards-based REST API client
    I can auto-discover and auto-generate tests for the API

    Scenario: Generate all tests
        Given that I have purged all previously generated tests
        Then I can generate all API tests

That’s right. This test crawls the entire REST API resource tree (starting at the top-level resource, of course), invokes OPTIONS for each resource, and generates tests based on the documentation that you’ve written.
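
The actual implementation is a set of Behat contexts, but the idea behind the crawler fits in a few lines of Python-flavoured pseudocode. Everything here is a placeholder assumption (the host, the root resource and the emit_test helper); it simply illustrates the walk-the-links, OPTIONS-everything approach:

import requests

HOST = 'https://api.example.com'   # placeholder host
ROOT = '/makkoto-api/'             # placeholder top-level resource

def crawl(path, seen=None):
    # Invoke OPTIONS on a resource, turn its examples into tests,
    # then follow its HAL _links to discover sub-resources.
    seen = seen if seen is not None else set()
    if path in seen:
        return
    seen.add(path)

    doc = requests.options(HOST + path).json()

    for verb, spec in doc.items():
        if verb == '_links':
            continue
        # Each documented example doubles as a test specification
        for example in spec.get('examples', []):
            emit_test(path, verb, example)

    for rel, link in doc.get('_links', {}).items():
        if rel != 'self':
            crawl(link['href'], seen)

def emit_test(path, verb, example):
    # Placeholder: the real setup renders a Behat scenario from the
    # example's request/response pair and writes it to disk.
    print(verb, path, '-', example['title'])

crawl(ROOT)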

Let’s see a quick demo in action.

Auto-documentation for REST APIs in action

It’s a really great workflow: write documentation first, generate tests from it, and then zone in on your tests in detail. This ensures that your code, tests and documentation are always in sync.

I hope someone finds this useful :) For the curious, the testing tool is Behat, and the output format used is application/hal+json, following the HAL specification for linking, together with URI templates for links.

Life & much, much more

Free public ADOM server available!

As a few people know, I’ve recently gotten myself a VPS. I’m not much of a gamer, but I do enjoy playing certain roguelikes, such as ADOM. Many people are familiar with servers for various MUDs and games such as NetHack, and ADOM isn’t much different. Unfortunately the previous ADOM server seems to have gone MIA, and so I decided to start my own.

So here it is, after a week or so of testing and adding new features. It runs on a few shell scripts, so it was a good opportunity to learn some Bash along the way. The features are quite bountiful, and it’s been great to play co-op with others, as most roguelikes are traditionally single-player. It’s been a great learning experience, and I’m sure others who like playing ADOM would love it too. Suggestions are welcome!

In other news: soon I’ll be able to proudly wear a KOffice t-shirt; we’ve potentially got a new contributor to WIPUP; WIPUP will soon get a lovely REST API and, following that, its first CLI app; I’ll be photospamming this blog soon; and I’m now also a global moderator on the KDE forums. To finish off, I wanted to share this picture of WIPUP in use:

As you can see, it’s great to watch a project develop and critique it along the way. I hope more people will benefit from WIPUP.