Rail Data Marketplace — what’s it good for?

The setting sun illuminates the transfer deck at Reading station on 4 December 2020. Reading is included in Network Rail's daily station footfall datafeed, available on the Rail Data Marketplace. Paul Kelly

The Rail Data Marketplace is officially still in beta release but is now starting to gain a little traction, and there is already some useful data on there that is not available anywhere else. I thought it was worth taking a look in detail, since there has not really been much publicity aimed specifically at consumers of the data.

Origins of negative perceptions of the RDM

An honest discussion of the RDM requires placing it in the context of open data availability within the rail industry. The years from 2013 to 2017 were exciting times. More and more open data was being released both by Network Rail and the Rail Delivery Group, culminating (inter alia) in the launch in 2017 of RDG’s excellent National Rail Data Portal (NRDP). In these years the focus was very strongly on data being made available for zero cost – this was particularly relevant for data sources which had previously had extremely expensive licensing fees attached, such as fares and routeing guide data.

But after 2017 it somehow felt like a lot of steam had gone out of the open data movement, leaving users of the existing open data feeds unsure what the direction of travel was. With expectations building around the Williams Review from 2019 and the government taking total control of all franchised train operators in 2020, it was reasonable to think that the situation might change.

The RDM was launched by the Department for Transport in 2021. In my opinion it suffered from the government’s tendency to prefer announcing things over actually doing them, since it was hugely hyped long before it was created. This led to the unfortunate situation where negative rumours could not be effectively countered with hard information, because none existed yet!

In particular there were rumours that the RDM would enable charging for data sources that were currently free, which were not rebuffed. It felt like a real change of direction from the previous big push for releasing more and more open data.

From what I can see it now seems clear that there is very little chance that existing data sources which are freely available are going to be charged for, although as far as I’m aware there has been no official denial that it will ever happen, and this is maybe a problem.

However, the fact is that many parties are ready and willing to pay good money for access to aggregated and cleaned rail data, and they do so every day. I can certainly understand the wish to get a slice of this action, although whether RDG should have full visibility of what is being charged for third-party augmented data sources (implicit due to RDM’s billing mechanism) is a really interesting question in itself, which I’m not going to address now.

Interesting New Data

And there are certainly some interesting new free data sources in the RDM! As far as I can see, it seems to be the people behind the RDM itself who are putting pressure on and getting these sources opened up. Here are a few that caught my eye.

IDMS Reference Data

Data Product header for the fare route restrictions file (one component of the IDMS reference data)

This provides “friendly” lower-case names for stations, fare locations and route restrictions, ticket types etc. It was inexplicably absent for many years from the open data feeds despite almost all other fares-related data being available – the RDM seems to have provided the impetus for this being finally opened up and that is a really good thing!

Some of the detailed human-readable descriptions of fare route restrictions included in the IDMS data feed

Annual station-to-station journey counts

Data Product header for the journey count data for financial year 2022-23

This is based on LENNON sales data with various adjustments. It is also something that had been requested for many years to no avail, until the RDM became involved; it appears to have the “clout” to get things made available.

A small extract from the annual journey count data for financial year 2018-19: from Reading to various nearby stations

Daily Station Footfall Data

Data Product header for the daily station footfall data

This covers the 13 or so stations operated by Network Rail. It has separate “in” and “out” counts, so I assume it is based on ticket gate data, and it is really interesting to be able to see up-to-date daily variations here.

“In” and “Out” footfall numbers for Reading and King’s Cross stations, for some dates in early 2024
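For anyone wanting to look at those daily variations programmatically, here is a minimal Python sketch. I am assuming the data has been saved as CSV with columns named date, station, in and out; those column names are my own guess rather than the feed’s documented schema, and the sample rows are made-up placeholder numbers, not real footfall figures.

```python
import csv
import io

# Made-up placeholder rows; the column names are assumptions, not the real schema.
SAMPLE = """date,station,in,out
2024-01-08,Reading,100,90
2024-01-09,Reading,120,115
2024-01-08,King's Cross,200,210
"""


def daily_totals(csv_text):
    """Return {(station, date): in + out} for each row of the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["station"], row["date"])
        totals[key] = int(row["in"]) + int(row["out"])
    return totals


def day_on_day_change(totals, station):
    """Return [(date, change versus the previous listed date)] for one station."""
    days = sorted((d, t) for (s, d), t in totals.items() if s == station)
    return [(d2, t2 - t1) for (_, t1), (d2, t2) in zip(days, days[1:])]


totals = daily_totals(SAMPLE)
reading_changes = day_on_day_change(totals, "Reading")
```

ISO-format dates sort correctly as plain strings, which is why the sketch does not bother parsing them.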

In general, it’s fantastic to see new data sources like this appearing and the RDM is to be commended in getting these data sources opened up.

Some Pain Points

Unfortunately, a couple of aspects of the RDM seem unnecessarily harder to use than they should be:

Sign-up process

One obvious example is the sign-up process. It is actually quite straightforward and clear, and for most users it runs smoothly and without problems. But perception is important:

  1. The sign-up page is very detailed and appears to be asking for a lot of unnecessary details (e.g. phone number, full name and registered or billing address, type of organisation plus separate display names both for your organisation and personal account). These details of course make sense in the context of a formal business registration that will result in contracting for paid data feeds. But it could be a bit intimidating to a casual hobbyist or student who just wants to download some free data feeds or preview data schemas and examples for the paid feeds.
  2. There are documented cases of people having difficulty registering, e.g. a student who had his application rejected.

Together these facts create (in my opinion) a perception that the RDM is a bit restricted in who is welcome to access it, and this is unfortunate.

Downloading file-based data feeds

Another pain point is that, although the system is fairly well-optimised for API-based data sources (which work quite well, including ours from BR Fares – more on that later) the majority of currently available data consists of flat files which must be individually downloaded. And this is quite a tedious process which involves logging in and manually navigating through your subscriptions to get to the “Data Files” section before you can download the file.

In fact it is incredibly sub-optimal if you have a number of files that are potentially updated daily (such as the approx. 13 IDMS files) and must be manually downloaded for ingest into another system. I have asked if there were any plans to make it possible to automatically download flat file feeds (e.g. using key or token authentication) but unfortunately was told that there are none. In my opinion this is something that really needs to be fixed if the RDM is going to gain widespread acceptance as a data source.

BR Fares on the Rail Data Marketplace

I mentioned that our APIs are available – since December 2023, the premium BR Fares APIs (Easy Fares API, Season Ticket Price API, and Rovers & Rangers API) are available through the RDM. It provides quite a nice interface for API-based feeds and it’s possible to view all parameters for each endpoint and try out API calls directly in your browser.

You can also use key-based authentication to call the APIs from an external system, which makes it really useful – a feature which is unfortunately sorely lacking from the file-based data sources, as mentioned above! In my opinion RDM works quite well for API data sources; it seems to be a fairly well-designed and (more-or-less) production-grade system. The biggest limitation I have encountered so far is that only a maximum of 3 endpoints per API are supported, although RDM have told me they are considering increasing this limit. In my opinion a well-designed API should only have a few endpoints anyway, otherwise it gets confusing to use, although 3 is a rather low limit.
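To give a rough idea of what a key-authenticated call from an external system looks like, here is a minimal Python sketch using only the standard library. Everything specific in it is a placeholder: the URL, the header name x-apikey and the orig/dest parameter names are assumptions for illustration, so check the endpoint details shown in your own RDM subscription before using it. The sketch only builds the request; the commented-out line shows how it could be sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: replace with those shown for your own subscription.
BASE_URL = "https://example.invalid/easy-fares/v1/fares"
API_KEY_HEADER = "x-apikey"  # assumed header name, not confirmed


def build_request(api_key, params):
    """Build (but do not send) a GET request carrying the API key in a header."""
    url = BASE_URL + "?" + urlencode(params)
    headers = {API_KEY_HEADER: api_key, "Accept": "application/json"}
    return Request(url, headers=headers)


# Parameter names here are illustrative only.
req = build_request("my-secret-key", {"orig": "RDG", "dest": "PAD"})
# To send it: urllib.request.urlopen(req, timeout=10)
```

Keeping the key in a header rather than in the query string means it is less likely to end up in server logs or browser history.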

To view full details of endpoints and parameters and specs and to try it out, you may subscribe to the “Demo Version” of any API. This is limited to 100 API calls within a 30-day period. If you need more than that then the next step up is the “Low Volume” tier (listed in RDM as a separate product) where the minimum charge of £50 per API enables 1,000 API hits for each of the Easy Fares and Season Ticket Price APIs, or 1,500 API hits for the Rovers & Rangers API, over a 30-day billing period.
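To make the tier figures a little more concrete, here is a small Python sketch that works out the effective price per API hit at the “Low Volume” minimum charge, and checks whether a given usage level still fits within the “Demo Version” allowance. It uses only the numbers quoted above; what happens beyond the included hits is not covered here, so the sketch does not try to model it.

```python
# Figures quoted above for the "Low Volume" tier, per 30-day billing period.
LOW_VOLUME_MIN_CHARGE_GBP = 50
LOW_VOLUME_HITS = {
    "Easy Fares": 1000,
    "Season Ticket Price": 1000,
    "Rovers & Rangers": 1500,
}
DEMO_CALL_LIMIT = 100  # "Demo Version" calls per 30-day period


def effective_price_per_hit(api_name):
    """Minimum charge divided by the hits it enables, in pounds."""
    return LOW_VOLUME_MIN_CHARGE_GBP / LOW_VOLUME_HITS[api_name]


def fits_demo(calls_per_30_days):
    """True if the expected usage stays within the demo allowance."""
    return calls_per_30_days <= DEMO_CALL_LIMIT


for name in LOW_VOLUME_HITS:
    print(f"{name}: £{effective_price_per_hit(name):.3f} per hit")
```

So the minimum charge works out at 5p per hit for the Easy Fares and Season Ticket Price APIs, and a little over 3p per hit for Rovers & Rangers.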

Data Product header for the high volume tier of our Easy Fares API

A word on pricing – the prices are based on what we’ve negotiated with customers previously and what people seem happy to pay when using fares data to meet a business need. RDM forces a “one size fits all” pricing model, which in my opinion is not really a bad thing, as negotiating custom pricing models based on an individual customer’s needs and profit potential can be one of the most stressful parts of business-to-business sales!

However due to some limitations in the flexibility of the pricing model in RDM – which is actually very good, but in my opinion not quite flexible enough for the sort of pricing models companies want in real life – we have 3 separate versions of each API with low, medium, and high volume pricing models, in order to be able to offer a fair price across all levels of usage. This is really beyond the scope of this blog, in which I wanted to focus more on the point of view of the data consumer rather than publisher, but I will go into more detail on this in a later post.

Conclusions

The Rail Data Marketplace has been launched very prominently and now has some interesting data available on it. The heavy promotion by the DfT and RDG is a real selling point, since it gives the subject of railway data visibility beyond the dedicated enthusiasts who have worked out where the data is stored and how to retrieve it over many years.

However, in my opinion it could have been launched better, if it had been made clear that

  • all industry data will stay free and RDM will be a catalyst to open up more data
  • RDM will also be a platform for better visibility of value-added data from third parties

This is effectively what exists now, but it is not promoted as such and it is a real pity!

Another big plus point is that the RDM has full-time staff behind it who are focussed on promoting the availability and use of data. This works particularly well when an organisation has data it wants to make available and the RDM can present itself as a “default” central repository for railway data.

However there is plenty of data that exists already but is difficult to obtain or not available. The real test of the RDM will be if it is able to exert the necessary pressure to get this opened up through becoming the default platform for data exchange in the rail industry. I wish it well and look forward to the results!

PK

Author: Paul Kelly

Technical Lead at BR Fares. Interested in railway fares, timetable and service issues for over a decade.
