Dutch data (IMSpoor)

Introduction

Another dataset that is well suited to being transformed from relational data into graph data is ProRail's IMSpoor dataset. IMSpoor contains all the information about railway tracks in the Netherlands.

Matching IMSpoor data from the Netherlands to the ERA knowledge graph

The goal is to link the IMSpoor data for the Netherlands to the ERA knowledge graph. There are four methods to do this, listed in decreasing order of accuracy:

  • Administrative linking: using an administrative code or number that identifies the same entity in both datasets. This is the strongest link.
  • String matching: using strings, such as labels, to match entities. This is more reliable than the two methods below because it leaves less room for small deviations, but it is sensitive to typos and to different spellings of names.
  • Geo matching: using geographic coordinates to match entities. This can work, but differences in coordinate reference systems and in positional accuracy can cause objects to be mismatched.
  • Circumstantial evidence: this method can be effective, but it is usually time-consuming and hard to maintain. It is also prone to mistakes, and there is no standard way of linking based on circumstantial evidence.
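The four methods can be tried in sequence, falling back to a weaker method only when a stronger one finds nothing. The Python sketch below illustrates that cascade for the first three methods under stated assumptions: the `OperationalPoint` record, the `normalize` rules and the 250 m `max_dist_m` threshold are hypothetical choices for illustration, and coordinates are assumed to be WGS84 latitude/longitude in both datasets. Circumstantial evidence is left out because there is no default way to apply it.

```python
import math
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationalPoint:
    # Hypothetical minimal record; real IMSpoor and ERA records carry many more fields.
    uop_id: Optional[str]  # administrative code (uopId), if present
    label: str             # human-readable name
    lat: float             # latitude in WGS84 degrees (assumed)
    lon: float             # longitude in WGS84 degrees (assumed)


def normalize(label: str) -> str:
    """Hypothetical normalization: strip accents, lowercase, turn punctuation into spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii").lower()
    return " ".join("".join(c if c.isalnum() else " " for c in ascii_only).split())


def distance_m(a: OperationalPoint, b: OperationalPoint) -> float:
    """Great-circle (haversine) distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def link(source: OperationalPoint, candidates: list[OperationalPoint],
         max_dist_m: float = 250.0):
    """Try methods in decreasing order of accuracy; return (match, method) or (None, None)."""
    # 1. Administrative linking: identical uopId on both sides.
    if source.uop_id:
        for c in candidates:
            if c.uop_id == source.uop_id:
                return c, "administrative"
    # 2. String matching: identical label after normalization.
    key = normalize(source.label)
    for c in candidates:
        if normalize(c.label) == key:
            return c, "string"
    # 3. Geo matching: nearest candidate, accepted only within the (hypothetical) threshold.
    if candidates:
        nearest = min(candidates, key=lambda c: distance_m(source, c))
        if distance_m(source, nearest) <= max_dist_m:
            return nearest, "geo"
    # 4. Circumstantial evidence has no default procedure, so it is left to manual review.
    return None, None
```

For example, a source point labelled `"s Hertogenbosch"` without a uopId is linked to a candidate labelled `"'s-Hertogenbosch"` with method `"string"`, because both labels normalize to `"s hertogenbosch"`. Returning the method together with each link keeps the weaker matches easy to find and review later.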

Administrative Linking

ProRail correctly included the uopIds of operational points in its data, which makes administrative linking between the two datasets possible. In total, 324 operational points can be linked, which is a good result. The table below shows how many links were possible and how many were made.
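Because both datasets carry the same uopId, administrative linking reduces to a join on that identifier. A minimal Python sketch, assuming each dataset has already been exported to a plain `{uopId: resource IRI}` dictionary (the extraction step is not shown here), could look like this. Writing the links as `owl:sameAs` statements is an assumption for illustration; the data story does not say which linking predicate it uses.

```python
# Assumption: both sources have been reduced to {uopId: resource IRI} dictionaries.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def administrative_links(imspoor: dict[str, str],
                         era: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Pair IMSpoor and ERA resources that carry an identical uopId."""
    shared = imspoor.keys() & era.keys()  # set intersection of the two key views
    return {uid: (imspoor[uid], era[uid]) for uid in sorted(shared)}


def link_summary(imspoor: dict[str, str], era: dict[str, str]) -> dict[str, int]:
    """Counts in the spirit of the overview table: ids on each side vs. links made."""
    links = administrative_links(imspoor, era)
    return {"imspoor_uopids": len(imspoor), "era_uopids": len(era), "linked": len(links)}


def to_ntriples(links: dict[str, tuple[str, str]]) -> list[str]:
    """Serialize each link as an owl:sameAs statement in N-Triples syntax (assumed predicate)."""
    return [f"<{i}> <{OWL_SAME_AS}> <{e}> ." for i, e in links.values()]
```

Keeping the counting (`link_summary`) separate from the serialization (`to_ntriples`) makes it easy to report coverage before any triples are written to the knowledge graph.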


The table only gives an overview, but the map below shows that there are matches between the two datasets across the whole country. This is a good indicator that matching between the two datasets is possible. The next step is to consolidate the links and investigate why several operational points are not yet matched.
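A first step in checking why operational points are not matched is to sort the unmatched uopIds into likely causes. The sketch below assumes, hypothetically, that some mismatches come from formatting differences (case, spaces, separators) rather than from missing data; `normalize_id` and `diagnose_unmatched` are illustrative names, not part of either dataset's tooling.

```python
def normalize_id(uid: str) -> str:
    """Hypothetical normalization: uppercase and drop spaces, dashes and other separators."""
    return "".join(ch for ch in uid.upper() if ch.isalnum())


def diagnose_unmatched(imspoor_ids: list[str], era_ids: list[str]) -> dict[str, str]:
    """Classify IMSpoor uopIds that have no exact ERA counterpart.

    Returns {uopId: reason}: either a formatting-only near miss or no counterpart at all.
    """
    exact = set(imspoor_ids) & set(era_ids)
    # Index ERA ids by their normalized form to spot formatting-only differences.
    era_by_norm: dict[str, list[str]] = {}
    for eid in era_ids:
        era_by_norm.setdefault(normalize_id(eid), []).append(eid)
    report = {}
    for uid in sorted(set(imspoor_ids) - exact):
        near = era_by_norm.get(normalize_id(uid), [])
        if near:
            report[uid] = f"formatting differs from {near[0]}"
        else:
            report[uid] = "no counterpart in ERA data"
    return report
```

Ids reported as formatting near misses can often be fixed at the source, while the remaining ones point to operational points that may be missing from one of the two datasets.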


All results of administrative matching

Geo Linking

ProRail's data contains many geographical data points for operational points, which makes geographic linking between the datasets possible. The results below show an attempt to match the two datasets geographically. As the map shows, the match is not very good: some points are located next to each other but appear to represent different objects. Geo linking alone would therefore give suboptimal results.
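A geo-linking pass typically pairs each IMSpoor point with its nearest ERA point and flags pairs that are too far apart or too close to call. The Python sketch below does this with the haversine distance. It assumes both datasets are already expressed as WGS84 latitude/longitude; if the IMSpoor geometry is in the Dutch RD New system (EPSG:28992), it would first have to be reprojected, which is one possible source of the mismatches described above. The 200 m `max_dist_m` and the `min_ratio` ambiguity rule are hypothetical tuning choices.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def geo_candidates(imspoor, era, max_dist_m=200.0, min_ratio=2.0):
    """Pair each IMSpoor point with its nearest ERA point.

    imspoor, era: {id: (lat, lon)} in WGS84 degrees (assumed).
    Returns {imspoor_id: (era_id or None, status)} where status is
    "match", "ambiguous" (runner-up within min_ratio x best distance)
    or "no candidate" (nothing within max_dist_m).
    """
    results = {}
    for iid, (lat, lon) in imspoor.items():
        # Brute-force ranking for clarity; a spatial index would scale better.
        ranked = sorted(
            (haversine_m(lat, lon, elat, elon), eid) for eid, (elat, elon) in era.items()
        )
        if not ranked or ranked[0][0] > max_dist_m:
            results[iid] = (None, "no candidate")
            continue
        best_d, best_id = ranked[0]
        if len(ranked) > 1 and ranked[1][0] <= min_ratio * best_d:
            results[iid] = (best_id, "ambiguous")
        else:
            results[iid] = (best_id, "match")
    return results
```

Points marked `ambiguous` correspond to the situation seen on the map, where objects lie next to each other but may not be the same, so they are better confirmed with a stronger method than accepted automatically.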


For more information about this data story or the dataset, watch the following video: