TENtec

TENtec is the European Commission’s information system for coordinating and supporting the Trans-European Transport Network (TEN-T) policy. The dataset contains linked data converted from shapefiles (shp). It represents the Trans-European Transport Network and contains interactive multifunctional maps (including thematic layers, base maps, a search engine and print-outs).

The query below shows a part of the dataset, namely the Baltic States covering Estonia, Latvia and Lithuania. The rest of the European Union is also available in the dataset, and can be queried as well.

Matching data from GISCO to the ERA knowledge graph

The goal is to map the linked data from the GISCO dataset to the ERA knowledge graph. Four methods are available, listed here in order of decreasing accuracy.

  • Administrative linking, using an administrative code or number that can link two entities from different datasets. This is the strongest link.
  • String matching, using strings, such as labels, to match. This can be effective, as it leaves less room for small differences than the two methods below, but it is sensitive to typos and to different spellings of names.
  • Geo matching, using geocoordinates to match. This method can work, but the datasets use different coordinate systems and differ in accuracy, so it can produce mismatched objects.
  • Circumstantial evidence. This method can be effective, but it is usually time-consuming and hard to maintain. It is also sensitive to mistakes, and there is no default way of linking based on circumstantial evidence.

At the moment, the GISCO dataset has an administrative link for some countries. Thus, we can use all the matching strategies.
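
String matching is the only method in the list above that is not worked out further in this data story, so here is a minimal sketch of how it could look. It is an illustration, not the method used for this dataset: the function names, the cutoff of 0.85 and the example labels are assumptions. Labels are normalized first (lowercased, accents stripped) to reduce the sensitivity to spelling differences, and then compared with Python's standard difflib module.

```python
import difflib
import unicodedata


def normalize_label(label):
    """Lowercase a label, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def best_label_match(label, candidates, cutoff=0.85):
    """Return the candidate most similar to label, or None below the cutoff.

    The cutoff is an assumed value; it would need tuning on real labels.
    """
    # Map each normalized candidate back to its original spelling.
    normalized = {normalize_label(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalize_label(label), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None
```

A lower cutoff finds more matches but also links more unrelated names, which is why this method ranks below administrative linking.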

Administrative linking

First up is administrative matching. We match the railways from the GISCO dataset with several types of objects in the ERA knowledge graph. The railways in the GISCO dataset carry identifiers, and for three countries these identifiers are stored in the ERA knowledge graph as well: Finland, the Netherlands and Austria have a gisTENId that can be matched to an identifier in the GISCO dataset. The only challenge is establishing the correct correspondence, because the lines in the GISCO dataset are somewhat longer than the tracks and sections of line that are mostly used to denote routes.

In total, 260 different identifiers are found in the ERA knowledge graph. Of these 260 identifiers, 256 (about 98.5%) can be linked to identifiers in the GISCO dataset. This is a very high match rate: almost all the tracks with an identifier in the ERA knowledge graph can be connected to their counterparts in the GISCO dataset. The identifiers are only used on a per-country basis, and not every country uses them. This is also visible in the visualization of the tracks that have found a counterpart in the ERA knowledge graph, where only Austria, Finland and the Netherlands appear on the map, as these are the countries with identifiers.
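
The administrative link described above amounts to a join on a shared identifier. The sketch below shows that idea in Python under stated assumptions: the input shapes (pairs of ERA URI and gisTENId, and a dictionary from GISCO identifier to GISCO URI) and the function names are hypothetical, and the real data would come from queries on both datasets. Because a GISCO line is usually longer than an ERA track, several ERA records may link to the same GISCO line.

```python
def link_by_identifier(era_tracks, gisco_lines):
    """Join ERA records to GISCO lines on a shared identifier.

    era_tracks:  iterable of (era_uri, gis_ten_id) pairs (assumed shape)
    gisco_lines: dict mapping identifier -> gisco_uri    (assumed shape)
    Returns the list of (era_uri, gisco_uri) links and the set of
    identifiers that have no counterpart in the GISCO data.
    """
    links = []
    unmatched = set()
    for era_uri, gis_ten_id in era_tracks:
        gisco_uri = gisco_lines.get(gis_ten_id)
        if gisco_uri is None:
            unmatched.add(gis_ten_id)
        else:
            # Many ERA tracks may point at the same, longer GISCO line.
            links.append((era_uri, gisco_uri))
    return links, unmatched


def match_rate(matched, total):
    """Share of identifiers that found a counterpart."""
    return matched / total if total else 0.0
```

With the figures from this section, `match_rate(256, 260)` is roughly 0.985.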

Geospatial linking

Next up is geospatial matching. We match the railways from the GISCO dataset with the only geospatial component in the ERA knowledge graph, the operational point. We can map the operational points that are located close to a line from the GISCO dataset. If they are close, we assume there is a relation between the line in the GISCO dataset and the operational point in the ERA dataset. In the example below, we show geospatial mapping for Estonia.

Here we use a slightly different matching strategy, as we are not directly matching two objects of the same type from different datasets. Instead, we take operational points from the ERA knowledge graph and check whether they are located on a track from the GISCO dataset. This gives a somewhat skewed representation of the data, as there are about 50,000 operational points in the ERA knowledge graph. In total, we can map around 40,000 of these points on or next to a track in the GISCO dataset. Note that some points that are close to a track are linked to that track even though they are not on it. These links should not be added to the data, but it will be tricky to keep them out of the data linking step.
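
The check of whether an operational point lies on or near a track can be sketched as a point-to-polyline distance test. This is an illustration only, not the query used for this data story: the function names and the 25 metre tolerance are assumptions, coordinates are assumed to be (longitude, latitude) in degrees, and a simple local flat-earth projection stands in for a proper geospatial library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def _to_local_xy(lon, lat, lon0, lat0):
    """Project lon/lat (degrees) to metres on a plane centred at (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def distance_to_line_m(point, line):
    """Shortest distance in metres from a (lon, lat) point to a polyline."""
    lon0, lat0 = point
    # After projection the point itself sits at the origin (0, 0).
    xy = [_to_local_xy(lon, lat, lon0, lat0) for lon, lat in line]
    best = math.inf
    for (x1, y1), (x2, y2) in zip(xy, xy[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = 0.0
        else:
            # Position of the closest point on the segment, clamped to [0, 1].
            t = max(0.0, min(1.0, -(x1 * dx + y1 * dy) / seg_len_sq))
        best = min(best, math.hypot(x1 + t * dx, y1 + t * dy))
    return best


def is_on_track(point, line, tolerance_m=25.0):
    """True if the point lies within tolerance_m of the line (assumed tolerance)."""
    return distance_to_line_m(point, line) <= tolerance_m
```

The tolerance is where the problem noted above shows up: a generous value also links points that are merely next to a track, while a strict one drops points whose coordinates are slightly off.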

The query below visualizes the operational points of Estonia that lie on a railway track from the GISCO dataset.

Circumstantial evidence linking

Next up is circumstantial evidence matching. We match the railways from the GISCO dataset with the objects from the ERA knowledge graph. Most of the time, you do not want to map the operational points themselves to a line; it is the track between the operational points that needs to be mapped to the GISCO line. To achieve that, we need some circumstantial evidence. The GISCO dataset has a string that denotes the route of the line. Most often the notation has the form {begin} <--> {end}, where {begin} is the starting point of the line and {end} is its endpoint. We can use this notation to retrieve the operational points that belong to the start and the end of the line. With that information, we can find the tracks that could match, or are part of, the line in the GISCO dataset.
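
Splitting the route string into its two endpoints and looking them up among the operational points can be sketched as follows. The function names and the lookup table keyed by lowercased name are assumptions for illustration; only the {begin} <--> {end} form comes from the dataset description above.

```python
import re

# Matches strings of the form "{begin} <--> {end}".
ROUTE_PATTERN = re.compile(r"^\s*(?P<begin>.+?)\s*<-->\s*(?P<end>.+?)\s*$")


def parse_route(description):
    """Split '{begin} <--> {end}' into (begin, end); None if the form differs."""
    match = ROUTE_PATTERN.match(description)
    return (match["begin"], match["end"]) if match else None


def endpoints_for_route(description, op_by_name):
    """Look up both endpoints among known operational points.

    op_by_name: dict mapping a casefolded name to an operational point
    URI (assumed shape). Returns None unless both endpoints are found.
    """
    parsed = parse_route(description)
    if parsed is None:
        return None
    begin, end = (op_by_name.get(name.casefold()) for name in parsed)
    return (begin, end) if begin and end else None
```

Only lines for which both endpoints resolve to an operational point are candidates for the next step, finding the tracks between them.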

Of the roughly 3,500 tracks in the GISCO dataset, 399 can be identified as having both a start operational point and an end operational point in the ERA knowledge graph. Not only that, but these operational points are also on the same line. In total there are 50,000 operational points and 250,000 possible connections between two operational points that are on the same line.

The visualization below shows all the lines whose beginning point and endpoint are both operational points in the ERA knowledge graph. The green point is the operational point that matches the start of the track in the GISCO dataset. The pink line connects it to the red point, which is the endpoint of the track and also an operational point in the ERA dataset.

For extra information about the datastory or the dataset you can watch this video: