Hack-a-LOD-2022

1. Intro

Accessible data is important for education and for learning about history. Google Translate can translate a web page into different languages, but the source data stays in the language of origin. To disseminate knowledge and make linked data more accessible, we decided to translate the instance data itself. The translations are stored under the same URIs as the original, so the same datasets can be linked, interlinked and explored in different languages. As a result, end users can explore the linked data in their native language while keeping all the benefits of linked data.

2. Datasets

As our dataset, we chose the collection of the Nationaal Museum van Wereldculturen, which contains objects collected all over the world. As the collection is quite large, we narrowed our selection to objects that have dct:spatial "Jakarta" (located in Jakarta, the capital of Indonesia).

At the same time, we looked for a dataset with more information about Indonesia (history, geography, etc.) to enrich the collection with external sources, so that users can read not only about an item from the collection but also about other related objects. We chose a dataset from Het Geheugen, an institution that collects photographic material from various Dutch museums. The pkl01 dataset that we translated contains photographs of pioneers in Nederland-Indië (the Dutch East Indies).

3. Translation

To make sure we only translate literals that contain natural language, we first use a curated collection of predicates to select only the human-readable strings. These strings can then be translated from Dutch to Indonesian, English, or any other language the translation models support.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX schema: <http://schema.org/>
PREFIX sdo: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>

SELECT ?sub ?pred ?ob ?g WHERE {
  {
    VALUES ?pred {
      rdfs:label
      skos:prefLabel
      skos:inScheme
      sdo:name
      schema:name
      dc:description
      dc:title
      dc:subject
      sdo:about
      schema:about
      dct:created
      dct:extent
      dc:isPartOf
      skos:note
      dc:contributor
      edm:type
      dct:terms
      skos:altLabel
      edm:provider
      edm:isRelatedTo
      dc:type
    }
  }
  GRAPH ?g {
    ?sub ?pred ?ob .
  }
}

For the purpose of this demo, we restricted the results to objects that have a spatial relation with Jakarta (?sub dct:spatial "Jakarta").
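
The restriction can be sketched in Python as a small query builder. This is only an illustration, not the script used in the project: the helper name build_query, the shortened predicate list, and the FILTER on literals are assumptions, as is the idea that the dct:spatial triple lives in the same named graph as the literal being selected.

```python
# Hypothetical sketch: build the selection query with a dct:spatial restriction.
# PREFIXES and PREDICATES are shortened copies of the lists in the query above.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
PREDICATES = ["rdfs:label", "skos:prefLabel", "dc:title", "dc:description"]


def build_query(place, predicates=PREDICATES):
    """Return a SPARQL query selecting literals of subjects located in `place`."""
    prefix_block = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()
    )
    values = " ".join(predicates)
    # Escape backslashes and quotes so the place name stays a valid string literal.
    safe_place = place.replace("\\", "\\\\").replace('"', '\\"')
    return f"""{prefix_block}

SELECT ?sub ?pred ?ob ?g WHERE {{
  VALUES ?pred {{ {values} }}
  GRAPH ?g {{
    ?sub dct:spatial "{safe_place}" ;
         ?pred ?ob .
    FILTER(isLiteral(?ob))
  }}
}}"""


if __name__ == "__main__":
    print(build_query("Jakarta"))
```

Building the query from a parameter keeps the demo restriction in one place, so another place name can be substituted without editing the query text by hand.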

For translation, we used a Python script built on the pre-trained OPUS translation models from the Language Technology Research Group at the University of Helsinki (Helsinki NLP). With these models, we translated entire datasets from Dutch to English and Indonesian, for more approachable and fair access to linked data. The script uses SPARQL queries to retrieve the relevant objects, those with Dutch-language literals, from the dataset. Only two input arguments are needed to generate the translated dataset: an endpoint of the Dutch dataset and the desired target language. This makes it possible to explore Dutch datasets without language barriers.
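
The translation step might look roughly like the sketch below. It is not the project's actual script: the function names are hypothetical, the input is assumed to be the "bindings" list of a SPARQL JSON result for the query above, and the output is assumed to be N-Quads lines. The translator is passed in as a plain function, so the core logic runs without any model. The optional opus_translator helper assumes the Hugging Face transformers package and a Helsinki-NLP model such as Helsinki-NLP/opus-mt-nl-en.

```python
from typing import Callable, Iterable, Iterator

Translate = Callable[[str], str]


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Quads literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def translate_bindings(
    bindings: Iterable[dict], translate: Translate, target_lang: str
) -> Iterator[str]:
    """Yield one N-Quads line per translated literal, keeping subject URI and predicate."""
    for row in bindings:
        sub, pred, ob = row["sub"], row["pred"], row["ob"]
        # Skip IRIs, blank nodes and typed literals (dates, numbers): only
        # plain or language-tagged strings are assumed to be natural language.
        if sub.get("type") != "uri" or ob.get("type") != "literal" or "datatype" in ob:
            continue
        text = escape_literal(translate(ob["value"]))
        graph = f" <{row['g']['value']}>" if "g" in row else ""
        yield f"<{sub['value']}> <{pred['value']}> \"{text}\"@{target_lang}{graph} ."


def opus_translator(model_name: str = "Helsinki-NLP/opus-mt-nl-en") -> Translate:
    """Optional: wrap a pre-trained OPUS model (assumes `transformers` is installed)."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    pipe = pipeline("translation", model=model_name)
    return lambda text: pipe(text)[0]["translation_text"]
```

Because each output line reuses the original subject URI and predicate and only changes the literal and its language tag, the translated statements can be loaded next to the Dutch originals. For example, list(translate_bindings(rows, opus_translator(), "en")) would produce English-tagged literals for the rows returned by the endpoint.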

4. Items

Results of the translations: a picture with descriptions translated into Indonesian, shown together with the original Dutch descriptions.

Example of title translations from Dutch to Indonesian