1. Intro
Accessible data is an important means of education and of learning about history. Google Translate can translate a web page into different languages, but the source data stays in its original language. To disseminate knowledge and make linked data more accessible, we decided to translate the instance data itself: the literals become available in several languages while the URIs stay the same, so the same datasets can still be linked, interlinked, and explored in different languages. As a result, end users can explore linked data in their native language while keeping all the benefits of linked data.
2. Datasets
As our dataset, we chose the collection dataset of the Nationaal Museum van Wereldculturen. It contains objects collected from all over the world. As the collection is quite large, we narrowed our selection to objects that have dct:spatial "Jakarta" (located in Jakarta, the main city of Indonesia).
At the same time, we looked for a dataset with more information about Indonesia (history, geography, etc.) to enrich the collection with external sources, so that users can read not only about an item from the collection but also about related objects. We chose a dataset from Het Geheugen, an institution that collects photographic material from various Dutch museums. The pkl01 dataset that we translated contains photographs of pioneers in Nederlands-Indië (the Dutch East Indies).
3. Translation
To make sure we only translate literals that contain natural language, we first use a curated collection of predicates to select only the human-readable strings. These can then be translated from Dutch to Indonesian, English, or any other language the translation models support.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX schema: <http://schema.org/>
PREFIX sdo: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
SELECT ?sub ?pred ?ob ?g WHERE {
  {
    VALUES ?pred {
      rdfs:label
      skos:prefLabel
      skos:inScheme
      sdo:name
      schema:name
      dc:description
      dc:title
      dc:subject
      sdo:about
      schema:about
      dct:created
      dct:extent
      dc:isPartOf
      skos:note
      dc:contributor
      edm:type
      dct:terms
      skos:altLabel
      edm:provider
      edm:isRelatedTo
      dc:type
    }
  }
  GRAPH ?g {
    ?sub ?pred ?ob .
  }
}
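Some of the selected predicates can also point to URIs or typed values such as dates, so a client may want to keep only plain or language-tagged string literals before translating. The following is a minimal sketch of that step, not the script used in this project: it assumes the endpoint speaks the standard SPARQL 1.1 Protocol and returns the SPARQL JSON results format. The function names `fetch_results` and `literal_rows` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def fetch_results(endpoint: str, query: str) -> Dict:
    """POST a SPARQL query to an endpoint and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def literal_rows(results: Dict) -> List[Tuple[str, str, str]]:
    """Keep only bindings whose ?ob is a plain or language-tagged string."""
    rows = []
    for binding in results["results"]["bindings"]:
        ob = binding["ob"]
        if ob["type"] != "literal":
            continue  # URIs and blank nodes are not translated
        if ob.get("datatype", XSD_STRING) != XSD_STRING:
            continue  # skip dates, numbers and other typed values
        rows.append((binding["sub"]["value"], binding["pred"]["value"], ob["value"]))
    return rows
```

Calling `literal_rows(fetch_results(endpoint, query))` would then yield (subject, predicate, text) tuples that are ready for translation.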
For the purpose of this demo, we restricted the results to objects that have a spatial relation with Jakarta (?sub dct:spatial "Jakarta").
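The spatial restriction can be added as one extra triple pattern inside the GRAPH block. The sketch below shows one way to assemble such a query from a predicate list and an optional place name. The helper `build_query` and the shortened prefix table are illustrative assumptions, and it assumes the place is stored as a plain literal, as in the pattern quoted above.

```python
from typing import Iterable, Optional

# Shortened prefix table for illustration; extend it to cover every predicate used.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}


def build_query(predicates: Iterable[str], place: Optional[str] = None) -> str:
    """Build the selection query, optionally restricted to one dct:spatial value."""
    prefix_lines = [f"PREFIX {name}: <{ns}>" for name, ns in PREFIXES.items()]
    values = " ".join(predicates)
    patterns = ["?sub ?pred ?ob ."]
    if place is not None:
        # Escape backslashes and quotes so the place name is a valid SPARQL string.
        escaped = place.replace("\\", "\\\\").replace('"', '\\"')
        patterns.append(f'?sub dct:spatial "{escaped}" .')
    body = "\n    ".join(patterns)
    return "\n".join(prefix_lines + [
        "SELECT ?sub ?pred ?ob ?g WHERE {",
        f"  VALUES ?pred {{ {values} }}",
        "  GRAPH ?g {",
        f"    {body}",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    print(build_query(["rdfs:label", "dc:title"], place="Jakarta"))
```

Leaving `place` unset produces the unrestricted query, so the same helper could serve both the demo and a full-dataset run.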
For translation, we used a Python script with the pre-trained OPUS translation models from the Language Technology Research Group at the University of Helsinki (Helsinki-NLP). Using these models, we translated entire datasets from Dutch to English and Indonesian, for more approachable and fair access to linked data. The application uses SPARQL queries to retrieve the relevant objects with Dutch-language literals from the dataset. With this implementation, only the endpoint of the Dutch dataset and the desired target language are needed as input arguments to generate the translated dataset. This makes it possible to explore Dutch datasets without language barriers.
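As a rough illustration of the translation step, the sketch below turns (subject, predicate, Dutch text) rows into N-Triples lines that keep the original subject URI and carry a language tag for the target language. It is not the project's actual script. The function names are hypothetical, and `load_opus_translator` assumes that the Hugging Face `transformers` library is installed and that a model named `Helsinki-NLP/opus-mt-{src}-{tgt}` exists for the requested language pair, which should be checked for each pair.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (subject URI, predicate URI, Dutch literal)


def load_opus_translator(src: str = "nl", tgt: str = "en") -> Callable[[str], str]:
    """Return a text-to-text translator backed by an OPUS-MT model.

    Assumption: a model for this language pair is published under this name.
    """
    from transformers import pipeline  # imported lazily; heavy dependency

    pipe = pipeline("translation", model=f"Helsinki-NLP/opus-mt-{src}-{tgt}")
    return lambda text: pipe(text)[0]["translation_text"]


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def translate_rows(
    rows: Iterable[Row], translate: Callable[[str], str], tgt_lang: str
) -> List[str]:
    """Translate each literal and emit it on the same subject and predicate."""
    triples = []
    for sub, pred, literal in rows:
        translated = escape_literal(translate(literal))
        triples.append(f'<{sub}> <{pred}> "{translated}"@{tgt_lang} .')
    return triples
```

Because the translator is passed in as a plain callable, the same loop could be run with `load_opus_translator("nl", "en")` or with any other translation function, and the output can be loaded next to the Dutch triples without changing any URI.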
4. Items
Results of the translations: a picture with its descriptions translated to Indonesian, shown together with the original descriptions in Dutch.
Example of title translations from Dutch to Indonesian.