Languages are an endangered heritage
According to Ethnologue, almost 7,000 human languages are currently in use in the world. About half of them could be extinct before the end of this century. Only a small fraction of them have a writing system and a written heritage, and among those, fewer still are used in modern information systems and on the Web. A good indication of the number of languages used on the Web is provided by the multilingual editions of Wikipedia: to date 285 different languages, that is, less than 5% of all known languages. Ranking languages by the importance of their respective Wikipedia editions is a fairly good indicator of the Web influence of their communities of speakers, but it differs greatly from a ranking based on the number of speakers.
We need languages as Linked Data
In current XML and RDF practice, languages are identified by tags, typically carried by the xml:lang attribute. The allowed values of these tags are defined by BCP 47. Language tags are typically attached to literal values of properties such as rdfs:comment, and allow such descriptive elements to be filtered by language, for example in SPARQL queries. But they do not provide support for queries such as:
- “Can I find native speakers of Bengali in Berlin?”
- “Which books by Victor Hugo are translated into Arabic?”
- “Is this software documented in Chinese?”
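What language tags do support is selecting literals by language. The following is a minimal Python sketch of that kind of filtering, written in the spirit of SPARQL's langMatches() function. The Literal type, the ex: identifiers, and the sample descriptions are hypothetical and serve only as illustration. Note that the tag is a plain string attached to a piece of text: nothing links it to speakers, places, or translations.

```python
from typing import List, NamedTuple, Optional


class Literal(NamedTuple):
    subject: str          # resource being described (hypothetical identifier)
    predicate: str        # property carrying the text, e.g. "rdfs:comment"
    text: str             # the literal value itself
    lang: Optional[str]   # BCP 47 language tag, or None if untagged


def lang_matches(tag: Optional[str], lang_range: str) -> bool:
    """Basic range matching: "fr" matches "fr" and "fr-CA" (case-insensitive);
    "*" matches any tagged literal; untagged literals never match."""
    if not tag:
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def filter_by_language(literals: List[Literal], lang_range: str) -> List[Literal]:
    """Keep only the literals whose language tag matches the requested range."""
    return [lit for lit in literals if lang_matches(lit.lang, lang_range)]


# Hypothetical sample data: four descriptions of one resource.
DESCRIPTIONS = [
    Literal("ex:LesMiserables", "rdfs:comment", "Roman de Victor Hugo", "fr"),
    Literal("ex:LesMiserables", "rdfs:comment", "Novel by Victor Hugo", "en"),
    Literal("ex:LesMiserables", "rdfs:comment", "A novel by Victor Hugo", "en-GB"),
    Literal("ex:LesMiserables", "rdfs:comment", "untagged note", None),
]

english = filter_by_language(DESCRIPTIONS, "en")
print([lit.text for lit in english])
```

Here the range "en" selects both the "en" and the "en-GB" descriptions and skips the French and untagged ones.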
To answer such queries, languages need to be represented as resources that can be linked to other resources representing books, people, organizations, places, events, products … through dedicated properties. Such properties can be found in the Lingvoj Ontology. URIs for languages have been defined in the lingvoj.org namespace since 2007, and many other URIs have been defined afterwards in the linked data cloud. Since 2010, most lingvoj.org URIs redirect to those of lexvo.org.
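To illustrate what treating a language as a resource makes possible, here is a minimal Python sketch that answers the Victor Hugo question over a small set of triples. Everything in it is an assumption made for illustration: the ex: identifiers and property names (ex:author, ex:translationOf, ex:language) are hypothetical and are not taken from the Lingvoj Ontology, and the language URIs merely follow the lexvo.org style and should be checked against that service before use.

```python
# Language resources, identified by URIs rather than by string tags.
# (URI pattern assumed for illustration.)
ARABIC = "http://lexvo.org/id/iso639-3/ara"
FRENCH = "http://lexvo.org/id/iso639-3/fra"
HUGO = "ex:VictorHugo"  # hypothetical identifier

# Hypothetical (subject, property, object) triples.
TRIPLES = {
    ("ex:LesMiserables", "ex:author", HUGO),
    ("ex:LesMiserables", "ex:language", FRENCH),
    ("ex:LesMiserables_ar", "ex:translationOf", "ex:LesMiserables"),
    ("ex:LesMiserables_ar", "ex:language", ARABIC),
    ("ex:NotreDame", "ex:author", HUGO),
    ("ex:NotreDame", "ex:language", FRENCH),
}


def subjects(prop, obj):
    """Return all subjects linked to `obj` through `prop`."""
    return {s for (s, p, o) in TRIPLES if p == prop and o == obj}


def books_translated_into(author, language):
    """Books by `author` having at least one translation in `language`."""
    in_language = subjects("ex:language", language)
    return sorted(
        book
        for book in subjects("ex:author", author)
        if subjects("ex:translationOf", book) & in_language
    )


print(books_translated_into(HUGO, ARABIC))  # ['ex:LesMiserables']
```

Because the language is a node in the graph, the same pattern extends to people, places, or software documentation simply by adding properties that point to the language resource.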