human-readable-no-label
This query enumerates nodes with no human-readable label.
It is a best practice for each node to have a human-readable label.
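As a rough illustration, the check can be sketched in Python over an in-memory list of (subject, predicate, object) tuples, with terms written as prefixed-name strings. Two things here are assumptions: the set of label properties (rdfs:label and skos:prefLabel), and the restriction to nodes in subject position. The actual query may use a different set of label properties. The helper name is hypothetical.

```python
# Assumption: these properties count as human-readable labels.
LABEL_PROPERTIES = {"rdfs:label", "skos:prefLabel"}


def nodes_without_label(triples):
    """Return subject nodes that have no label triple (hypothetical helper)."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p in LABEL_PROPERTIES}
    return subjects - labelled


triples = [
    ("ex:alice", "rdfs:label", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:age", "42"),
]
print(nodes_without_label(triples))  # {'ex:bob'}
```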
disconnected-class
Enumerates classes that are not specified as the domain and/or range of any property.
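A minimal Python sketch of this check follows. It assumes that classes are recognised by an explicit rdf:type of rdfs:Class or owl:Class; the query itself may identify classes differently. Terms are prefixed-name strings and the helper name is hypothetical.

```python
# Assumption: a class is any term explicitly typed as one of these.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}


def disconnected_classes(triples):
    """Return declared classes never used as rdfs:domain or rdfs:range."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o in CLASS_TYPES}
    connected = {o for _, p, o in triples if p in {"rdfs:domain", "rdfs:range"}}
    return classes - connected


triples = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:Company", "rdf:type", "owl:Class"),
    ("ex:Color", "rdf:type", "rdfs:Class"),
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:worksFor", "rdfs:range", "ex:Company"),
]
print(disconnected_classes(triples))  # {'ex:Color'}
```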
completeness-no-range
Enumerates properties that do not have a range specified.
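A sketch of the same idea in Python follows. It assumes that properties are recognised by an explicit rdf:type declaration. Predicates that are merely used but never declared are not considered here, which is a simplification.

```python
# Assumption: a property is any term explicitly typed as one of these.
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def properties_without_range(triples):
    """Return declared properties that have no rdfs:range triple."""
    declared = {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    with_range = {s for s, p, _ in triples if p == "rdfs:range"}
    return declared - with_range


triples = [
    ("ex:name", "rdf:type", "owl:DatatypeProperty"),
    ("ex:name", "rdfs:range", "xsd:string"),
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
]
print(properties_without_range(triples))  # {'ex:knows'}
```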
disconnected-triples
Data quality aspect
A dataset sometimes contains statements (i.e., triples) that are isolated from the rest of the knowledge graph.
Purpose
This query enumerates such isolated triples. These are not connected to the rest of the graph in any way.
Implementation
We check for the following:
- Does the subject node have inlinks?
- Does the object node have outlinks?
- Are there other triples within this same record?
If none of the above is true, the triple is shown in this query as a disconnected edge.
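The three checks can be sketched in Python as follows. The tuple representation is an assumption. So is the reading of "record" as the set of triples that share the same subject; adjust `has_siblings` if a record means something else in your data.

```python
from collections import Counter


def disconnected_triples(triples):
    """Return triples for which none of the three connectivity checks holds."""
    objects = {o for _, _, o in triples}           # nodes with at least one inlink
    out_degree = Counter(s for s, _, _ in triples)  # number of outlinks per node
    result = []
    for s, p, o in triples:
        has_inlinks = s in objects
        has_outlinks = o in out_degree
        # Assumption: "same record" means other triples with the same subject.
        has_siblings = out_degree[s] > 1
        if not (has_inlinks or has_outlinks or has_siblings):
            result.append((s, p, o))
    return result


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:x", "ex:note", "orphan"),
]
print(disconnected_triples(triples))  # [('ex:x', 'ex:note', 'orphan')]
```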
incompleteness-node-type
Data quality aspect
Implicitly, every linked data node is an instance of rdfs:Resource. However, it is a best practice to make the instance-of assertion explicit. Also, a node is rarely a direct instance of rdfs:Resource, so making the assertion explicit often prompts the data publisher to think about a more descriptive class hierarchy.
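A Python sketch of how nodes without an explicit type could be found follows. It assumes that "node" means any term in subject position, and that an rdf:type triple with any class counts as explicit.

```python
def untyped_nodes(triples):
    """Return subject nodes that have no explicit rdf:type assertion."""
    subjects = {s for s, _, _ in triples}
    typed = {s for s, p, _ in triples if p == "rdf:type"}
    return subjects - typed


triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
]
print(untyped_nodes(triples))  # {'ex:bob'}
```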
incompleteness-property-type
Data quality aspect
Terms that are used in the predicate position of at least one triple are implicitly instances of rdf:Property. However, it is a best practice to make explicit whether a property is a datatype property (instance of owl:DatatypeProperty) or an object property (instance of owl:ObjectProperty).
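A Python sketch of this check follows. It skips predicates from the rdf:, rdfs: and owl: namespaces. That filter is an assumption, added so that vocabulary terms such as rdf:type are not reported themselves.

```python
OWL_PROPERTY_TYPES = {"owl:DatatypeProperty", "owl:ObjectProperty"}
# Assumption: predicates in these namespaces are vocabulary, not data properties.
VOCABULARY_PREFIXES = ("rdf:", "rdfs:", "owl:")


def properties_without_owl_type(triples):
    """Return used predicates not typed as owl:DatatypeProperty or owl:ObjectProperty."""
    predicates = {p for _, p, _ in triples if not p.startswith(VOCABULARY_PREFIXES)}
    typed = {s for s, p, o in triples if p == "rdf:type" and o in OWL_PROPERTY_TYPES}
    return predicates - typed


triples = [
    ("ex:name", "rdf:type", "owl:DatatypeProperty"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
print(properties_without_owl_type(triples))  # {'ex:knows'}
```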
inconsistency-domain-usage
Data quality aspect
There is sometimes an inconsistency between the defined domain for properties (rdfs:domain) and the subject terms that are used with those properties in the data.
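A minimal Python sketch of such a domain check follows. It considers explicit rdf:type assertions only. There is no RDFS inference and no rdfs:subClassOf reasoning, so a subject typed with a subclass of the domain would still be reported; the query may behave differently.

```python
from collections import defaultdict


def domain_violations(triples):
    """Return (subject, property, domain) where the subject lacks the declared domain type."""
    domains, types = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:domain":
            domains[s].add(o)
        elif p == "rdf:type":
            types[s].add(o)
    violations = set()
    for s, p, _ in triples:
        for domain in domains.get(p, ()):
            if domain not in types.get(s, set()):
                violations.add((s, p, domain))
    return violations


triples = [
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:worksFor", "ex:bigco"),
]
print(domain_violations(triples))  # {('ex:acme', 'ex:worksFor', 'ex:Person')}
```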
Related
- The same consistency check can be done for ranges: see query inconsistent-range-usage.
inconsistent-range-usage
Data quality aspect
There is sometimes an inconsistency between the defined range for properties (rdfs:range) and the object terms that are used with those properties in the data.
Implementation
This query identifies the use of properties in the data (the triple pattern [ ?p ?o ] .) and determines the classes of the corresponding object terms. For IRIs, these classes are asserted through rdf:type. For literals, the class is part of the term itself: it is the datatype, extracted with datatype/1. rdfs:Resource is used as a fallback if no class is specified in the data.
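The steps above can be sketched in Python. The sketch determines each object's classes as described (rdf:type for IRIs, the datatype for literals, rdfs:Resource as the fallback) and then reports usages where none of those classes matches the declared rdfs:range. The `Literal` type and the helper name are hypothetical. Subclass reasoning is deliberately left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal term: lexical form plus datatype (prefixed name)."""
    lexical: str
    datatype: str


def range_violations(triples):
    """Return (property, object) pairs whose object has no class in the declared range."""
    ranges, types = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:range":
            ranges[s].add(o)
        elif p == "rdf:type":
            types[s].add(o)

    violations = set()
    for _, p, o in triples:
        if p not in ranges:
            continue  # no declared range to compare against
        if isinstance(o, Literal):
            classes = {o.datatype}  # for literals the class is part of the term
        else:
            # For IRIs use asserted rdf:type classes, falling back to rdfs:Resource.
            classes = types.get(o) or {"rdfs:Resource"}
        if not classes & ranges[p]:
            violations.add((p, o))
    return violations


triples = [
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("ex:worksFor", "rdfs:range", "ex:Company"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:alice", "ex:age", Literal("42", "xsd:integer")),
    ("ex:bob", "ex:age", Literal("forty", "xsd:string")),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:unknown"),
]
print(range_violations(triples))
```

Here ex:unknown has no rdf:type, so it falls back to rdfs:Resource and is reported against the declared range ex:Company.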
Related
- The same consistency check can be done for domains: see query inconsistency-domain-usage.
datatype-properties-with-iris
Data quality aspect
Consistency > range > syntax
Shows predicates that are defined in the vocabulary as owl:DatatypeProperty, but that have IRIs in their object position in at least some statements.
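A Python sketch of this check follows. It assumes that every object term that is not a literal is an IRI; blank nodes are ignored here. The `Literal` type and the helper name are hypothetical.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


def datatype_properties_with_iris(triples):
    """Return owl:DatatypeProperty predicates that have a non-literal object somewhere."""
    datatype_props = {
        s for s, p, o in triples if p == "rdf:type" and o == "owl:DatatypeProperty"
    }
    return {p for _, p, o in triples if p in datatype_props and not isinstance(o, Literal)}


triples = [
    ("ex:name", "rdf:type", "owl:DatatypeProperty"),
    ("ex:age", "rdf:type", "owl:DatatypeProperty"),
    ("ex:alice", "ex:name", Literal("Alice", "xsd:string")),
    ("ex:bob", "ex:name", "ex:BobName"),
    ("ex:bob", "ex:age", Literal("42", "xsd:integer")),
]
print(datatype_properties_with_iris(triples))  # {'ex:name'}
```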
empty-lexical-forms
Data quality issue
Correctness > syntax > null
In traditional data paradigms, a value was often required even when none was available for a given object. In linked data there is no reason to use null values anymore, because a missing value can simply be left out. The use of null values is often merely a byproduct of old data sources and/or old habits.
Purpose
This query enumerates the empty literals that appear in a dataset.
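A minimal Python sketch follows. It treats a literal as empty only when its lexical form is the empty string; whether whitespace-only literals should also count is left open. The `Literal` type and the helper name are hypothetical.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


def empty_literals(triples):
    """Return triples whose object is a literal with an empty lexical form."""
    return [(s, p, o) for s, p, o in triples if isinstance(o, Literal) and o.lexical == ""]


triples = [
    ("ex:alice", "ex:name", Literal("Alice", "xsd:string")),
    ("ex:bob", "ex:name", Literal("", "xsd:string")),
]
for s, p, _ in empty_literals(triples):
    print(s, p)  # ex:bob ex:name
```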
object-properties-with-literals
Data quality issue
Consistency > range > syntax
Purpose
This query shows the predicates that are defined in the vocabulary as object properties (owl:ObjectProperty), but that have literals in the object position of at least some data triples.
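This is the mirror image of the datatype-property check. It can be sketched in Python under the same assumptions: a hypothetical `Literal` type and terms held in memory as tuples.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


def object_properties_with_literals(triples):
    """Return owl:ObjectProperty predicates that have a literal object somewhere."""
    object_props = {
        s for s, p, o in triples if p == "rdf:type" and o == "owl:ObjectProperty"
    }
    return {p for _, p, o in triples if p in object_props and isinstance(o, Literal)}


triples = [
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
    ("ex:worksFor", "rdf:type", "owl:ObjectProperty"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", Literal("Bob", "xsd:string")),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]
print(object_properties_with_literals(triples))  # {'ex:knows'}
```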
doubles-that-could-be-integers
Data quality issue
Incorrectness > semantic > term > numeric
Datasets sometimes define their numeric data incorrectly at the term level. There is an important distinction between decimal numbers (including integers) and floating-point numbers. Both are defined in XML Schema 1.1: Datatypes. It is especially common to represent decimal numeric data using floating-point numbers.
Purpose
This query gives an overview of the properties that are likely to use floating-point numbers to represent decimal numeric data.
Implementation
This is done by automatically converting each double (xsd:double) to an integer (xsd:integer), and back to a double again. If no information was lost, the double could have been modeled as an integer.
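The round trip can be sketched in Python. Python's `float` parses xsd:double lexical forms such as "3.5E3" and "INF", and `int()` truncates toward zero. Non-finite values are treated as not representable as integers. Aggregating the result as a per-property ratio is an assumption about how the overview could be presented.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


def could_be_integer(lexical):
    """Convert double -> integer -> double and report whether nothing was lost."""
    value = float(lexical)
    if not math.isfinite(value):  # INF, -INF and NaN have no integer form
        return False
    return float(int(value)) == value


def integral_double_ratio(triples):
    """Per property: fraction of xsd:double values that survive the round trip."""
    total, integral = defaultdict(int), defaultdict(int)
    for _, p, o in triples:
        if isinstance(o, Literal) and o.datatype == "xsd:double":
            total[p] += 1
            integral[p] += could_be_integer(o.lexical)
    return {p: integral[p] / total[p] for p in total}


triples = [
    ("ex:a", "ex:population", Literal("1200.0", "xsd:double")),
    ("ex:b", "ex:population", Literal("3.5E3", "xsd:double")),
    ("ex:a", "ex:height", Literal("1.85", "xsd:double")),
    ("ex:b", "ex:height", Literal("2.0", "xsd:double")),
]
print(integral_double_ratio(triples))  # {'ex:population': 1.0, 'ex:height': 0.5}
```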
datatypes-that-could-be-objects
This query enumerates datatype properties that have a relatively small number of unique values. Such properties might be better modeled as object properties, with their values as IRIs. This is computed heuristically, based on the ratio between the number of unique literal values and the total number of literal occurrences.
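A Python sketch of the heuristic follows. It computes, for each property, the number of distinct literal values divided by the number of literal occurrences, and it reports properties at or below a threshold. The threshold of 0.1 is a hypothetical value, not taken from the query.

```python
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


def low_cardinality_properties(triples, threshold=0.1):
    """Return {property: distinct/occurrences} for properties at or below threshold."""
    values = defaultdict(list)
    for _, p, o in triples:
        if isinstance(o, Literal):
            values[p].append(o)
    ratios = {p: len(set(vs)) / len(vs) for p, vs in values.items()}
    return {p: r for p, r in ratios.items() if r <= threshold}


triples = [
    (f"ex:item{i}", "ex:color", Literal("red" if i % 2 else "blue", "xsd:string"))
    for i in range(30)
] + [
    ("ex:item0", "ex:name", Literal("First", "xsd:string")),
    ("ex:item1", "ex:name", Literal("Second", "xsd:string")),
]
print(sorted(low_cardinality_properties(triples)))  # ['ex:color']
```

A ratio close to 0 means a few values are repeated many times, as with an enumeration of colors. A ratio close to 1 means almost every value is unique, as with names.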
encoding-issues
Encoding issues are introduced when text is saved with an encoding other than UTF-8 (the standard Unicode encoding) and is later decoded as if it were UTF-8, or the other way around.
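One common symptom is UTF-8 bytes that were wrongly decoded as Latin-1: "café" then shows up as "cafÃ©". A heuristic Python sketch for spotting this case re-encodes the text as Latin-1 and checks whether it then decodes cleanly as UTF-8 into something different. This covers one failure mode only, and it is an assumption rather than the query's method. The Windows-1252 variant (for example "â€™") is not detected, because "€" is outside Latin-1.

```python
def looks_mis_decoded(text):
    """Heuristic: True if text looks like UTF-8 bytes that were decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False  # not representable as Latin-1, or not valid UTF-8 underneath
    return repaired != text  # plain ASCII round-trips unchanged and is not flagged


print(looks_mis_decoded("cafÃ©"), looks_mis_decoded("café"))  # True False
```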