Query Ordered Lists with SPARQL

Overview

This article will explain how to query ordered lists in SPARQL.

Introduction

Knowledge graphs sometimes need to store lists that contain elements in a meaningful order. For example, a cooking recipe may require performing the following steps:

  1. Mix the dough.
  2. Heat the oven.
  3. Put the dough inside the oven.

When we query the steps of a recipe like this, we must preserve their order. After all, things would go wrong if we put the dough inside an unheated oven, or mix the dough only after it has been warmed in a heated oven.

Lists in knowledge graphs

The RDF standard introduces the rdf:List class for representing ordered lists. Since rdf:List is the most widely used class for this purpose, we will focus on its use for representing lists in this article.

Instances of the rdf:List class follow a linked list structure. This means that every sublist is represented by a node in the knowledge graph with two outgoing edges:

  1. rdf:first points to the first element in the sublist.
  2. rdf:rest points to the rest of the sublist.

Notice that for finite lists we will reach the end of the list at some point. In a linked list, the end of the list is always represented by the empty list. The RDF standard includes the rdf:nil list for this purpose.

We can use the RDF standard as explained above to represent our recipe example. Notice that we use one node to represent the recipe itself (recipe:123), 4 nodes to represent the (sub)lists (list:1, list:2, list:3, rdf:nil), and 3 nodes to represent the steps:

graph LR recipe:123:::recipe -- def:steps --> list:1:::list list:1:::list -- rdf:first --> x1(Mix the dough.):::step list:1:::list -- rdf:rest --> list:2:::list list:2:::list -- rdf:first --> x2(Heat the oven.):::step list:2:::list -- rdf:rest --> list:3:::list list:3:::list -- rdf:first --> x3(Put the dough inside the oven.):::step list:3:::list -- rdf:rest --> rdf:nil:::list classDef recipe fill:lightgreen classDef list fill:sandybrown classDef step fill:white

We can encode this representation in the Turtle or TriG serialization formats as follows:

id:recipe123
  a def:Recipe;
  def:steps list:1.
list:1
  a rdf:List;
  rdf:first 'Mix the dough.'@en;
  rdf:rest list:2.
list:2
  a rdf:List;
  rdf:first 'Heat the oven.'@en;
  rdf:rest list:3.
list:3
  a rdf:List;
  rdf:first 'Put the dough inside the oven.'@en;
  rdf:rest rdf:nil.

Notice that the linked list structure is very generic. For instance, an element can itself be a list, allowing arbitrarily nested lists to be represented with ease.

But the linked list format is also a little verbose: it requires us to write the links between the sublists explicitly. For this reason, the Turtle and TriG formats support an abbreviated list notation. When we use the abbreviated list notation, we can write Snippet 1 as follows:

id:recipe123
  a def:Recipe;
  def:steps
    ( 'Mix the dough.'@en
      'Heat the oven.'@en
      'Put the dough inside the oven.'@en ).

Notice that Snippet 2 is a bit easier to read and write than Snippet 1. But it also gives us less control over how the sublists are named. Specifically, arbitrary Skolem IRIs will be used to represent the sublists, whereas previously we were able to use the purposefully chosen IRIs list:123, list:23, and list:3 to represent the sublists.

Querying lists naively

Let's assume we have a knowledge graph that contains recipes that use lists to represent their required steps. One may naively think that such lists can now be queried with SPARQL in the following way:

select ?step {
  recipe:123 def:steps/rdf:rest*/rdf:first ?step.
}

Let's unpack what happens in this query:

  1. We want to retrieve the steps required by a specific recipe. We therefore include variable ?step in the projection (first line) and specify the IRI recipe:123 in the second line.
  2. We use a property path to traverse the various properties that are used in the list representation (see Diagram 1). Notice that we first traverse the def:steps property, then the rdf:rest property, and finally the rdf:first property. The /-operator is used to compose this sequence of properties into one path.
  3. Depending on the depth of the elements in the list, the rdf:rest property must be traversed zero or more times. This is indicated by the *-operator.

Unfortunately, Query 1 does not work. Or worse: it may seem to work for some lists, but may not work for others. The SPARQL standard does not guarantee the order in which property paths are evaluated. This means that a standard-conforming SPARQL engine may return the last list element first, or the middle element first, or the first element first.

Let us briefly reflect on why the SPARQL standard does not mandate an evaluation order for property paths. An important task of SPARQL implementations is to evaluate SPARQL queries quickly. Speed is achieved by reordering the query so that the fastest operation is executed first. For Query 1 this means that the SPARQL implementation may rewrite the query in the following way:

select ?step {
  recipe:123 def:steps ?x.
  ?x rdf:rest* ?y.
  ?y rdf:first ?step.
}

Now let's suppose that the SPARQL implementation determines that the last triple pattern is the cheapest of the 3 to execute. It will then reorder the query as follows:

select ?step {
  ?y rdf:first ?step.
  ?x rdf:rest* ?y.
  recipe:123 def:steps ?x.
}

Notice that the SPARQL implementation will now start by evaluating the triple pattern with property rdf:first. This may return any of the steps in our dataset, not necessarily the first step. Since SPARQL implementations must be given the freedom to evaluate queries in the most optimal way possible, the SPARQL standard cannot mandate an evaluation order. This is unproblematic for data that is unordered but causes problems whenever data is ordered (as is the case in RDF lists).

Querying lists correctly

In this section we show how you can correctly query ordered lists in SPARQL.

Staying with our example list, let's support we want to generate HTML widgets that enumerate the steps required for a specific recipe. We want to build a function in SPARQL that has the following signature: (rdf:List) → rdf:HTML. We can define this signature in SHACL Advanced:

prefix listf: <https://triplydb.com/Triply/list/model/func/>
prefix list-param: <https://triplydb.com/Triply/list/model/param/>

listf:elements
  a sh:SPARQLFunction;
  sh:parameter list-param:list;
  sh:returnType rdf:HTML;
  sh:select '...'.

list-param:list
  a sh:Parameter;
  sh:class rdf:List;
  sh:order 1;
  sh:path list-param:list.

Notice the following details:

  • We define an instance of sh:SPARQLFuncton.
  • We define a parameter list with sh:parameter.
  • We define individual parameters as instances of sh:Parameter.
  • We define the return type with sh:returnType.

The only thing that remains to be specified is a SPARQL query string (sh:select) that implements this function signature. The essence of our query is the insight that the index at which an element occurs in a list is equal to the number of intermediate sublists to the start of the list (see Diagram 1). The number of sublists can be counted using the aggregate function count. Ordering the elements based on this count will return the list in the correct order.

select (strdt(concat('<ol><li>',group_concat(str(?element);separator='</li><li>'),'</li></ol>'),rdf:HTML) as ?elements) {
  {
    select ?element {
      $list rdf:rest* ?sublist.
      ?sublist rdf:rest*/rdf:first ?element.
    }
    group by ?element
    order by (count(?sublist))
  }
}

Notice the following details:

  • We use function strdt to create a type literal with datatype rdf:HTML.
  • We use function concat and aggregate function group_concat in the projection to create a valid ordered list serialization in HTML5.
  • We use a sub-select to order the elements internally.
  • We use the order by clause to perform the ordering.
  • We use property path notation to traverse the linked list structure (using the *-operator and the /-operator).

Most importantly, notice that we only used standardized features from RDF, SPARQL, and SHACL.

When we insert Query 1 into the sh:select literal of Snippet 3, we finalize our implementation of the SPARQL Function. When we load this snippet into a standards-conforming SPARQL implementation, we can retrieve the ordered list of elements as follows:

prefix listf: <https://triplydb.com/Triply/list/model/func/>

select (listf:elements(?stepsList) as ?stepsHtml) {
  recipe:123 def:steps ?stepsList.
}

Conclusion

Ordered lists are a common requirement in knowledge representation. We saw that it is possible to represent ordered lists using the RDF and SPARQL standards. Some SPARQL implementations introduce proprietary extensions that allow lists to be queried in a supposedly simpler way. However, these extensions are not really necessary. The standardized linked data features are sufficient for querying ordered lists.