SPARQL has been widely adopted since first proposed as the query language for the Semantic Web. There are many SPARQL endpoints available today, both public and private, exposing various interlinked data sources that are all part of the global RDF data cloud.
SPARQL federation offers the mechanism for integrating RDF data distributed across multiple sources. It allows data consumers to retrieve and join data from those sources via a single query in a simple and elegant way. This way, it effectively exposes the data as a single integrated RDF graph.
SPARQL federation is a dedicated language construct defined in SPARQL 1.1 Federated Query, the W3C Recommendation that introduces the SERVICE keyword. A SERVICE clause names the URL of the SPARQL endpoint to retrieve the data from, followed by the graph pattern to evaluate at that endpoint.
The following example shows how to query a local RDF graph combined with the data from a remote SPARQL endpoint.
Suppose that the local RDF graph contains only one triple:
<http://example.org/Carol> <http://xmlns.com/foaf/0.1/knows> <http://example.org/Alice> .
while the remote RDF dataset available at the http://people.example.org/sparql endpoint contains the following data:
<http://example.org/Alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/Bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
Locally, we know that <http://example.org/Carol> knows <http://example.org/Alice>, but to get the name of <http://example.org/Alice>, we need to query the remote http://people.example.org/sparql endpoint. To retrieve the names of all people that <http://example.org/Carol> knows, a single federated query can be used:
SELECT ?name
WHERE {
  <http://example.org/Carol> <http://xmlns.com/foaf/0.1/knows> ?person .
  SERVICE <http://people.example.org/sparql> {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
  }
}
This query retrieves the local data joined with the response from the remote SPARQL endpoint, and returns the following:
| name |
| --- |
| "Alice" |
Under the hood, the SERVICE keyword makes the query processor issue a sub-query against another SPARQL endpoint while it executes the enclosing query. A number of RDF databases and services support SPARQL 1.1 Federated Query and the SERVICE keyword.
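To picture the under-the-hood evaluation described above, here is a minimal Python sketch that replays the Carol and Alice example in memory. It is an illustration only, not how any particular engine is implemented: the helper names `match_pattern` and `join` are hypothetical, both graphs are plain sets of tuples, and IRIs and literals are simplified to strings.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

# The local graph and the remote dataset from the example, as (s, p, o) tuples.
local_graph = {(EX + "Carol", FOAF + "knows", EX + "Alice")}
remote_graph = {
    (EX + "Alice", FOAF + "name", "Alice"),
    (EX + "Bob", FOAF + "name", "Bob"),
}


def match_pattern(graph, pattern):
    """Return one binding dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; all other terms must
    be equal to the corresponding term of the triple.
    """
    solutions = []
    for triple in sorted(graph):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            solutions.append(binding)
    return solutions


def join(left, right):
    """Merge every pair of solutions that agree on all shared variables."""
    joined = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


# Pattern evaluated locally.
local_solutions = match_pattern(local_graph, (EX + "Carol", FOAF + "knows", "?person"))
# Pattern inside SERVICE; in a real engine this runs at the remote endpoint.
remote_solutions = match_pattern(remote_graph, ("?person", FOAF + "name", "?name"))

names = [s["?name"] for s in join(local_solutions, remote_solutions)]
print(names)  # ['Alice']
```

Bob is returned by the remote pattern but dropped by the join, because no local solution binds `?person` to Bob.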
As part of executing the federated query, the query processor calls the external SPARQL endpoint. This comes with a number of potential issues that need to be addressed.
The SERVICE keyword makes the federated query processor evaluate a portion of the SPARQL query against a remote SPARQL endpoint over HTTP. The HTTP communication overhead makes those sub-queries slower than local evaluation, which adds to the execution time of the whole federated query.
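One common way to limit this overhead is to send fewer, larger requests. The sketch below assumes a processor that ships already-known bindings to the remote endpoint in batches through a SPARQL 1.1 VALUES clause. Whether and how a given engine does this is implementation-specific, and the function name `build_batched_subqueries` and the person IRIs are hypothetical.

```python
def build_batched_subqueries(pattern, variable, iris, batch_size):
    """Yield one sub-query per batch, restricting `variable` to the batch via VALUES."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(iris), batch_size):
        batch = iris[start:start + batch_size]
        values = " ".join(f"<{iri}>" for iri in batch)
        yield f"SELECT * WHERE {{ VALUES ?{variable} {{ {values} }} {pattern} }}"


# Hypothetical IRIs bound to ?person by the local part of the query.
people = [f"http://example.org/person{i}" for i in range(5)]

queries = list(build_batched_subqueries(
    "?person <http://xmlns.com/foaf/0.1/name> ?name .",
    "person",
    people,
    batch_size=2,
))
print(len(queries))  # 3 remote requests instead of 5
```

Larger batches mean fewer HTTP round trips, at the cost of longer query strings that some endpoints may reject.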
If the remote SPARQL service is unavailable, returns an error, or cannot be accessed for other reasons, the federated query execution will fail as a whole.
It may be desirable to tolerate remote service errors so that the query does not fail as a whole. This can be achieved by adding the SILENT keyword to the SERVICE clause: if the remote call fails, the SERVICE pattern is treated as if it had returned a single solution with no bindings, so its variables are left unbound. The following query uses SILENT:
SELECT ?name
WHERE {
  <http://example.org/Carol> <http://xmlns.com/foaf/0.1/knows> ?person .
  SERVICE SILENT <http://people.example.org/sparql> {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
  }
}
This query will ignore all errors encountered while accessing the remote http://people.example.org/sparql SPARQL endpoint.
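The effect of SILENT on the result can be sketched in a few lines of Python. SPARQL 1.1 Federated Query treats a failed SERVICE SILENT clause as if it had produced a single solution with no bindings. The names `ServiceError`, `evaluate_service`, and `failing_endpoint` are hypothetical stand-ins for a real engine's internals.

```python
class ServiceError(Exception):
    """Raised when the remote endpoint cannot be reached or returns an error."""


def evaluate_service(call_endpoint, silent=False):
    """Return the remote solutions, or a single empty solution on a silenced failure."""
    try:
        return call_endpoint()
    except ServiceError:
        if silent:
            return [{}]  # one solution with no bindings
        raise


def join(left, right):
    """Merge every pair of solutions that agree on all shared variables."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[v] == r[v] for v in l.keys() & r.keys())
    ]


def failing_endpoint():
    raise ServiceError("endpoint unavailable")


# Solution produced by the local part of the query.
local_solutions = [{"?person": "http://example.org/Alice"}]

rows = join(local_solutions, evaluate_service(failing_endpoint, silent=True))
print(rows)  # [{'?person': 'http://example.org/Alice'}]
```

The local solution survives the join, but `?name` stays unbound, so the query returns a row with an empty name rather than failing. Without `silent=True`, the `ServiceError` propagates and the whole evaluation fails.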
When query processors execute federated queries, the external endpoint URIs are dereferenced and the SERVICE queries and parameters are passed to those external SPARQL query processors.
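To make that hand-off concrete, the sketch below builds (but does not send) the kind of HTTP request a processor could issue for the SERVICE pattern from the example. It assumes the "query via GET" operation of the SPARQL 1.1 Protocol and a JSON results format. The function name `build_service_request` is hypothetical, and real engines may use POST or other result formats instead.

```python
from urllib.parse import urlencode
from urllib.request import Request

SUBQUERY = "SELECT * WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name . }"


def build_service_request(endpoint, subquery):
    """Build a GET request that carries the sub-query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": subquery})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_service_request("http://people.example.org/sparql", SUBQUERY)
print(request.get_method())  # GET
print(request.full_url)
```

Everything in the SERVICE pattern, including any IRIs or literals it mentions, travels to the remote endpoint as part of this request.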
SPARQL 1.1 Federated Query does not define an authentication mechanism. When your use case involves federated queries over multiple private SPARQL endpoints, it is your responsibility to secure the network and make sure that the remote SPARQL services are accessible only from within that network.
You also need to vet the external SPARQL endpoints and the data they contribute to the query output, making sure that both satisfy your data processing and licensing requirements.
SPARQL 1.1 Federated Query allows you to distribute your RDF data across multiple databases and use a single query to access it across all those database instances. The data does not need to be colocated or made publicly accessible.
SPARQL federation effectively enriches your working datasets with external public or private data. It is a mechanism for querying and retrieving the data from the joined global RDF graph.