One schema, one API: Inside the world of Data Commons

Data analysts, software developers, and researchers rely daily on data published by government agencies, intergovernmental organisations, and other authoritative sources. Each public dataset is relatively easy to access and use on its own, but most business and research questions can only be answered by combining several data sources, and only after those sources have been carefully and meaningfully joined.

Data Commons is an initiative that aims to clean, reconcile, and join publicly available datasets into a single open knowledge graph. Launched in May 2018, it has since grown to include data from the United States Census Bureau, Eurostat, the Organisation for Economic Co-operation and Development, and many other sources.

A unified data model

An amalgamation of numerous public datasets, Data Commons is itself a dataset structured around a large collection of entities: places, people, organisations, and so on. All data flowing into the Data Commons graph is treated as facts about those entities, which makes the resulting dataset a valuable source of meaningfully combined, entity-oriented data.

The data vocabulary used to structure the Data Commons graph builds upon Schema.org, the most widely used vocabulary for structured data on the web, and is documented at schema.datacommons.org.
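The entity-centred model can be illustrated with a toy example. The sketch below is not how Data Commons is implemented; it only assumes that each fact can be written as a (subject, property, value) triple keyed by a DCID, and that two hypothetical sources contribute facts about the same entity. The names SOURCE_A, SOURCE_B, merge_facts, and describe are made up for illustration.

```python
from collections import defaultdict

# Two hypothetical sources, each contributing (subject DCID, property, value) facts.
SOURCE_A = [
    ("geoId/06", "typeOf", "State"),
    ("geoId/06", "name", "California"),
]
SOURCE_B = [
    ("geoId/06", "containedInPlace", "country/USA"),
    ("geoId/24", "name", "Maryland"),
]


def merge_facts(*sources):
    """Group facts from every source under the entity they describe."""
    graph = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for subject, prop, value in source:
            graph[subject][prop].add(value)
    return graph


def describe(graph, dcid):
    """Return all known properties of one entity as plain sorted lists."""
    return {prop: sorted(values) for prop, values in graph.get(dcid, {}).items()}


graph = merge_facts(SOURCE_A, SOURCE_B)
print(describe(graph, "geoId/06"))
```

Because both sources refer to the same identifier, their facts end up attached to one entity without any source-specific join logic.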

Accessing the Data Commons graph

The Data Commons knowledge graph is a source of demographic, housing, education, and other types of data that can power a range of tools and visualisations. Notable users of Data Commons data already include Google Search and the Common Knowledge Project.

As an RDF-style knowledge graph, Data Commons can be queried using SPARQL. For example, the following query returns the names of three U.S. states together with their DCIDs (Data Commons identifiers):

SELECT ?name ?dcid
WHERE {
  ?place typeOf Place .
  ?place name ?name .
  ?place dcid ("geoId/06" "geoId/21" "geoId/24") .
  ?place dcid ?dcid
}

Query result:

name          dcid
"Maryland"    "geoId/24"
"Kentucky"    "geoId/21"
"California"  "geoId/06"

The Data Commons graph can also be accessed programmatically using Python helper libraries or in Google Sheets via the Data Commons add-on.
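As a rough sketch of programmatic access, the Python below sends the SPARQL query shown earlier over plain HTTP using only the standard library, rather than through the official helper libraries. The endpoint URL, the X-API-Key header, and the response shape (a "header" list plus "rows" of "cells") are assumptions and should be checked against the current Data Commons API documentation. The function names build_places_query, rows_to_records, and run_query are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current Data Commons API documentation.
SPARQL_URL = "https://api.datacommons.org/v2/sparql"


def build_places_query(dcids):
    """Build the article's query for an arbitrary list of place DCIDs."""
    quoted = " ".join(f'"{dcid}"' for dcid in dcids)
    return (
        "SELECT ?name ?dcid\n"
        "WHERE {\n"
        "  ?place typeOf Place .\n"
        "  ?place name ?name .\n"
        f"  ?place dcid ({quoted}) .\n"
        "  ?place dcid ?dcid\n"
        "}"
    )


def rows_to_records(response):
    """Convert the assumed {"header": [...], "rows": [{"cells": [...]}]} shape
    into a list of dictionaries keyed by the header names."""
    header = response.get("header", [])
    records = []
    for row in response.get("rows", []):
        values = [cell.get("value") for cell in row.get("cells", [])]
        records.append(dict(zip(header, values)))
    return records


def run_query(query, api_key):
    """POST the query as JSON and return the parsed records (needs network access)."""
    request = urllib.request.Request(
        SPARQL_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return rows_to_records(json.load(reply))
```

A call such as run_query(build_places_query(["geoId/06", "geoId/21", "geoId/24"]), api_key) would then be expected to return one record per state, provided the assumptions above hold and a valid API key is supplied.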

Bottom line

Data Commons compiles and joins thousands of open datasets into a single knowledge graph that provides a unified view across all of its data sources, making the data more accessible and readily available for research and analysis.

See also

The list of schema registries

An overview of the technologies used to discover and manage event or message schemas.

Schemas are everywhere

An overview of common schema types and their usage.
