Data discovery at Uber: The continued success of Databook

Uber operates the world's largest ride-hailing network, serving just under 1,000 metropolitan areas across 70 countries.

As companies like Uber grow and expand their operations, so does the volume of data and associated metadata they produce each day. Such technology firms place heavy emphasis on their data and strive to help their growing data analytics teams easily find the data they need.

To facilitate internal data discovery, Uber built its own dataset search and management tool called Databook. A single interface into the company's metadata graph, Databook indexes hundreds of thousands of datasets, millions of columns and fields, and hundreds of thousands of other data entities such as dashboards and pipelines.
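The article does not describe the data model behind Databook's metadata graph. As a rough illustration only, the sketch below assumes entities identified by URN-like strings and typed edges such as `has_column` and `feeds`. All of these names, along with `MetadataGraph` itself, are hypothetical. It shows how a graph linking datasets, columns, pipelines, and dashboards could answer a question like "what depends on this table?" with a simple traversal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in a hypothetical metadata graph (not Databook's actual schema)."""
    urn: str   # hypothetical unique identifier, e.g. "dataset:trips"
    kind: str  # e.g. "dataset", "column", "dashboard", "pipeline"


class MetadataGraph:
    """Minimal directed graph of data entities with typed relationships."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        # edges[src_urn] -> list of (relation, dst_urn) pairs
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_entity(self, urn: str, kind: str) -> Entity:
        entity = Entity(urn, kind)
        self.entities[urn] = entity
        return entity

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Both endpoints must be registered before they can be linked.
        if src not in self.entities or dst not in self.entities:
            raise KeyError(f"unknown entity in edge {src} -> {dst}")
        self.edges[src].append((relation, dst))

    def neighbors(self, urn: str, relation: str) -> list[str]:
        # .get() avoids inserting empty lists into the defaultdict.
        return [dst for rel, dst in self.edges.get(urn, []) if rel == relation]

    def downstream(self, urn: str) -> set[str]:
        """Every entity reachable from `urn` by following "feeds" edges (depth-first)."""
        seen: set[str] = set()
        stack = [urn]
        while stack:
            current = stack.pop()
            for dst in self.neighbors(current, "feeds"):
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


# Hypothetical example: a table feeds a pipeline, which feeds a table behind a dashboard.
graph = MetadataGraph()
graph.add_entity("dataset:trips", "dataset")
graph.add_entity("column:trips.fare", "column")
graph.add_entity("pipeline:daily_trips_agg", "pipeline")
graph.add_entity("dataset:trips_daily", "dataset")
graph.add_entity("dashboard:city_ops", "dashboard")
graph.relate("dataset:trips", "has_column", "column:trips.fare")
graph.relate("dataset:trips", "feeds", "pipeline:daily_trips_agg")
graph.relate("pipeline:daily_trips_agg", "feeds", "dataset:trips_daily")
graph.relate("dataset:trips_daily", "feeds", "dashboard:city_ops")

print(sorted(graph.downstream("dataset:trips")))
# ['dashboard:city_ops', 'dataset:trips_daily', 'pipeline:daily_trips_agg']
```

Storing entities and edges separately keeps each entity type uniform, so new kinds (for example, ML models) could be added without changing the traversal logic.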

Databook's features and architecture

At a high level, Databook ingests metadata from various sources (primary data stores, services, and crawlers) and makes it accessible to end users through a unified search interface. Users can search for indexed data entities, which are updated in real time, and view additional signals such as usage statistics and quality trends.
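The ingestion-to-search flow can be illustrated with a minimal in-memory sketch. It assumes that each source emits upsert events carrying a metadata record, that the index is updated as each event arrives so searches reflect changes immediately, and that a usage count breaks ties between equally relevant results. `MetadataRecord`, `SearchIndex`, `weekly_queries`, and the source labels are all hypothetical. This is not Databook's implementation; a production catalog would typically rely on a dedicated search engine and a streaming pipeline rather than in-process dictionaries.

```python
import re
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Hypothetical record emitted by a source (data store, service, or crawler)."""
    urn: str
    name: str
    description: str
    source: str
    weekly_queries: int = 0  # hypothetical usage signal used for ranking


def tokenize(text: str) -> set[str]:
    # Lowercase, then split on anything that is not a-z or 0-9 (so "trips_daily" -> trips, daily).
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


class SearchIndex:
    """In-memory inverted index that is updated on every ingested event."""

    def __init__(self) -> None:
        self.records: dict[str, MetadataRecord] = {}
        self.postings: dict[str, set[str]] = {}

    def ingest(self, record: MetadataRecord) -> None:
        # Upsert: remove the old postings first so stale terms stop matching.
        self._unindex(record.urn)
        self.records[record.urn] = record
        for token in tokenize(f"{record.name} {record.description}"):
            self.postings.setdefault(token, set()).add(record.urn)

    def _unindex(self, urn: str) -> None:
        old = self.records.get(urn)
        if old is None:
            return
        for token in tokenize(f"{old.name} {old.description}"):
            urns = self.postings.get(token)
            if urns is not None:
                urns.discard(urn)
                if not urns:
                    del self.postings[token]

    def search(self, query: str, limit: int = 10) -> list[MetadataRecord]:
        # Score = number of distinct query terms matched; ties broken by usage.
        scores: dict[str, int] = {}
        for token in tokenize(query):
            for urn in self.postings.get(token, set()):
                scores[urn] = scores.get(urn, 0) + 1
        ranked = sorted(
            scores,
            key=lambda urn: (scores[urn], self.records[urn].weekly_queries),
            reverse=True,
        )
        return [self.records[urn] for urn in ranked[:limit]]


index = SearchIndex()
index.ingest(MetadataRecord("dataset:trips", "trips", "Completed trips per city", "primary_store", weekly_queries=500))
index.ingest(MetadataRecord("dataset:trips_daily", "trips_daily", "Daily trip counts per city", "crawler", weekly_queries=120))
index.ingest(MetadataRecord("dashboard:city_ops", "city_ops", "City operations dashboard", "service", weekly_queries=80))

print([r.urn for r in index.search("trips city")])
# ['dataset:trips', 'dataset:trips_daily', 'dashboard:city_ops']
```

Re-ingesting a record with the same URN replaces its index entries, so a renamed or re-described dataset stops matching its old terms as soon as the new event is processed.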

Since its launch in 2016, the Databook platform has changed significantly to provide better flexibility, extensibility, and user experience. The resulting "Databook 2.0" includes a number of improvements that help users cut through the noise while still letting them comb through every detail when necessary.

Bottom line

Centralized data catalogues are essential for companies faced with ever-increasing volumes of distributed and complex data. Uber's Databook provides a unified view of its data ecosystem and continues to evolve and grow with the company.

By powering scalable data discovery and exploration, Databook helps Uber better manage and use its data assets, supporting its operation as a global, data-driven enterprise.

See also

What is data liquidity?

Defining the concept of data liquidity.

Navigating unstructured data: The rise of question answering

Question answering technologies are key to efficiently dealing with overwhelming amounts of unstructured data.

Linked data for the enterprise: Focus on Bayer's corporate asset register

An overview of COLID, the data asset management platform built using semantic technologies.

Document understanding: Modern techniques and real-world applications

How document understanding helps bring order to unstructured data.
