Uber operates the world's largest ride-hailing network, serving just under 1,000 metropolitan areas across 70 countries.
As companies like Uber grow and expand their operations, so does the volume of data and associated metadata they produce each day. Such companies invest heavily in their data and aim to let their growing analytics teams find the data they need quickly and easily.
To facilitate internal data discovery, Uber built its own dataset search and management tool, Databook. Databook serves as a single interface into the company's metadata graph, indexing hundreds of thousands of datasets, millions of columns and fields, and hundreds of thousands of other data entities such as dashboards and pipelines.
At a high level, Databook ingests metadata from various sources (primary data stores, services, and crawlers) and exposes it to end users through a unified search interface. Users can search indexed data entities, which are updated in real time, and view additional signals such as usage statistics and quality trends.
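The ingest-then-search flow described above can be illustrated with a toy in-memory catalog. This is a minimal sketch under stated assumptions, not Databook's actual design or API: the `Entity` record, the `MetadataCatalog` class, and the `usage_count` signal are all hypothetical names chosen for illustration. Records from any source are upserted into a single inverted index (token to entity IDs), so a later update from a source replaces stale metadata. Search scores entities by how many query tokens they match and breaks ties by usage, which is a deliberately simple stand-in for whatever ranking a production catalog uses.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical metadata record for one data entity (dataset, column, dashboard, pipeline)."""
    urn: str               # unique identifier, e.g. "dataset:trips" (illustrative format)
    kind: str              # entity type
    name: str
    description: str = ""
    source: str = ""       # which ingestion source produced this record
    usage_count: int = 0   # example "signal" shown alongside search results


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; "_" and "." split names like "trips_daily".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MetadataCatalog:
    """Toy catalog: ingests records from many sources into one searchable index."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}
        self._index: dict[str, set[str]] = defaultdict(set)  # token -> urns

    def ingest(self, entity: Entity) -> None:
        # Upsert: remove the previous record's tokens so updates replace stale metadata.
        old = self._entities.get(entity.urn)
        if old is not None:
            for tok in tokenize(old.name + " " + old.description):
                self._index[tok].discard(old.urn)
        self._entities[entity.urn] = entity
        for tok in tokenize(entity.name + " " + entity.description):
            self._index[tok].add(entity.urn)

    def search(self, query: str, kind: str | None = None) -> list[Entity]:
        # Score = number of distinct query tokens matched; ties broken by usage, then urn.
        scores: dict[str, int] = defaultdict(int)
        for tok in tokenize(query):
            for urn in self._index.get(tok, ()):
                scores[urn] += 1
        hits = [self._entities[urn] for urn in scores]
        if kind is not None:
            hits = [e for e in hits if e.kind == kind]
        return sorted(hits, key=lambda e: (-scores[e.urn], -e.usage_count, e.urn))


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.ingest(Entity("dataset:trips", "dataset", "trips", "completed trips per city", "hive", 120))
    catalog.ingest(Entity("dashboard:trips_daily", "dashboard", "trips_daily", "daily trip counts", "dashboards", 40))
    catalog.ingest(Entity("column:trips.fare", "column", "trips.fare", "fare amount in local currency", "hive", 15))
    print([e.urn for e in catalog.search("trips")])
    # ['dataset:trips', 'dashboard:trips_daily', 'column:trips.fare']
```

Keeping every entity type in one index is what lets a single query surface datasets, columns, and dashboards together; the optional `kind` filter narrows results when a user wants only one type.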
Since its launch in 2016, Databook has changed significantly, gaining flexibility and extensibility along with an improved user experience. The resulting "Databook 2.0" helps users cut through the noise while still letting them drill into every detail when necessary.
Centralized data catalogues are essential for companies facing ever-increasing volumes of distributed, complex data. Uber's Databook provides a unified view of its data ecosystem and continues to evolve alongside the company.
By enabling scalable data discovery and exploration, Databook helps Uber manage and use its data assets more effectively in support of its data-driven business.