Document understanding: Modern techniques and real-world applications

Documents are at the center of many business processes. Scanned pages and PDFs are ubiquitous and contain large amounts of information represented as forms and tables.

Historically, this information could only be analysed and used after manual data re-entry, a slow and error-prone process, because traditional optical character recognition (OCR) systems could not analyse such data and preserve its inherent structure in their output.

Document understanding extends document intelligence beyond simple text recognition to the retrieval of structured data. It relies heavily on machine learning and has proven key to automating structured data extraction, making the extracted data readily accessible for subsequent processing and analysis.

How document understanding works

"Understanding" a document first involves detecting its layout and key elements such as figures, tables, and forms. These elements are then processed separately to extract the underlying data relationships.

Any embedded forms are parsed into sets of key-value pairs, each pair corresponding to a single form field. An example of a key-value pair is "First name"–"Alice". The sets of such linked data items can subsequently be inserted into a database, one row or document per form.
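
The two steps described above (detect the elements, then turn each form into key-value pairs and store one row per form) could be sketched roughly as follows. This is a minimal illustration, not a real document understanding system: the Element class, the parse_form helper, the applicants table, and the hand-written "detected" list standing in for layout detection output are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Element:
    """A detected page element (hypothetical structure for this sketch)."""
    kind: str      # "figure", "table" or "form"
    content: list  # for a form: a list of (label, value) tuples


def parse_form(element):
    """Turn a form element into a dict of key-value pairs, one per field."""
    return dict(element.content)


# Hand-written stand-in for what a layout detector might return.
detected = [
    Element("form", [("First name", "Alice"), ("Last name", "Smith")]),
    Element("figure", []),
    Element("form", [("First name", "Bob"), ("Last name", "Jones")]),
]

# Process only the form elements; other kinds would get their own handlers.
forms = [parse_form(e) for e in detected if e.kind == "form"]

# Insert one database row per form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicants (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO applicants (first_name, last_name) VALUES (?, ?)",
    [(f.get("First name"), f.get("Last name")) for f in forms],
)
rows = conn.execute(
    "SELECT first_name, last_name FROM applicants ORDER BY rowid"
).fetchall()
print(rows)
```

Using f.get(...) rather than f[...] means a field missing from a form is stored as NULL instead of aborting the whole batch.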

Document understanding products and services

The easiest way to incorporate document understanding into production workflows is to use existing cloud services. Major cloud providers each offer multiple machine learning-based services which include text and document intelligence. These offerings are summarised in the following table:

Service | Provider | Description
Amazon Textract | Amazon Web Services | Amazon Textract parses form data and tables. The service is integrated with Amazon Augmented AI (Amazon A2I) for implementing human review.
Document AI | Google Cloud | Previously known as Document Understanding AI, Document AI is capable of parsing forms, tables, and invoice content (the invoice feature is only available for approved customers).
Form Recognizer | Microsoft Azure | Form Recognizer extracts tables and key-value form pairs from documents and offers prebuilt models for analysing receipts and business cards.
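
As an illustration of what consuming one of these services can look like, the sketch below collects key-value pairs from an Amazon Textract style response. It assumes the response shape returned by the AnalyzeDocument operation when form analysis is requested: a list of Blocks in which KEY_VALUE_SET blocks point to their value and to their child WORD blocks through Relationships. The sample response is hand-written for the example and much smaller than a real one, and the helper names are hypothetical.

```python
def block_text(block, by_id):
    """Join the text of all WORD children of a block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response):
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(block_text(by_id[v], by_id) for v in rel["Ids"])
        pairs[block_text(block, by_id)] = " ".join(values)
    return pairs


# Hand-written sample; a live call would look roughly like
#   boto3.client("textract").analyze_document(
#       Document={"Bytes": image_bytes}, FeatureTypes=["FORMS"])
sample = {"Blocks": [
    {"Id": "w1", "BlockType": "WORD", "Text": "First"},
    {"Id": "w2", "BlockType": "WORD", "Text": "name"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Alice"},
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
]}

pairs = key_value_pairs(sample)
print(pairs)
```

The parsing is kept separate from the network call so it can be exercised on stored responses without cloud credentials.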

Industry use cases

Document understanding is a key component of various emerging practical workflows and applications.

An example application of document understanding is invoice processing. Invoices are commonly sent as PDFs or paper documents that can be formatted in different ways but generally contain the same types of information, such as invoice date, amount due, and payment terms. By automatically recognising and extracting this information, cognitive invoicing systems facilitate invoice processing and reduce the associated costs.
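
Once the invoice fields have been extracted as text, a downstream step typically converts them into typed values. The sketch below shows one way this might look. The field labels, the day/month/year date format, and the dollar-prefixed amount are assumptions chosen for the example; real invoices vary and would need more tolerant parsing.

```python
from datetime import date, datetime
from decimal import Decimal


def normalise_invoice(fields):
    """Convert extracted invoice strings into typed values.

    Assumes dates formatted as DD/MM/YYYY and amounts such as "$1,250.00".
    """
    raw_amount = fields["Amount due"].replace(",", "").lstrip("$")
    return {
        "invoice_date": datetime.strptime(
            fields["Invoice date"], "%d/%m/%Y"
        ).date(),
        # Decimal avoids binary floating-point rounding on money values.
        "amount_due": Decimal(raw_amount),
        "payment_terms": fields.get("Payment terms", "").strip(),
    }


# Hypothetical key-value pairs as they might come out of form parsing.
extracted = {
    "Invoice date": "03/02/2021",
    "Amount due": "$1,250.00",
    "Payment terms": " Net 30 ",
}
invoice = normalise_invoice(extracted)
print(invoice)
```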

Bottom line

By automating manual document activities, document understanding enables organisations to process documents more efficiently, reduce errors, and bring down costs. By helping extract the valuable information stored inside scanned and digital documents, it also supports search, discovery, and compliance control for these documents.

The extracted structured data can be ingested by various downstream business applications, enabling smarter workflows and more advanced processing at scale.

See also

What is data liquidity?

Defining the concept of data liquidity.

Navigating unstructured data: The rise of question answering

Question answering technologies are key to efficiently dealing with overwhelming amounts of unstructured data.

Data discovery at Uber: The continued success of Databook

How Uber's in-house platform powers discovery, exploration, and knowledge at scale.

Linked data for the enterprise: Focus on Bayer's corporate asset register

An overview of COLID, the data asset management platform built using semantic technologies.
