Tokenization Tool

Tokenization is the process of breaking up a string into tokens, which usually correspond to words. It is a common task in natural language processing (NLP).

Enter the text to tokenize; the tool returns the produced tokens.
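
To make the idea concrete, here is a minimal sketch of a word-level tokenizer in Python. It is not the implementation behind this tool, which is not documented here; the `tokenize` function and the `TOKEN_PATTERN` rule are assumptions chosen for illustration. The rule treats a run of letters or digits (optionally joined by internal apostrophes or hyphens) as one token and each remaining punctuation character as its own token.

```python
import re

# Hypothetical rule, chosen for illustration only:
#   - a word: letters/digits, optionally joined by internal apostrophes or hyphens
#   - otherwise: any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Tokenization isn't hard, is it?"))
# → ['Tokenization', "isn't", 'hard', ',', 'is', 'it', '?']
```

Real tokenizers usually need language-specific rules (for abbreviations, numbers, or scripts written without spaces), so a single regular expression like this one is only a starting point.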