Tokenization Tool

Tokenization is the process of breaking a string into tokens, which usually correspond to words. It is a common task in natural language processing (NLP).
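
As a rough illustration of the idea, the sketch below splits a string into word and punctuation tokens with a regular expression. The `tokenize` function and its `TOKEN_PATTERN` are hypothetical names chosen for this example and are not necessarily how this tool works; real tokenizers typically handle contractions, hyphenation, and languages without whitespace-delimited words more carefully.

```python
import re

# Assumption for illustration: a token is either a run of word characters
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Return the word and punctuation tokens found in text, in order."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Hello, world! It's NLP."))
    # → ['Hello', ',', 'world', '!', 'It', "'", 's', 'NLP', '.']
```

Note that this simple pattern splits "It's" into three tokens, which shows why tokens only usually correspond to words.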

See also

Afrikaans Alphabet Poster, English-Labeled
$17.99

The Afrikaans alphabet chart.

opera IPA Transcription Poster
$14.99

A poster featuring the phonetic transcription of "opera" in the International Phonetic Alphabet (IPA).

In development
Arabic Pronunciation Dictionary

A pronunciation dictionary of the Arabic language with IPA transcriptions.

一 (ichi) Character Poster
$12.99

A poster featuring the 一 (ichi) kanji character.

Harnessing the power of the Oxford English Dictionary for linguistic research and NLP applications

How the OED Text Annotator may help bring text mining and natural language processing technologies to the next level.

seq2seq Trainer

Train sequence-to-sequence models online.

Awesome IUPAC

A curated list of IUPAC resources.

Bicep Editor

A simple online editor for the Bicep language.

X-SAMPA to IPA

Convert X-SAMPA to IPA online.

All prices listed are in United States Dollars (USD). Visual representations of products are intended for illustrative purposes. Actual products may exhibit variations in color, texture, or other characteristics inherent to the manufacturing process. The products' design and underlying technology are protected by applicable intellectual property laws. Unauthorized reproduction or distribution is prohibited.