Tokenizers

What is a tokenizer?

In natural language processing, a tokenizer is a tool that breaks text into discrete units called tokens. A token can be a word, a punctuation mark, a number, a symbol or any other meaningful unit of the text. The purpose of the tokenizer is to prepare the text for machine learning analysis and modelling.

There are different types of tokenizers, including rule-based and machine learning-based tokenizers. Rule-based tokenizers use predefined patterns to divide text into tokens, while machine learning-based tokenizers use models trained on text to identify patterns and structures and decide where one token ends and the next begins.
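As an illustration of the rule-based approach, here is a minimal sketch in Python. The pattern and the function name `tokenize` are assumptions chosen for this example, not a standard; real rule-based tokenizers usually carry many more rules (for contractions, URLs, abbreviations and so on).

```python
import re

# Hypothetical pattern, chosen for illustration only. A token is one of:
#   - a number, optionally with a decimal part (e.g. "3.5")
#   - a run of word characters (letters, digits, underscore)
#   - a single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into tokens using the predefined pattern above."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Tokenizers split text: 3.5 apples, please!"))
# ['Tokenizers', 'split', 'text', ':', '3.5', 'apples', ',', 'please', '!']
```

Because the rules are fixed in advance, the behaviour is easy to predict, but every case the pattern does not anticipate has to be handled by adding another rule.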

Tokenizers are an important tool in natural language processing because a proper representation of the input data is essential for training accurate machine learning models.
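One common way tokens become model input is by mapping each token to an integer id. The sketch below assumes a simple whitespace split as a stand-in for a real tokenizer; the helper names `build_vocab` and `encode` and the `<unk>` placeholder are hypothetical choices for this example.

```python
def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    # Id 0 is reserved for tokens that were not seen when the vocabulary was built.
    vocab = {"<unk>": 0}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace each token with its id, falling back to the <unk> id."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


# Whitespace splitting stands in for a real tokenizer here.
train_tokens = "the cat sat on the mat".split()
vocab = build_vocab(train_tokens)

print(encode("the dog sat".split(), vocab))
# [1, 0, 3]  ("dog" was never seen, so it maps to the <unk> id)
```

How the text is tokenized decides which units receive ids, which is one reason the choice of tokenizer affects what a model can learn.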

© Gamco 2021, All Rights Reserved