TOKENIZATION
Splits text into smaller units, or tokens, such as individual words or terms.
Parameters
- sourceColumn: The name of an existing column.
- delimiter: A custom delimiter that appears between tokenized words. (The default behavior is to separate each token by a space.)
- expandContractions: If ENABLED, expands contracted words. For example, "don't" becomes "do not".
- stemmingMode: Reduces each token to its word stem. Two stemming modes are available: PORTER | LANCASTER.
- stopWordRemovalMode: Removes common words such as a, an, and the.
- customStopWords: For stopWordRemovalMode, allows you to specify a custom list of stop words.
- targetColumn: The name of a column to contain the results.
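To make the interaction of these parameters concrete, here is a minimal Python sketch of the same ideas. It is not the service's implementation: the function name `tokenize`, the tiny contraction and stop-word tables, the lowercasing, and the punctuation handling are all assumptions made for illustration. Stemming is left out because the standard library has no stemmer; a package such as NLTK provides Porter and Lancaster stemmers if you want to experiment with that step.

```python
import re

# Illustrative subsets only (assumptions); the real contraction and
# stop-word lists are not given in this reference.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is"}
DEFAULT_STOP_WORDS = {"a", "an", "the"}


def tokenize(text, delimiter=" ", expand_contractions=False,
             remove_stop_words=False, custom_stop_words=()):
    """Split text into lowercase word tokens and join them with `delimiter`."""
    # Lowercase, split on whitespace, then trim punctuation from token edges.
    tokens = [re.sub(r"^\W+|\W+$", "", w) for w in text.lower().split()]
    tokens = [t for t in tokens if t]

    if expand_contractions:
        expanded = []
        for token in tokens:
            # A contraction may expand into more than one token.
            expanded.extend(CONTRACTIONS.get(token, token).split())
        tokens = expanded

    if remove_stop_words:
        stop_words = DEFAULT_STOP_WORDS | set(custom_stop_words)
        tokens = [t for t in tokens if t not in stop_words]

    return delimiter.join(tokens)


print(tokenize("Don't open the box", delimiter="- ",
               expand_contractions=True, remove_stop_words=True))
```

With the settings shown, the contraction is expanded before stop words are removed, so "the" is dropped and the remaining tokens are joined with the custom delimiter.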
Example
{
    "Action": {
        "Operation": "TOKENIZATION",
        "Parameters": {
            "customStopWords": "[]",
            "delimiter": "- ",
            "expandContractions": "ENABLED",
            "sourceColumn": "dimensions",
            "stemmingMode": "PORTER",
            "stopWordRemovalMode": "DEFAULT",
            "targetColumn": "dimensions_tokenized"
        }
    }
}
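If you generate recipe steps programmatically, a small helper can keep the structure consistent. The following is a sketch only: `tokenization_step` is a hypothetical helper, not part of any SDK, and it assumes that every parameter value is passed as a string, with the stop-word list JSON-encoded, as the example above suggests.

```python
import json


def tokenization_step(source_column, target_column,
                      custom_stop_words=(), **options):
    """Build a TOKENIZATION step dict shaped like the example above."""
    parameters = {
        "sourceColumn": source_column,
        "targetColumn": target_column,
        # The example passes the stop-word list as a JSON-encoded string ("[]").
        "customStopWords": json.dumps(list(custom_stop_words)),
    }
    # Remaining options (delimiter, stemmingMode, ...) are passed through as-is.
    parameters.update(options)
    return {"Action": {"Operation": "TOKENIZATION", "Parameters": parameters}}


step = tokenization_step(
    "dimensions",
    "dimensions_tokenized",
    delimiter="- ",
    expandContractions="ENABLED",
    stemmingMode="PORTER",
    stopWordRemovalMode="DEFAULT",
)
print(json.dumps(step, indent=4, sort_keys=True))
```

Called with these arguments, the helper produces the same keys and values as the example; passing a non-empty `custom_stop_words` list would yield a string such as `["foo", "bar"]` for `customStopWords`, which is an assumption about the expected encoding.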