REPLACE_OUTLIERS
Updates the data point values that are classified as outliers, based on the parameter settings.
Parameters
- sourceColumn – Specifies the name of an existing numeric column that might contain outliers.
- outlierStrategy – Specifies the approach to use in detecting outliers. Valid values include the following:
  - Z_SCORE – Identifies a value as an outlier when it deviates from the mean by more than the standard deviation threshold.
  - MODIFIED_Z_SCORE – Identifies a value as an outlier when it deviates from the median by more than the median absolute deviation threshold.
  - IQR – Identifies a value as an outlier when it falls outside the range bounded by the first and last quartiles of the column data. The interquartile range (IQR) describes the spread of the middle 50% of the data points.
- threshold – Specifies the threshold value to use when detecting outliers. The sourceColumn value is identified as an outlier if the score that's calculated with the outlierStrategy exceeds this number. The default is 3.
- replaceType – Specifies the method to use when replacing outliers. Valid values include the following:
  - WINSORIZE_VALUES – Caps the values at the minimum and maximum percentiles.
  - REPLACE_WITH_CUSTOM
  - REPLACE_WITH_EMPTY
  - REPLACE_WITH_NULL
  - REPLACE_WITH_MODE
  - REPLACE_WITH_AVERAGE
  - REPLACE_WITH_MEDIAN
  - REPLACE_WITH_SUM
  - REPLACE_WITH_MAX
- modeType – Indicates the type of modal function to use when replaceType is REPLACE_WITH_MODE. Valid values include the following: MIN, MAX, and AVERAGE.
- minValue – Indicates the minimum percentile value for the outlier range that is applied when trimValue is used. Valid range is 0–100.
- maxValue – Indicates the maximum percentile value for the outlier range that is applied when trimValue is used. Valid range is 0–100.
- value – Specifies the value to insert when using REPLACE_WITH_CUSTOM.
- trimValue – Specifies whether to remove all or some of the outliers. This Boolean value is set to TRUE when replaceType is REPLACE_WITH_NULL, REPLACE_WITH_MODE, or WINSORIZE_VALUES. It defaults to FALSE for all others.
  - FALSE – Removes all outliers.
  - TRUE – Removes outliers that rank outside of the percentile cap threshold specified in minValue and maxValue.
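To make the parameters more concrete, the following Python sketch shows how two of the detection strategies and two of the replacement types might behave. It is an illustration only and not the service's implementation. All function names are hypothetical, and the sketch assumes the population standard deviation for the z-score, the conventional 0.6745 scaling constant for the modified z-score, a median computed over all values including outliers, and linearly interpolated integer percentiles for the caps.

```python
import statistics


def z_score_flags(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # assumption: population standard deviation
    if stdev == 0:
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]


def modified_z_score_flags(values, threshold=3.0):
    """Flag values using the median and the median absolute deviation (MAD)."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return [False] * len(values)
    # assumption: 0.6745 is the conventional scaling constant for this score
    return [0.6745 * abs(v - median) / mad > threshold for v in values]


def replace_with_median(values, flags):
    """Replace every flagged value with the column median (similar in spirit to REPLACE_WITH_MEDIAN)."""
    median = statistics.median(values)  # assumption: computed over all values
    return [median if flagged else v for v, flagged in zip(values, flags)]


def percentile(values, pct):
    """Return an integer percentile in 0-100 using linear interpolation."""
    if pct <= 0:
        return min(values)
    if pct >= 100:
        return max(values)
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


def winsorize(values, min_pct, max_pct):
    """Cap values at the given percentiles (similar in spirit to WINSORIZE_VALUES)."""
    low = percentile(values, min_pct)
    high = percentile(values, max_pct)
    return [min(max(v, low), high) for v in values]


values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 500]
flags = z_score_flags(values, threshold=3.0)
cleaned = replace_with_median(values, flags)
capped = winsorize(values, 5, 95)
```

With this sample and a threshold of 3, both detectors flag only the value 500. Replacing with the median substitutes a single value for each outlier, whereas winsorizing pulls extreme values back to the percentile caps and leaves the rest of the column unchanged.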
The following examples show the syntax for a single RecipeAction operation. A recipe contains at least one RecipeStep operation, and a recipe step contains at least one recipe action. A recipe action runs the data transform that you specify. A group of recipe actions runs in sequential order to create the final dataset.