PIIDetection

Specifies a transform that identifies, removes or masks PII data.

Contents

EntityTypesToDetect

Indicates the types of entities the PIIDetection transform will identify as PII data.

PII type entities include: PERSON_NAME, DATE, USA_SSN, EMAIL, USA_ITIN, USA_PASSPORT_NUMBER, PHONE_NUMBER, BANK_ACCOUNT, IP_ADDRESS, MAC_ADDRESS, USA_CPT_CODE, USA_HCPCS_CODE, USA_NATIONAL_DRUG_CODE, USA_MEDICARE_BENEFICIARY_IDENTIFIER, USA_HEALTH_INSURANCE_CLAIM_NUMBER, CREDIT_CARD, USA_NATIONAL_PROVIDER_IDENTIFIER, USA_DEA_NUMBER, USA_DRIVING_LICENSE

Type: Array of strings

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: Yes

Inputs

The node ID inputs to the transform.

Type: Array of strings

Array Members: Fixed number of 1 item.

Pattern: [A-Za-z0-9_-]*

Required: Yes

Name

The name of the transform node.

Type: String

Pattern: ([^\r\n])*

Required: Yes

PiiType

Indicates the type of PIIDetection transform.

Type: String

Valid Values: RowAudit | RowHashing | RowMasking | RowPartialMasking | ColumnAudit | ColumnHashing | ColumnMasking

Required: Yes
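
As a rough illustration, the four required fields could be assembled into a Python dictionary and checked against the constraints listed above. The helper name `validate_required_fields`, the node name, and the input node ID are hypothetical. The checks mirror only the documented patterns and valid values, not any validation AWS Glue itself performs.

```python
import re

# Valid values for PiiType, copied from the reference above.
VALID_PII_TYPES = {
    "RowAudit", "RowHashing", "RowMasking", "RowPartialMasking",
    "ColumnAudit", "ColumnHashing", "ColumnMasking",
}


def validate_required_fields(node):
    """Return a list of problems found in the required PIIDetection fields."""
    problems = []

    # Name: a string with no carriage returns or line feeds.
    name = node.get("Name")
    if not isinstance(name, str) or not re.fullmatch(r"[^\r\n]*", name):
        problems.append("Name must be a string without line breaks")

    # Inputs: exactly one node ID matching [A-Za-z0-9_-]*.
    inputs = node.get("Inputs")
    if not isinstance(inputs, list) or len(inputs) != 1:
        problems.append("Inputs must contain exactly 1 item")
    elif not all(isinstance(i, str) and re.fullmatch(r"[A-Za-z0-9_-]*", i)
                 for i in inputs):
        problems.append("Inputs entries must match [A-Za-z0-9_-]*")

    # PiiType: one of the documented valid values.
    if node.get("PiiType") not in VALID_PII_TYPES:
        problems.append("PiiType is not a valid value")

    # EntityTypesToDetect: an array of strings.
    entities = node.get("EntityTypesToDetect")
    if not isinstance(entities, list) or not all(isinstance(e, str) for e in entities):
        problems.append("EntityTypesToDetect must be an array of strings")

    return problems


# Hypothetical node: the name and the input node ID are made up for illustration.
node = {
    "Name": "DetectPII",
    "Inputs": ["node-1"],
    "PiiType": "ColumnMasking",
    "EntityTypesToDetect": ["EMAIL", "PHONE_NUMBER"],
}
print(validate_required_fields(node))
```

An empty list means the node satisfies the documented constraints on the required fields. The optional fields that follow are not checked here.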

DetectionParameters

Additional parameters for configuring PII detection behavior and sensitivity settings.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

DetectionSensitivity

The sensitivity level for PII detection. Higher sensitivity levels detect more potential PII but may result in more false positives.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

MaskValue

Indicates the value that will replace the detected entity.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: [*A-Za-z0-9_-]*

Required: No
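
The length and pattern constraints on MaskValue rule out some common placeholder strings, which a short check can make visible. This is a minimal sketch; `is_valid_mask_value` is a hypothetical helper that applies only the two constraints stated above.

```python
import re

# Pattern copied from the MaskValue constraints above.
MASK_VALUE_PATTERN = re.compile(r"[*A-Za-z0-9_-]*")


def is_valid_mask_value(mask_value):
    """Check a candidate MaskValue against the documented length and pattern."""
    return (len(mask_value) <= 256
            and MASK_VALUE_PATTERN.fullmatch(mask_value) is not None)


for candidate in ["****", "REDACTED", "[REDACTED]", "#####"]:
    print(candidate, is_valid_mask_value(candidate))
```

Under these constraints, asterisks, letters, digits, underscores, and hyphens are accepted, while brackets, spaces, and characters such as `#` are not.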

MatchPattern

A regular expression pattern used to identify additional PII content beyond the standard detection algorithms.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

NumLeftCharsToExclude

The number of characters to exclude from redaction on the left side of detected PII content. This allows preserving context around the sensitive data.

Type: Integer

Valid Range: Minimum value of 0.

Required: No

NumRightCharsToExclude

The number of characters to exclude from redaction on the right side of detected PII content. This allows preserving context around the sensitive data.

Type: Integer

Valid Range: Minimum value of 0.

Required: No
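
To make the two exclusion counts concrete, the following sketch shows one plausible reading of them: the first `num_left` and last `num_right` characters of a detected value are kept, and everything in between is replaced by a redaction character. The function `partially_redact` is hypothetical and is not the AWS Glue implementation. It only illustrates the behavior that the descriptions above suggest, and the handling of values shorter than the combined counts is an assumption.

```python
def partially_redact(value, num_left=0, num_right=0, redact_char="*"):
    """Redact the middle of `value`, keeping `num_left` leading and
    `num_right` trailing characters untouched (an assumed interpretation)."""
    if num_left < 0 or num_right < 0:
        raise ValueError("exclusion counts must be 0 or greater")
    if num_left + num_right >= len(value):
        # Assumption: if nothing is left to redact, return the value unchanged.
        return value
    middle_len = len(value) - num_left - num_right
    kept_right = value[len(value) - num_right:]
    return value[:num_left] + redact_char * middle_len + kept_right


# Keep the last four characters of a made-up phone number.
print(partially_redact("555-867-5309", num_left=0, num_right=4))
```

With both counts left at 0, the whole value is replaced, which corresponds to full redaction.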

OutputColumnName

Indicates the output column name that will contain any entity type detected in that row.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

RedactChar

The character used to replace detected PII content when redaction is enabled. The default redaction character is *.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

RedactText

Specifies whether to redact the detected PII text. When set to true, PII content is replaced with redaction characters.

Type: String

Pattern: ([\u0009\u000B\u000C\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF])*

Required: No

SampleFraction

Indicates the fraction of the data to sample when scanning for PII entities.

Type: Double

Valid Range: Minimum value of 0. Maximum value of 1.

Required: No

ThresholdFraction

Indicates the fraction of the data in a column that must match a PII entity for the column to be identified as PII data.

Type: Double

Valid Range: Minimum value of 0. Maximum value of 1.

Required: No
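
One way to picture how SampleFraction and ThresholdFraction might interact is sketched below: a fraction of a column's values is sampled, and the column is flagged when the share of sampled values that match reaches the threshold. Everything here is an assumption made for illustration, including the function `column_is_pii`, the sampling method, the `>=` comparison, and the deliberately simplistic email detector. AWS Glue's actual detection logic is not described in this reference.

```python
import random
import re


def column_is_pii(values, detector, sample_fraction, threshold_fraction, seed=0):
    """Flag a column as PII when enough sampled values match `detector`.

    Assumed interpretation: sample `sample_fraction` of the values, then
    compare the matching share of that sample with `threshold_fraction`.
    """
    if not 0 <= sample_fraction <= 1 or not 0 <= threshold_fraction <= 1:
        raise ValueError("fractions must be between 0 and 1")
    sample_size = int(len(values) * sample_fraction)
    if sample_size == 0:
        return False  # nothing sampled, so nothing can be detected
    sample = random.Random(seed).sample(values, sample_size)
    matches = sum(1 for v in sample if detector(v))
    return matches / sample_size >= threshold_fraction


# Deliberately simplistic stand-in for a real PII detector.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    return EMAIL_RE.fullmatch(value) is not None


column = ["a@example.com", "b@example.com", "not an email", "c@example.com"]
print(column_is_pii(column, looks_like_email,
                    sample_fraction=1.0, threshold_fraction=0.5))
```

In this sketch, three of the four values match, so the column is flagged at a threshold of 0.5 but would not be at 0.9.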

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: