Lp-norm (LP)
Note
After careful consideration, we have made the decision to close new customer access to Amazon SageMaker Clarify, effective 6/30/26. Existing customers can continue to use the service as normal. AWS continues to invest in security and availability improvements for Clarify, but we do not plan to introduce new features. For more information, see Clarify availability change.
The Lp-norm (LP) measures the p-norm distance between the facet distributions of the observed labels in a training dataset. This metric is non-negative and so cannot detect reverse bias.
The formula for the Lp-norm is as follows:
$$L_p(P_a, P_d) = \left( \sum_y \lVert P_a - P_d \rVert^p \right)^{1/p}$$
Where the p-norm distance between the points x and y is defined as follows:
$$L_p(x, y) = \left( |x_1 - y_1|^p + |x_2 - y_2|^p + \dots + |x_n - y_n|^p \right)^{1/p}$$
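The p-norm distance defined above can be sketched in a few lines of Python. This is a minimal illustration, not the Clarify implementation; the function name `lp_distance` and its `p=2` default are assumptions made for this example.

```python
def lp_distance(x, y, p=2):
    """Return the p-norm distance between two equal-length sequences.

    Hypothetical helper for illustration only; it is not part of any AWS API.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# With p=2 this is the Euclidean distance: a 3-4-5 right triangle.
print(lp_distance([0, 0], [3, 4]))
```

Setting `p=1` gives the sum of absolute differences instead, so the same helper covers both cases.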
The 2-norm is the Euclidean norm. Assume you have an outcome distribution with three categories, for example, $y_i = \{y_0, y_1, y_2\}$ = {accepted, waitlisted, rejected} in a college admissions multicategory scenario. You sum the squares of the differences between the outcome counts for facets a and d, then take the square root. The resulting Euclidean distance is calculated as follows:
$$L_2(P_a, P_d) = \left[ \left(n_a^{(0)} - n_d^{(0)}\right)^2 + \left(n_a^{(1)} - n_d^{(1)}\right)^2 + \left(n_a^{(2)} - n_d^{(2)}\right)^2 \right]^{1/2}$$
Where:
- $n_a^{(i)}$ is the number of outcomes in the $i$th category for facet a. For example, $n_a^{(0)}$ is the number of facet a acceptances.
- $n_d^{(i)}$ is the number of outcomes in the $i$th category for facet d. For example, $n_d^{(2)}$ is the number of facet d rejections.
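As a worked instance of the Euclidean case, the following sketch applies the formula to the admissions example. The counts are hypothetical values chosen for illustration and do not come from any real dataset.

```python
import math

# Hypothetical outcome counts, ordered as (accepted, waitlisted, rejected).
n_a = [70, 20, 10]  # facet a
n_d = [50, 20, 30]  # facet d

# Sum the squared per-category differences, then take the square root.
l2 = math.sqrt(sum((a - d) ** 2 for a, d in zip(n_a, n_d)))

print(round(l2, 2))  # 28.28, the square root of 20^2 + 0^2 + 20^2 = 800
```

The waitlisted category contributes nothing here because both facets have the same count, so the distance is driven entirely by the acceptance and rejection gaps.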
The range of LP values for binary, multicategory, and continuous outcomes is [0, √2), where:
- Values near zero mean the labels are similarly distributed.
- Positive values mean the label distributions diverge; the more positive the value, the larger the divergence.
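A bound of √2 is what the 2-norm yields when it is applied to label proportions (each facet's counts divided by that facet's total) rather than to raw counts. Treating the inputs this way is an assumption of this sketch, not something the text above states, and the helper name `l2_on_proportions` is hypothetical.

```python
import math


def l2_on_proportions(counts_a, counts_d):
    """2-norm distance between two facets' label proportions (illustrative only)."""
    total_a, total_d = sum(counts_a), sum(counts_d)
    # Assumption: normalize each facet's counts so they sum to 1.
    p_a = [c / total_a for c in counts_a]
    p_d = [c / total_d for c in counts_d]
    return math.sqrt(sum((a - d) ** 2 for a, d in zip(p_a, p_d)))


# Same proportions in both facets (different sizes): distance is zero.
same = l2_on_proportions([70, 20, 10], [7, 2, 1])

# Facets with no outcome in common: the extreme case, which evaluates to sqrt(2).
disjoint = l2_on_proportions([100, 0, 0], [0, 0, 50])

print(same, disjoint)
```

Under this assumption the result does not depend on facet size, which is why two facets of very different sizes can still score near zero.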