

# Semantic search for AWS Glue Data Catalog
<a name="catalog-semantic-search"></a>

**Note**  
Business context and semantic search is in preview for AWS Glue and is subject to change.

Semantic search lets you discover data assets by meaning, not only by exact keyword matches. Results are ranked by semantic similarity to your query, and you can narrow them with filters on asset type, metadata fields, and glossary terms.

## Using the Search API
<a name="catalog-semantic-search-api"></a>

The `Search` API requires at least one of `SearchText` or `FilterClause`. The optional parameters are `Sort`, `Aggregations`, `MaxResults`, and `NextToken`.
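Because at least one of these two parameters is required, a script can check for them before it calls the CLI. The following is a minimal sketch. `glue_search_checked` is a hypothetical wrapper, not an AWS CLI feature. It forwards all arguments unchanged to `aws glue search` only when `--search-text` or `--filter-clause` is present.

```
# Hypothetical wrapper (not part of the AWS CLI): refuses to call
# `aws glue search` unless --search-text or --filter-clause is present,
# mirroring the Search API requirement described above.
glue_search_checked() {
  local arg has_required=0
  for arg in "$@"; do
    case "$arg" in
      --search-text|--search-text=*|--filter-clause|--filter-clause=*)
        has_required=1 ;;
    esac
  done
  if [ "$has_required" -eq 0 ]; then
    echo "error: provide --search-text and/or --filter-clause" >&2
    return 2
  fi
  # Pass every argument through to the real CLI command.
  aws glue search "$@"
}
```

For example, `glue_search_checked --max-results 5` exits with status 2 without calling the service, while `glue_search_checked --search-text "customer purchases"` runs the search.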

### Text search
<a name="catalog-semantic-search-text"></a>

```
aws glue search \
    --search-text "customer purchases"
```

Example output (abbreviated):

```
{
    "Items": [
        {"Id": "c9vq7sh2fk4t2h", "AssetName": "Customer Sales Transactions", "AssetTypeId": "glue-table"}
    ],
    "TotalCount": 2
}
```

To limit the number of results returned in a response, use `--max-results`:

```
aws glue search \
    --search-text "quarterly revenue" \
    --max-results 5
```

### Filtering search results
<a name="catalog-semantic-search-filters"></a>

Use `FilterClause` to narrow results. Supported filter types:
+ **AttributeFilter** – Operators: `equals`, `greaterThan`, `greaterThanOrEquals`, `lessThan`, `lessThanOrEquals`, `notExists`.
+ **MapFilter** – Filters on a map attribute's key-value pair.
+ **AndAllFilters** – All filters must match (logical AND).
+ **OrAnyFilters** – At least one filter must match (logical OR).
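If you build filter clauses in scripts, small helper functions can reduce JSON quoting mistakes. The following is a minimal sketch. `attr_eq` and `combine` are hypothetical helper names, not AWS CLI features; `attr_eq` covers only the `equals` operator with a string value, and both helpers insert values verbatim without JSON escaping.

```
# Hypothetical helpers (not part of the AWS CLI) for composing
# FilterClause JSON. Values are inserted verbatim, so they must not
# contain double quotes or backslashes.

# attr_eq ATTRIBUTE VALUE -> AttributeFilter using the equals operator
attr_eq() {
  printf '{"AttributeFilter": {"Attribute": "%s", "Operator": "equals", "Value": {"StringValue": "%s"}}}' "$1" "$2"
}

# combine AndAllFilters|OrAnyFilters FILTER... -> FILTERs joined into one list
combine() {
  local op="$1"
  shift
  local IFS=,   # "$*" joins the remaining arguments with commas
  printf '{"%s": [%s]}' "$op" "$*"
}
```

For example, passing `--filter-clause "$(combine OrAnyFilters "$(attr_eq assetTypeId glue-table)" "$(attr_eq assetTypeId glue-view)")"` produces the same tables-or-views filter as the OR logic example later in this section.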

**To filter by a single attribute**  
Filter to table assets only:

```
aws glue search \
    --search-text "revenue" \
    --filter-clause '{
        "AttributeFilter": {
            "Attribute": "assetTypeId",
            "Operator": "equals",
            "Value": {"StringValue": "glue-table"}
        }
    }'
```

**To combine filters with AND logic**  
Find table assets updated after a timestamp:

```
aws glue search \
    --search-text "customer data" \
    --filter-clause '{
        "AndAllFilters": [
            {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
            {"AttributeFilter": {"Attribute": "updatedAt", "Operator": "greaterThan", "Value": {"LongValue": 1718400000}}}
        ]
    }'
```
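The `updatedAt` value in this example, `1718400000`, looks like a Unix timestamp in seconds. That unit is an assumption inferred from the example value, not something this page states. If it holds, you can compute the value from a readable UTC date. The sketch below uses a hypothetical `to_epoch` helper and assumes GNU `date`; the `-d` option is not available in BSD or macOS `date`.

```
# Hypothetical helper: converts a UTC date/time string to Unix epoch seconds.
# Assumes GNU date; with -u, an input string without a time zone is read as UTC.
to_epoch() {
  date -u -d "$1" +%s
}
```

For example, `to_epoch '2024-06-14 21:20:00'` prints `1718400000`, the value used in the filter above.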

**To combine filters with OR logic**  
Search for tables or views:

```
aws glue search \
    --search-text "customer data" \
    --filter-clause '{
        "OrAnyFilters": [
            {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
            {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-view"}}}
        ]
    }'
```

**To filter by a map attribute**  
Filter by a glossary term key-value pair:

```
aws glue search \
    --search-text "financial data" \
    --filter-clause '{
        "MapFilter": {
            "Attribute": "glossaryTerms",
            "Key": "classification",
            "Value": {"StringValue": "PII"}
        }
    }'
```
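When you script `MapFilter` queries, a small function can assemble the clause from its three parts. This is a minimal sketch; `map_filter` is a hypothetical helper, not an AWS CLI feature, and it inserts its arguments without JSON escaping.

```
# Hypothetical helper (not part of the AWS CLI): builds a MapFilter clause
# that matches one key-value pair of a map attribute such as glossaryTerms.
# Arguments are inserted verbatim, so they must not contain double quotes.
map_filter() {
  printf '{"MapFilter": {"Attribute": "%s", "Key": "%s", "Value": {"StringValue": "%s"}}}' "$1" "$2" "$3"
}
```

For example, `--filter-clause "$(map_filter glossaryTerms classification PII)"` is equivalent to the filter clause in the command above.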

**To use nested AND and OR filters**  
Find table or view assets with a specific glossary term:

```
aws glue search \
    --search-text "customer" \
    --filter-clause '{
        "AndAllFilters": [
            {"OrAnyFilters": [
                {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
                {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-view"}}}
            ]},
            {"MapFilter": {"Attribute": "glossaryTerms", "Key": "term", "Value": {"StringValue": "Revenue"}}}
        ]
    }'
```

### Sorting search results
<a name="catalog-semantic-search-sort"></a>

By default, results are sorted by semantic relevance. To sort by an attribute instead, use `--sort`. For example, to sort by asset name in ascending order:

```
aws glue search \
    --search-text "customer purchases" \
    --sort '{"Attribute": "assetName", "Order": "ASCENDING"}'
```

To list the most recently updated assets first:

```
aws glue search \
    --search-text "customer purchases" \
    --sort '{"Attribute": "updatedAt", "Order": "DESCENDING"}'
```

**Note**  
When you specify a sort attribute, results are ordered by that attribute rather than by semantic relevance.
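The examples on this page use only `ASCENDING` and `DESCENDING` as sort orders. The sketch below assumes those are the only valid values and rejects anything else before a request is sent. `sort_spec` is a hypothetical helper, not an AWS CLI feature.

```
# Hypothetical helper (not part of the AWS CLI): builds the --sort JSON
# and rejects orders other than ASCENDING or DESCENDING (an assumption
# based on the values shown in the examples).
sort_spec() {
  case "$2" in
    ASCENDING|DESCENDING) ;;
    *) echo "error: order must be ASCENDING or DESCENDING" >&2; return 2 ;;
  esac
  printf '{"Attribute": "%s", "Order": "%s"}' "$1" "$2"
}
```

For example, `--sort "$(sort_spec updatedAt DESCENDING)"` matches the second sort command above, while `sort_spec updatedAt desc` fails locally.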

### Computing aggregations
<a name="catalog-semantic-search-aggregations"></a>

Use `Aggregations` to compute counts grouped by attribute values.

```
aws glue search \
    --search-text "customer data" \
    --aggregations '[{"Attribute": "assetTypeId"}]'
```

Example output (abbreviated):

```
{
    "TotalCount": 15,
    "Aggregations": [
        {"Attribute": "assetTypeId", "Items": [
            {"Value": "glue-table", "Count": 10},
            {"Value": "glue-view", "Count": 3}
        ]}
    ]
}
```

Request multiple aggregations in a single call:

```
aws glue search \
    --search-text "financial data" \
    --aggregations '[{"Attribute": "assetTypeId"}, {"Attribute": "glossaryTerms"}]'
```
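When the list of attributes to aggregate varies, you can generate the `--aggregations` array instead of typing it by hand. This is a minimal sketch; `aggregations_for` is a hypothetical helper, not an AWS CLI feature, and it emits one `{"Attribute": ...}` object per argument in the format shown above.

```
# Hypothetical helper (not part of the AWS CLI): turns attribute names into
# the --aggregations JSON array, one {"Attribute": ...} object per name.
aggregations_for() {
  local out="" attr
  for attr in "$@"; do
    # Prepend ", " only when out already holds an element.
    out="${out:+$out, }{\"Attribute\": \"$attr\"}"
  done
  printf '[%s]' "$out"
}
```

For example, `aggregations_for assetTypeId glossaryTerms` prints exactly the array passed to `--aggregations` in the command above.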

### Paginating search results
<a name="catalog-semantic-search-pagination"></a>

When more results match than `MaxResults` allows, the response includes a `NextToken`. To retrieve the next page, repeat the request with the same parameters and pass the token in `--next-token`.

```
aws glue search \
    --search-text "customer data" \
    --max-results 10 \
    --next-token "eyJsYXN0RXZhbHVhdGVkS2V5Ijp7ImlkIjp7InMiOiJhMWIyYzNkNCJ9fX0="
```

Continue until the response no longer includes a `NextToken`.
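The loop can be scripted. The following is a minimal sketch; `search_all_pages` is a hypothetical helper, not an AWS CLI feature. It extracts `NextToken` with `sed` from the CLI's pretty-printed JSON, which assumes the token appears as `"NextToken": "<value>"` on a single line. If `jq` is installed, `jq -r '.NextToken // empty'` is a more robust way to read the token.

```
# Hypothetical helper (not part of the AWS CLI): runs `aws glue search`
# with the given arguments, prints each page of JSON output, and follows
# NextToken until a response no longer includes one.
search_all_pages() {
  local token="" page
  while :; do
    if [ -n "$token" ]; then
      page=$(aws glue search "$@" --next-token "$token" --output json) || return
    else
      page=$(aws glue search "$@" --output json) || return
    fi
    printf '%s\n' "$page"
    # Read the token from the line "NextToken": "..." (empty when absent).
    token=$(printf '%s\n' "$page" | sed -n 's/.*"NextToken": *"\([^"]*\)".*/\1/p')
    [ -n "$token" ] || break
  done
}
```

For example, `search_all_pages --search-text "customer data" --max-results 10` prints every page of results for that query, 10 items at a time.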

## Running filter-only queries
<a name="catalog-semantic-search-filter-only"></a>

To list assets without semantic ranking, provide `FilterClause` and omit `SearchText`.

```
aws glue search \
    --filter-clause '{"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}}' \
    --max-results 20
```

**Note**  
You must provide at least one of `SearchText` or `FilterClause`.

## Examples
<a name="catalog-semantic-search-examples"></a>

**To discover skill assets for a domain**  
Find skill assets related to sales data:

```
aws glue search \
    --search-text "sales domain usage rules" \
    --filter-clause '{"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "skill-asset"}}}'
```

**Note**  
This query returns only custom skill assets. Managed skills are not returned by the Search API.

**To combine text search, aggregations, and sorting**  
Return the 10 most recently updated matching assets, together with counts grouped by asset type:

```
aws glue search \
    --search-text "customer" \
    --aggregations '[{"Attribute": "assetTypeId"}]' \
    --sort '{"Attribute": "updatedAt", "Order": "DESCENDING"}' \
    --max-results 10
```