Semantic search for AWS Glue Data Catalog

Note

Business context and semantic search is in preview for AWS Glue and is subject to change.

Semantic search enables you to discover data assets by meaning in addition to exact keyword matching. Results are ranked by semantic similarity to your query. You can narrow results using filters on asset type, metadata fields, and glossary terms.

Using the Search API

The Search API requires at least one of SearchText or FilterClause. Optional parameters: Sort, Aggregations, MaxResults, NextToken.

Text search

aws glue search \
  --search-text "customer purchases"

Example output:

{
  "Items": [
    {"Id": "c9vq7sh2fk4t2h", "AssetName": "Customer Sales Transactions", "AssetTypeId": "glue-table"}
  ],
  "TotalCount": 2
}

Limit results with --max-results:

aws glue search \
  --search-text "quarterly revenue" \
  --max-results 5

Filtering search results

Use FilterClause to narrow results. Supported filter types:

  • AttributeFilter – Operators: equals, greaterThan, greaterThanOrEquals, lessThan, lessThanOrEquals, notExists.

  • MapFilter – Filters on a map attribute's key-value pair.

  • AndAllFilters – All filters must match (logical AND).

  • OrAnyFilters – At least one must match (logical OR).
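
Hand-written filter JSON gets error-prone once filters nest. The sketch below composes filter clauses from small shell functions. The function names (attribute_filter, map_filter, and_all, or_any) are hypothetical helpers, not part of the AWS CLI. The JSON shapes follow the examples on this page, and values are assumed to contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that print FilterClause JSON in the shapes shown
# on this page. Values are inserted verbatim, so they must not contain
# double quotes or backslashes.

# attribute_filter ATTRIBUTE OPERATOR STRING_VALUE
attribute_filter() {
  printf '{"AttributeFilter": {"Attribute": "%s", "Operator": "%s", "Value": {"StringValue": "%s"}}}' "$1" "$2" "$3"
}

# map_filter ATTRIBUTE KEY STRING_VALUE
map_filter() {
  printf '{"MapFilter": {"Attribute": "%s", "Key": "%s", "Value": {"StringValue": "%s"}}}' "$1" "$2" "$3"
}

# join_filters NAME FILTER... : wrap the filters in {"NAME": [f1, f2, ...]}
join_filters() {
  local name="$1" joined="" f
  shift
  for f in "$@"; do
    joined="${joined:+$joined, }$f"
  done
  printf '{"%s": [%s]}' "$name" "$joined"
}

and_all() { join_filters AndAllFilters "$@"; }
or_any()  { join_filters OrAnyFilters "$@"; }

# Tables or views: logical OR of two attribute filters.
tables_or_views=$(or_any \
  "$(attribute_filter assetTypeId equals glue-table)" \
  "$(attribute_filter assetTypeId equals glue-view)")
printf '%s\n' "$tables_or_views"
```

The resulting string can then be passed as the value of --filter-clause, for example --filter-clause "$tables_or_views".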

To filter by a single attribute

Filter to table assets only:

aws glue search \
  --search-text "revenue" \
  --filter-clause '{
    "AttributeFilter": {
      "Attribute": "assetTypeId",
      "Operator": "equals",
      "Value": {"StringValue": "glue-table"}
    }
  }'

To combine filters with AND logic

Find table assets updated after a timestamp:

aws glue search \
  --search-text "customer data" \
  --filter-clause '{
    "AndAllFilters": [
      {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
      {"AttributeFilter": {"Attribute": "updatedAt", "Operator": "greaterThan", "Value": {"LongValue": 1718400000}}}
    ]
  }'

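
The LongValue in this example looks like a Unix timestamp. Assuming updatedAt is compared in epoch seconds (this page does not state the unit; the 10-digit value suggests seconds rather than milliseconds), the cutoff can be computed from a calendar date instead of being hard-coded. The sketch below uses the -d option of GNU date; the BSD and macOS date commands take different options.

```shell
#!/usr/bin/env bash
# Assumption: updatedAt is compared in epoch seconds (unit not stated in the source).
# GNU date syntax; BSD/macOS date needs different options.
cutoff=$(date -u -d '2024-06-14 21:20:00' +%s)

# Build the same AND filter as in the command above, with the computed cutoff.
filter=$(printf '{
  "AndAllFilters": [
    {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
    {"AttributeFilter": {"Attribute": "updatedAt", "Operator": "greaterThan", "Value": {"LongValue": %s}}}
  ]
}' "$cutoff")
printf '%s\n' "$filter"
```

The value of $filter can then be passed to --filter-clause.
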
To combine filters with OR logic

Search for tables or views:

aws glue search \
  --search-text "customer data" \
  --filter-clause '{
    "OrAnyFilters": [
      {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
      {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-view"}}}
    ]
  }'

To filter by a map attribute

Filter by a glossary term key-value pair:

aws glue search \
  --search-text "financial data" \
  --filter-clause '{
    "MapFilter": {
      "Attribute": "glossaryTerms",
      "Key": "classification",
      "Value": {"StringValue": "PII"}
    }
  }'

To use nested AND and OR filters

Find table or view assets with a specific glossary term:

aws glue search \
  --search-text "customer" \
  --filter-clause '{
    "AndAllFilters": [
      {"OrAnyFilters": [
        {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
        {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-view"}}}
      ]},
      {"MapFilter": {"Attribute": "glossaryTerms", "Key": "term", "Value": {"StringValue": "Revenue"}}}
    ]
  }'
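
Deeply nested filters are awkward to quote on a command line. The AWS CLI can load a parameter value from a local file with the file:// prefix. That is a general CLI feature, and its use with this preview command is an assumption. The sketch below writes the nested filter from the command above to a temporary file; search_with_filter_file is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Write the nested filter to a file so that no shell quoting is needed.
filter_file=$(mktemp)
cat > "$filter_file" <<'JSON'
{
  "AndAllFilters": [
    {"OrAnyFilters": [
      {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}},
      {"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-view"}}}
    ]},
    {"MapFilter": {"Attribute": "glossaryTerms", "Key": "term", "Value": {"StringValue": "Revenue"}}}
  ]
}
JSON

# Hypothetical wrapper: search_with_filter_file SEARCH_TEXT FILTER_FILE
# Assumes --filter-clause accepts the CLI's generic file:// syntax.
search_with_filter_file() {
  aws glue search --search-text "$1" --filter-clause "file://$2"
}
```

A call such as search_with_filter_file "customer" "$filter_file" would then run the same query as the inline version.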

Sorting search results

By default, results are sorted by semantic relevance. To sort by an attribute instead, use --sort:

aws glue search \
  --search-text "customer purchases" \
  --sort '{"Attribute": "assetName", "Order": "ASCENDING"}'

aws glue search \
  --search-text "customer purchases" \
  --sort '{"Attribute": "updatedAt", "Order": "DESCENDING"}'

Note

When you specify a sort attribute, results are ordered by that attribute rather than by semantic relevance.

Computing aggregations

Use Aggregations to compute counts grouped by attribute values.

aws glue search \
  --search-text "customer data" \
  --aggregations '[{"Attribute": "assetTypeId"}]'

Example output:

{
  "TotalCount": 15,
  "Aggregations": [
    {
      "Attribute": "assetTypeId",
      "Items": [
        {"Value": "glue-table", "Count": 10},
        {"Value": "glue-view", "Count": 3}
      ]
    }
  ]
}
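
To reduce this response to just the counts, the AWS CLI's global --query option (a JMESPath expression) and --output text can be applied on the client side. The sketch below assumes the response shape shown in the example output; count_by_attribute is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_by_attribute SEARCH_TEXT ATTRIBUTE
# Prints one tab-separated "value count" row per aggregated value.
# Assumes the Aggregations/Items/Value/Count shape from the example output.
count_by_attribute() {
  local text="$1" attribute="$2"
  aws glue search \
    --search-text "$text" \
    --aggregations "[{\"Attribute\": \"$attribute\"}]" \
    --query 'Aggregations[0].Items[].[Value, Count]' \
    --output text
}
```

For example, count_by_attribute "customer data" assetTypeId requests the same aggregation as the command above and prints only the value and count columns.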

Request multiple aggregations in a single call:

aws glue search \
  --search-text "financial data" \
  --aggregations '[{"Attribute": "assetTypeId"}, {"Attribute": "glossaryTerms"}]'

Paginating search results

When more results match than MaxResults allows in one response, the response includes a NextToken. Pass it in the next request to retrieve the following page.

aws glue search \
  --search-text "customer data" \
  --max-results 10 \
  --next-token "eyJsYXN0RXZhbHVhdGVkS2V5Ijp7ImlkIjp7InMiOiJhMWIyYzNkNCJ9fX0="

Continue until the response no longer includes a NextToken.
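
The pagination loop can be sketched as a shell function. It repeats the search, feeding each NextToken into the following request, and stops when a response has none. The function name search_all_pages is hypothetical, and the sed expression is a dependency-free way to read the token from the JSON; a JSON tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search_all_pages SEARCH_TEXT [PAGE_SIZE]
# Prints each page of JSON in turn until no NextToken is returned.
search_all_pages() {
  local text="$1" page_size="${2:-10}" token="" page
  while :; do
    # Rebuild the argument list, adding --next-token after the first page.
    set -- --search-text "$text" --max-results "$page_size"
    if [ -n "$token" ]; then
      set -- "$@" --next-token "$token"
    fi
    page=$(aws glue search "$@" --output json) || return 1
    printf '%s\n' "$page"
    # Extract NextToken; the result is empty when the key is absent.
    token=$(printf '%s\n' "$page" |
      sed -n 's/.*"NextToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1)
    [ -n "$token" ] || break
  done
}
```

A call such as search_all_pages "customer data" 10 would print every page for the query above.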

Running filter-only queries

To list assets without semantic ranking, specify FilterClause and omit SearchText.

aws glue search \
  --filter-clause '{"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "glue-table"}}}' \
  --max-results 20

Note

You must provide at least one of SearchText or FilterClause.
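
A wrapper script can enforce this rule locally before any request is sent. The sketch below is one way to do that; glue_search is a hypothetical function name, and the exit status 2 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: glue_search SEARCH_TEXT FILTER_CLAUSE
# Either argument may be empty, but not both.
glue_search() {
  local text="$1" filter="$2"
  if [ -z "$text" ] && [ -z "$filter" ]; then
    echo "glue_search: provide search text, a filter clause, or both" >&2
    return 2
  fi
  # Start from an empty argument list and add only the options that were given.
  set --
  if [ -n "$text" ]; then set -- "$@" --search-text "$text"; fi
  if [ -n "$filter" ]; then set -- "$@" --filter-clause "$filter"; fi
  aws glue search "$@"
}
```

With this wrapper, glue_search "" "$filter" runs a filter-only query, and glue_search "" "" fails without calling the CLI.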

Examples

To discover skill assets for a domain

Find skill assets related to sales data:

aws glue search \
  --search-text "sales domain usage rules" \
  --filter-clause '{"AttributeFilter": {"Attribute": "assetTypeId", "Operator": "equals", "Value": {"StringValue": "skill-asset"}}}'

Note

This query returns only custom skill assets. Managed skills are not returned by the Search API.

To combine text search, aggregations, and sorting

Return up to 10 assets matching "customer", most recently updated first, along with counts per asset type:

aws glue search \
  --search-text "customer" \
  --aggregations '[{"Attribute": "assetTypeId"}]' \
  --sort '{"Attribute": "updatedAt", "Order": "DESCENDING"}' \
  --max-results 10