DataFrame Utilities - Amazon SageMaker Unified Studio

DataFrame Utilities

Read from and write to catalog tables using pandas DataFrames with automatic format detection and database management.

Supported catalog types:

  • AwsDataCatalog

  • S3CatalogTables

Basic Usage

Import the DataFrame utilities:

from sagemaker_studio import dataframeutils

Reading from Catalog Tables

Required Inputs:

  • database (str): Database name within the catalog

  • table (str): Table name

Optional Parameters:

  • catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)

  • format (str): Data format; auto-detected from table metadata, falling back to parquet if detection fails

  • **kwargs: Additional arguments

    • For AwsDataCatalog, kwargs can include columns, chunked, etc.

    • For S3 Tables, kwargs can include limit, row_filter, selected_fields, etc.
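The format fallback described above can be sketched roughly as follows. This is an illustrative sketch only; the function and the metadata dictionary are assumptions for clarity, not the library's actual internals.

```python
# Illustrative sketch of the format fallback: an explicit format argument
# wins, then the format recorded in table metadata, then parquet.
# resolve_format and its inputs are hypothetical, not sagemaker_studio API.
from typing import Optional


def resolve_format(table_metadata: dict, explicit_format: Optional[str] = None) -> str:
    """Pick the read format: explicit argument, then table metadata, then parquet."""
    if explicit_format:
        return explicit_format
    detected = table_metadata.get("format")
    if detected:
        return detected
    return "parquet"


print(resolve_format({"format": "csv"}, explicit_format="json"))  # explicit wins: json
print(resolve_format({"format": "csv"}))                          # detected: csv
print(resolve_format({}))                                         # fallback: parquet
```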

import pandas as pd

# Read from AwsDataCatalog
df = pd.read_catalog_table(
    database="my_database",
    table="my_table"
)

# Read from S3 Tables
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
)

Usage with optional parameters

import pandas as pd

# Read from AwsDataCatalog, explicitly specifying catalog ID and format
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet"
)

# Read from AwsDataCatalog, explicitly specifying catalog ID, format,
# and an additional argument -> columns
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet",
    columns=['<column_name_1>', '<column_name_2>']
)

# Read from S3 Tables with an additional argument -> limit
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
    limit=500
)

# Read from S3 Tables with an additional argument -> selected_fields
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
    selected_fields=['<field_name_1>', '<field_name_2>']
)

Writing to Catalog Tables

Required Inputs:

  • database (str): Database name within the catalog

  • table (str): Table name

Optional Parameters:

  • catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)

  • format (str): Data format used for AwsDataCatalog (default: parquet)

  • path (str): Custom Amazon S3 path for writing to AwsDataCatalog (auto-determined if not provided)

  • **kwargs: Additional arguments

Path Resolution Priority - Amazon S3 path is determined in this order:

  • User-provided path parameter

  • Existing database location + table name

  • Existing table location

  • Project default Amazon S3 location
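The priority order above can be sketched as a simple resolution function. This is an illustrative sketch only; the function name and its parameters are assumptions, not the library's actual internals.

```python
# Illustrative sketch of the S3 path resolution order described above:
# user-provided path, then database location + table name, then the
# existing table location, then the project default location.
# resolve_s3_path and its parameters are hypothetical, not sagemaker_studio API.
from typing import Optional


def resolve_s3_path(
    user_path: Optional[str],
    database_location: Optional[str],
    table_location: Optional[str],
    project_default: str,
    table: str,
) -> str:
    """Return the S3 path a write would target, following the documented priority."""
    if user_path:
        return user_path
    if database_location:
        return f"{database_location.rstrip('/')}/{table}/"
    if table_location:
        return table_location
    return f"{project_default.rstrip('/')}/{table}/"


# User-provided path takes precedence over everything else
print(resolve_s3_path("s3://my-bucket/custom/", "s3://db-loc", None,
                      "s3://default", "my_table"))  # s3://my-bucket/custom/
# With no user path or database location, fall back to the table location
print(resolve_s3_path(None, None, "s3://tbl-loc/",
                      "s3://default", "my_table"))  # s3://tbl-loc/
```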

import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [10.5, 20.3, 15.7]
})

# Write to AwsDataCatalog
df.to_catalog_table(
    database="my_database",
    table="my_table"
)

# Write to S3 Tables catalog
df.to_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog"
)

Usage with optional parameters

# Write to AwsDataCatalog with csv format
df.to_catalog_table(
    database="my_database",
    table="my_table",
    format="csv"
)

# Write to AwsDataCatalog at a user-specified S3 path
df.to_catalog_table(
    database="my_database",
    table="my_table",
    path="s3://my-bucket/custom/path/"
)

# Write to AwsDataCatalog with an additional argument -> compression
df.to_catalog_table(
    database="my_database",
    table="my_table",
    compression='gzip'
)