

# DataFrame Utilities


Read from and write to catalog tables using pandas DataFrames with automatic format detection and database management.

Supported catalog types:
+ AwsDataCatalog
+ S3CatalogTables

## Basic Usage


Import the DataFrame utilities:

```python
from sagemaker_studio import dataframeutils
```
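After this import, the helpers are called directly on pandas (`pd.read_catalog_table` and `DataFrame.to_catalog_table`, as shown below). As a purely illustrative sketch of how such registration could be wired up (the stub bodies and names are assumptions, not the library's implementation):

```python
import pandas as pd

# Hypothetical stubs standing in for the real catalog readers/writers.
def read_catalog_table(database, table, catalog="AwsDataCatalog", **kwargs):
    # A real implementation would query the catalog; this stub returns
    # an empty DataFrame tagged with the requested source.
    df = pd.DataFrame()
    df.attrs["source"] = f"{catalog}/{database}/{table}"
    return df

def to_catalog_table(self, database, table, catalog="AwsDataCatalog", **kwargs):
    # A real implementation would write `self` out to the catalog table;
    # this stub just reports the target.
    return f"{catalog}/{database}/{table}"

# Attach to pandas so the documented call style works after import.
pd.read_catalog_table = read_catalog_table
pd.DataFrame.to_catalog_table = to_catalog_table
```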

## Reading from Catalog Tables


Required Inputs:
+ database (str): Database name within the catalog
+ table (str): Table name

Optional Parameters:
+ catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)
+ format (str): Data format; auto-detected from table metadata, falling back to parquet
+ **kwargs: Additional arguments
  + for AwsDataCatalog, kwargs can be columns, chunked, etc.
  + for S3 Tables, kwargs can be limit, row_filter, selected_fields, etc.
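The format auto-detection described above amounts to a simple fallback chain. The helper below is an illustrative sketch, assuming table metadata is available as a dict; it is not the library's actual internals:

```python
def resolve_format(table_metadata, explicit_format=None):
    """Pick the data format for a read.

    Priority: caller-supplied format, then the format recorded in the
    table's metadata, then the parquet default.
    """
    if explicit_format:
        return explicit_format
    detected = (table_metadata or {}).get("format")
    return detected or "parquet"
```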

```python
import pandas as pd

# Read from AwsDataCatalog
df = pd.read_catalog_table(
    database="my_database",
    table="my_table"
)

# Read from S3 Tables
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
)
```

### Usage with optional parameters


```python
import pandas as pd

# Read from AwsDataCatalog, explicitly specifying the catalog ID and format
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet"
)

# Read from AwsDataCatalog with an additional argument -> columns
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet",
    columns=['<column_name_1>', '<column_name_2>']
)

# Read from S3 Tables with an additional argument -> limit
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
    limit=500
)

# Read from S3 Tables with an additional argument -> selected_fields
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog",
    selected_fields=['<field_name_1>', '<field_name_2>']
)
```

## Writing to Catalog Tables


Required Inputs:
+ database (str): Database name within the catalog
+ table (str): Table name

Optional Parameters:
+ catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)
+ format (str): Data format used for AwsDataCatalog (default: parquet)
+ path (str): Custom Amazon S3 path for writing to AwsDataCatalog (auto-determined if not provided)
+ **kwargs: Additional arguments

Path Resolution Priority - the Amazon S3 path is determined in this order:
+ User-provided path parameter
+ Existing database location + table name
+ Existing table location
+ Project default Amazon S3 location
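The priority order above is a first-match fallback. The helper below is an illustrative sketch only; the argument names are assumptions, not the library's internals:

```python
def resolve_write_path(table, user_path=None, database_location=None,
                       table_location=None, project_default=None):
    """Return the S3 path a write would target, following the
    documented priority order."""
    if user_path:
        return user_path
    if database_location:
        # Append the table name to the database's existing location.
        return database_location.rstrip("/") + "/" + table + "/"
    if table_location:
        return table_location
    return project_default
```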

```python
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [10.5, 20.3, 15.7]
})

# Write to AwsDataCatalog
df.to_catalog_table(
    database="my_database",
    table="my_table"
)

# Write to S3 Table Catalog
df.to_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog"
)
```

### Usage with optional parameters


```python
# Write to AwsDataCatalog with csv format
df.to_catalog_table(
    database="my_database",
    table="my_table",
    format="csv"
)

# Write to AwsDataCatalog at a user-specified S3 path
df.to_catalog_table(
    database="my_database",
    table="my_table",
    path="s3://my-bucket/custom/path/"
)

# Write to AwsDataCatalog with additional argument -> compression
df.to_catalog_table(
    database="my_database",
    table="my_table",
    compression='gzip'
)
```