Use a Delta Lake cluster with Spark and AWS Glue

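To use the AWS Glue Catalog as the metastore for Delta Lake tables, create a cluster with the following steps. For information on specifying the Delta Lake classification with the AWS Command Line Interface, see Supply a configuration using the AWS Command Line Interface when you create a cluster, or Supply a configuration with the Java SDK when you create a cluster.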

To create a Delta Lake cluster

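Create a file named delta_configurations.json (the file name that the create-cluster command passes to --configurations) with the following content: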



[
  {"Classification": "delta-defaults",
   "Properties": {"delta.enabled": "true"}},
  {"Classification": "spark-hive-site",
   "Properties": {"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}
]

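As a minimal sketch, the file can also be written from a shell and checked for well-formed JSON before it is passed to the CLI. The file name delta_configurations.json is chosen to match the file:// path in the create-cluster command, and the validation step assumes python3 is available on the PATH; neither detail is required by Amazon EMR.

```shell
#!/bin/sh
# Write the two classifications from the step above to the file that
# the create-cluster command references with --configurations.
CONFIG_FILE="delta_configurations.json"

cat > "$CONFIG_FILE" <<'EOF'
[
  {"Classification": "delta-defaults",
   "Properties": {"delta.enabled": "true"}},
  {"Classification": "spark-hive-site",
   "Properties": {"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}
]
EOF

# Optional sanity check (assumption: python3 is installed). json.tool
# exits non-zero when the file is not well-formed JSON.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$CONFIG_FILE" >/dev/null && echo "$CONFIG_FILE is valid JSON"
fi
```

Run the script from the directory where you plan to run aws emr create-cluster, because the file:// path in that command is relative.
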
Create a cluster with the following configuration, replacing the example Amazon S3 bucket path and the subnet ID with your own values.



aws emr create-cluster \
    --release-label emr-6.9.0 \
    --applications Name=Spark \
    --configurations file://delta_configurations.json \
    --region us-east-1 \
    --name My_Spark_Delta_Cluster \
    --log-uri s3://amzn-s3-demo-bucket/ \
    --instance-type m5.xlarge \
    --instance-count 2 \
    --service-role EMR_DefaultRole_V2 \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0
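
The create-cluster command returns a ClusterId. As a hedged sketch, the helper below waits for that cluster to come up and then prints its state. The function name wait_for_cluster and the placeholder ID j-XXXXXXXXXXXXX are illustrative assumptions; the region is taken from the example command above.

```shell
#!/bin/sh
# wait_for_cluster is a hypothetical helper name, not an AWS CLI command.
# It blocks until the cluster is running, then prints the cluster state.
wait_for_cluster() {
    cluster_id="$1"
    # The cluster-running waiter polls the cluster until it is up,
    # and exits non-zero if the cluster terminates instead.
    aws emr wait cluster-running --cluster-id "$cluster_id" --region us-east-1 || return 1
    aws emr describe-cluster --cluster-id "$cluster_id" --region us-east-1 \
        --query 'Cluster.Status.State' --output text
}

# Usage (replace the placeholder with the ClusterId that create-cluster returned):
# wait_for_cluster j-XXXXXXXXXXXXX
```

The call is left commented out so that sourcing the script only defines the helper and never contacts AWS.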
