Enable the EMRFS S3-optimized committer for Amazon EMR 5.19.0
If you are using Amazon EMR 5.19.0, you can manually set the
spark.sql.parquet.fs.optimized.committer.optimization-enabled
property to true when you create a cluster or from within Spark.
Enabling the EMRFS S3-optimized committer when creating a cluster
Use the spark-defaults configuration classification to set the
spark.sql.parquet.fs.optimized.committer.optimization-enabled
property to true. For more information, see Configure applications.
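As a minimal sketch, a spark-defaults configuration object that sets this property could look like the following. It assumes the configuration is supplied as JSON when the cluster is created, for example through the AWS CLI --configurations option. Adapt it to however you provide configurations for your cluster.

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.sql.parquet.fs.optimized.committer.optimization-enabled": "true"
    }
  }
]
```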
Enabling the EMRFS S3-optimized committer from Spark
You can set
spark.sql.parquet.fs.optimized.committer.optimization-enabled
to true by hard-coding it in a SparkConf, by passing
it as a --conf parameter to the Spark shell or to the
spark-submit and spark-sql tools, or by setting it in
conf/spark-defaults.conf. For more information, see Spark
configuration.
The following example shows how to enable the committer while running a spark-sql command.
spark-sql \
  --conf spark.sql.parquet.fs.optimized.committer.optimization-enabled=true \
  -e "INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;"
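As an alternative to passing --conf on each invocation, the same property can be set once in conf/spark-defaults.conf. The following line is a sketch of that entry, assuming the standard Spark properties format of a key and a value separated by whitespace. The location of the conf directory depends on your installation.

```properties
# Enable the EMRFS S3-optimized committer for every Spark application that reads this file
spark.sql.parquet.fs.optimized.committer.optimization-enabled  true
```

Values passed with --conf or set in a SparkConf take precedence over entries in this file, so a single job can still override the default.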