

# AWS Glue PySpark transforms reference
<a name="aws-glue-programming-python-transforms"></a>

AWS Glue provides the following built-in transforms that you can use in PySpark ETL operations. Your data passes from transform to transform in a data structure called a *DynamicFrame*, which is an extension of the Apache Spark SQL `DataFrame`. The `DynamicFrame` contains your data, and you reference its schema to process that data.

Most of these transforms also exist as methods of the `DynamicFrame` class. For more information, see [DynamicFrame transforms](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-_transforms).
+ [GlueTransform base class](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md)
+ [ApplyMapping class](aws-glue-api-crawler-pyspark-transforms-ApplyMapping.md)
+ [DropFields class](aws-glue-api-crawler-pyspark-transforms-DropFields.md)
+ [DropNullFields class](aws-glue-api-crawler-pyspark-transforms-DropNullFields.md)
+ [ErrorsAsDynamicFrame class](aws-glue-api-crawler-pyspark-transforms-ErrorsAsDynamicFrame.md)
+ [EvaluateDataQuality class](aws-glue-api-crawler-pyspark-transforms-EvaluateDataQuality.md)
+ [FillMissingValues class](aws-glue-api-crawler-pyspark-transforms-fillmissingvalues.md)
+ [Filter class](aws-glue-api-crawler-pyspark-transforms-filter.md)
+ [FindIncrementalMatches class](aws-glue-api-crawler-pyspark-transforms-findincrementalmatches.md)
+ [FindMatches class](aws-glue-api-crawler-pyspark-transforms-findmatches.md)
+ [FlatMap class](aws-glue-api-crawler-pyspark-transforms-flat-map.md)
+ [Join class](aws-glue-api-crawler-pyspark-transforms-join.md)
+ [Map class](aws-glue-api-crawler-pyspark-transforms-map.md)
+ [MapToCollection class](aws-glue-api-crawler-pyspark-transforms-MapToCollection.md)
+ [mergeDynamicFrame](aws-glue-api-crawler-pyspark-extensions-dynamic-frame.md#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-merge)
+ [Relationalize class](aws-glue-api-crawler-pyspark-transforms-Relationalize.md)
+ [RenameField class](aws-glue-api-crawler-pyspark-transforms-RenameField.md)
+ [ResolveChoice class](aws-glue-api-crawler-pyspark-transforms-ResolveChoice.md)
+ [SelectFields class](aws-glue-api-crawler-pyspark-transforms-SelectFields.md)
+ [SelectFromCollection class](aws-glue-api-crawler-pyspark-transforms-SelectFromCollection.md)
+ [simplify\_ddb\_json class](aws-glue-api-crawler-pyspark-transforms-simplify-ddb-json.md)
+ [Spigot class](aws-glue-api-crawler-pyspark-transforms-spigot.md)
+ [SplitFields class](aws-glue-api-crawler-pyspark-transforms-SplitFields.md)
+ [SplitRows class](aws-glue-api-crawler-pyspark-transforms-SplitRows.md)
+ [Unbox class](aws-glue-api-crawler-pyspark-transforms-Unbox.md)
+ [UnnestFrame class](aws-glue-api-crawler-pyspark-transforms-UnnestFrame.md)
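
As a rough illustration of the two calling styles described above, the following sketch applies the same transform once through the transform class and once through the equivalent `DynamicFrame` method. It assumes the code runs where the `awsglue` library is importable (for example, inside an AWS Glue job or interactive session). The helper name `drop_field_both_ways` and the default field name `tmp_col` are hypothetical and chosen only for this example.

```python
def drop_field_both_ways(glue_context, data_frame, field="tmp_col"):
    """Drop one field from a Spark DataFrame using both calling styles.

    Hypothetical helper: assumes the awsglue library is importable,
    which is normally only true inside AWS Glue.
    """
    # Imported lazily so this module can still be loaded outside AWS Glue.
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.transforms import DropFields

    # Wrap the Spark DataFrame in a DynamicFrame.
    dyf = DynamicFrame.fromDF(data_frame, glue_context, "dyf")

    # Style 1: call the transform class.
    via_class = DropFields.apply(frame=dyf, paths=[field])

    # Style 2: call the corresponding DynamicFrame method.
    via_method = dyf.drop_fields(paths=[field])

    return via_class, via_method
```

Both return values are new `DynamicFrame` objects; the input frame is not modified. Which style to use is largely a matter of preference, although the method form tends to chain more naturally.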

## Data integration transforms
<a name="aws-glue-programming-python-di-transforms"></a>

For AWS Glue 4.0 and later, create or update job arguments with `key: --enable-glue-di-transforms, value: true`.
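
If you manage jobs from Python, the argument above could be set along the following lines. This is a minimal sketch, not the only way to do it: the helper names `with_di_transforms` and `create_job_with_di_transforms` are hypothetical, and the second function assumes that `boto3` is installed and that AWS credentials and a Region are configured.

```python
DI_TRANSFORMS_FLAG = "--enable-glue-di-transforms"


def with_di_transforms(default_arguments=None):
    """Return a copy of the job arguments with the DI transforms flag set."""
    merged = dict(default_arguments or {})
    merged[DI_TRANSFORMS_FLAG] = "true"
    return merged


def create_job_with_di_transforms(name, role_arn, script_location,
                                  default_arguments=None):
    """Create an AWS Glue 4.0 job with the flag enabled.

    Hypothetical helper: requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily; only needed when the job is created

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        Command={
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        DefaultArguments=with_di_transforms(default_arguments),
        GlueVersion="4.0",
    )
```

For example, `with_di_transforms({"--job-language": "python"})` returns a new dictionary that contains both the original argument and `"--enable-glue-di-transforms": "true"`, leaving the input dictionary unchanged.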

Example job script:

```
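from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

# Reuse the running SparkContext if there is one, and obtain the
# SparkSession that createDataFrame needs.
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.getOrCreate()

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    # Label each value in source_column as even or not even in target_column.
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("An unexpected error occurred")
    raise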
```

Example sessions using notebooks:

```
%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%region eu-west-1
```

```
%%configure
{
    "--enable-glue-di-transforms": "true"
}
```

```
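from pyspark.context import SparkContext
from pyspark.sql import SparkSession

from awsgluedi.transforms import *

# Reuse the session's SparkContext and obtain the SparkSession
# that createDataFrame needs.
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.getOrCreate()

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    # Label each value in source_column as even or not even in target_column.
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("An unexpected error occurred")
    raise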
```

Example sessions using the AWS CLI:

```
aws glue create-session --default-arguments "--enable-glue-di-transforms=true"
```

DI transforms:
+  [FlagDuplicatesInColumn class](aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn.md) 
+  [FormatPhoneNumber class](aws-glue-api-pyspark-transforms-FormatPhoneNumber.md) 
+  [FormatCase class](aws-glue-api-pyspark-transforms-FormatCase.md) 
+  [FillWithMode class](aws-glue-api-pyspark-transforms-FillWithMode.md) 
+  [FlagDuplicateRows class](aws-glue-api-pyspark-transforms-FlagDuplicateRows.md) 
+  [RemoveDuplicates class](aws-glue-api-pyspark-transforms-RemoveDuplicates.md) 
+  [MonthName class](aws-glue-api-pyspark-transforms-MonthName.md) 
+  [IsEven class](aws-glue-api-pyspark-transforms-IsEven.md) 
+  [CryptographicHash class](aws-glue-api-pyspark-transforms-CryptographicHash.md) 
+  [Decrypt class](aws-glue-api-pyspark-transforms-Decrypt.md) 
+  [Encrypt class](aws-glue-api-pyspark-transforms-Encrypt.md) 
+  [IntToIp class](aws-glue-api-pyspark-transforms-IntToIp.md) 
+  [IpToInt class](aws-glue-api-pyspark-transforms-IpToInt.md) 

### Maven: Bundle the plugin with your Spark applications
<a name="aws-glue-programming-python-di-transforms-maven"></a>

To bundle the transforms dependency with your Spark applications and Spark distributions (version 3.3), add the plugin dependency to your Maven `pom.xml` while you develop your Spark applications locally.

```
<repositories>
   ...
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueTransforms</artifactId>
    <version>4.0.0</version>
</dependency>
```

Alternatively, you can download the binaries directly from the AWS Glue Maven artifacts and include them in your Spark application as follows.

```
#!/bin/bash
sudo wget -v https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar -P /usr/lib/spark/jars/
```