

# Understand connectors
<a name="msk-connect-connectors"></a>

A connector integrates external systems and Amazon services with Apache Kafka by continuously copying streaming data from a data source into your Apache Kafka cluster, or continuously copying data from your cluster into a data sink. A connector can also perform lightweight logic such as transformation, format conversion, or filtering before delivering the data to a destination. Source connectors pull data from a data source and push it into the cluster, while sink connectors pull data from the cluster and push it into a data sink.

The following diagram shows the architecture of a connector. A worker is a Java virtual machine (JVM) process that runs the connector logic. Each worker creates a set of tasks that run in parallel threads and do the work of copying the data. Tasks don't store state, and can therefore be started, stopped, or restarted at any time in order to provide a resilient and scalable data pipeline.

![Diagram showing the architecture of a connector cluster.](http://docs.aws.amazon.com/msk/latest/developerguide/images/mkc-worker-architecture.png)


# Understand connector capacity
<a name="msk-connect-capacity"></a>

The total capacity of a connector depends on the number of workers that the connector has, as well as on the number of MSK Connect Units (MCUs) per worker. Each MCU represents 1 vCPU of compute and 4 GiB of memory. The MCU memory pertains to the total memory of a worker instance and not the heap memory in use.

MSK Connect workers consume IP addresses in the customer-provided subnets. Each worker uses one IP address from one of the subnets that you provide. Make sure that the subnets you provide in a `CreateConnector` request have enough available IP addresses for the connector's specified capacity, especially for autoscaled connectors, where the number of workers can fluctuate.

To create a connector, you must choose one of the following two capacity modes.
+ *Provisioned* - Choose this mode if you know the capacity requirements for your connector. You specify two values:
  + The number of workers.
  + The number of MCUs per worker.
+ *Autoscaled* - Choose this mode if the capacity requirements for your connector are variable or if you don't know them in advance. When you use autoscaled mode, Amazon MSK Connect overrides your connector's `tasks.max` property with a value that is proportional to the number of workers running in the connector and the number of MCUs per worker. 

  You specify three sets of values:
  + The minimum and maximum number of workers.
  + The scale-in and scale-out percentages for CPU utilization, which is determined by the `CpuUtilization` metric. When the `CpuUtilization` metric for the connector exceeds the scale-out percentage, MSK Connect increases the number of workers that are running in the connector. When the `CpuUtilization` metric goes below the scale-in percentage, MSK Connect decreases the number of workers. The number of workers always remains within the minimum and maximum numbers that you specify when you create the connector.
  + The number of MCUs per worker.
  + (Optional) *Maximum autoscaling task count* - The maximum number of tasks allocated to the connector during autoscaling operations. This parameter allows you to set an upper limit on task creation, providing greater control over resource utilization and parallelism in relation to your Kafka topic partitions.

For more information about workers, see [Understand MSK Connect workers](msk-connect-workers.md), and for more information about maximum autoscaling task count, see [Understand maximum autoscaling task count](msk-connect-max-autoscaling-task-count.md). To learn about MSK Connect metrics, see [Monitoring Amazon MSK Connect](mkc-monitoring-overview.md).
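
For example, an autoscaled capacity configuration that keeps between 2 and 8 one-MCU workers and scales on CPU utilization might look like the following sketch. The values are illustrative; the field names follow the `autoScaling` structure of the `CreateConnector` API.

```
minWorkerCount: 2
maxWorkerCount: 8
mcuCount: 1
scaleInPolicy.cpuUtilizationPercentage: 20
scaleOutPolicy.cpuUtilizationPercentage: 80
```

With this configuration, MSK Connect adds workers when `CpuUtilization` exceeds 80 percent and removes workers when it falls below 20 percent, always staying between 2 and 8 workers.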

# Understand maximum autoscaling task count
<a name="msk-connect-max-autoscaling-task-count"></a>

The `maxAutoscalingTaskCount` parameter is an optional capacity field available for autoscaling connectors in Amazon MSK Connect. This parameter allows you to set an upper limit on the maximum number of tasks that can be created during connector autoscaling operations, providing greater control over resource utilization and performance.

When you use autoscaled capacity mode, Amazon MSK Connect automatically overrides your connector's `tasks.max` property with a value proportional to the number of workers and MCUs per worker. The `maxAutoscalingTaskCount` parameter provides an additional configurable option to limit the maximum number of tasks created for your connector.

This capability is particularly useful when you want to control the level of parallelism in relation to the number of topic partitions in your Kafka cluster. By setting this limit, you can optimize performance and prevent inefficient task distribution that might occur when the automatically calculated task count exceeds your workload requirements.

## Configuration requirements
<a name="msk-connect-max-autoscaling-task-count-requirements"></a>

The `maxAutoscalingTaskCount` parameter must meet the following requirement:

```
maxAutoscalingTaskCount ≥ maxWorkerCount
```

This requirement ensures efficient resource utilization by maintaining at least one task per worker; MSK Connect enforces this minimum when you create or update the connector.

When you specify `maxAutoscalingTaskCount`, the limit is applied immediately upon connector creation and during all subsequent scaling events. As the number of workers increases or decreases during autoscaling operations, the system continues to honor this limit. The `tasks.max` value adjusts proportionally to the number of workers and MCUs per worker but never exceeds the configured `maxAutoscalingTaskCount` value.

If you don't specify this parameter, the connector uses the standard calculation without any limit: `tasks.max = workerCount × mcuCount × tasksPerMcu` (where `tasksPerMcu` is 2).
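
The calculation above can be sketched in Python. This is a minimal illustration of the documented formula and cap, not MSK Connect's actual implementation:

```python
TASKS_PER_MCU = 2  # fixed multiplier in the standard calculation

def effective_tasks_max(worker_count, mcu_count, max_autoscaling_task_count=None):
    """Return the tasks.max value for the current worker count,
    optionally capped by maxAutoscalingTaskCount."""
    tasks = worker_count * mcu_count * TASKS_PER_MCU
    if max_autoscaling_task_count is not None:
        tasks = min(tasks, max_autoscaling_task_count)
    return tasks

# Uncapped: 4 workers x 8 MCUs x 2 tasks per MCU = 64 tasks
print(effective_tasks_max(4, 8))      # 64
# Capped at 15, for example to match a 15-partition topic
print(effective_tasks_max(4, 8, 15))  # 15
```
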

## When to use maxAutoscalingTaskCount
<a name="msk-connect-max-autoscaling-task-count-when-to-use"></a>

Consider using `maxAutoscalingTaskCount` in the following scenarios:
+ *Limited partition count*: When your Kafka topics have a fixed number of partitions that is lower than the automatically calculated task count, setting a limit prevents the creation of idle tasks with no work to perform.
+ *Performance optimization*: When you've identified that a specific task count provides optimal throughput for your workload, you can cap the maximum tasks to maintain consistent performance.
+ *Resource management*: When you want to control the maximum parallelism and resource consumption of your connector regardless of how many workers are running.

## Example
<a name="msk-connect-max-autoscaling-task-count-example"></a>

For a connector with the following configuration:

```
minWorkerCount: 1
maxWorkerCount: 4
mcuCount: 8
maxAutoscalingTaskCount: 15
```

Without `maxAutoscalingTaskCount`, when scaled to 4 workers, the connector would create 64 tasks (4 workers × 8 MCUs × 2 tasks per MCU). With `maxAutoscalingTaskCount` set to 15, the connector creates only 15 tasks, which may be more appropriate if your Kafka topic has 15 or fewer partitions.

# Configure dual-stack network type
<a name="msk-connect-dual-stack"></a>

Amazon MSK Connect supports the dual-stack network type for new connectors. With dual-stack networking, your connectors can connect to destinations over both IPv4 and IPv6. Note that IPv6 connectivity is available only in dual-stack mode (IPv4 and IPv6); IPv6-only networking is not supported.

By default, new connectors use IPv4 network type. To create a connector with dual-stack network type, make sure you've fulfilled the prerequisites described in the following section. Note that, once you create a connector using dual-stack network type, you cannot modify its network type. To change network types, you must delete and recreate the connector.

Amazon MSK Connect also supports service API endpoint connectivity over both IPv6 and IPv4. To use IPv6 connectivity for API calls, use the dual-stack endpoints. For more information about MSK Connect service endpoints, see [Amazon MSK Connect endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/msk-connect.html).

## Prerequisites for using dual-stack network type
<a name="dual-stack-prerequisites"></a>

Before you configure dual-stack network type for your connectors, make sure that all subnets you provide during connector creation have both IPv6 and IPv4 CIDR blocks assigned.
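
As a quick sanity check before creating the connector, you can inspect the subnet descriptions returned by the EC2 `DescribeSubnets` API (for example, via `aws ec2 describe-subnets`). The following sketch assumes subnet dictionaries in the shape that API returns; the subnet IDs and CIDR blocks shown are illustrative:

```python
def is_dual_stack(subnet):
    """Return True if an EC2 subnet description has both an IPv4 CIDR
    block and at least one associated IPv6 CIDR block."""
    has_ipv4 = bool(subnet.get("CidrBlock"))
    has_ipv6 = any(
        assoc.get("Ipv6CidrBlock")
        for assoc in subnet.get("Ipv6CidrBlockAssociationSet", [])
    )
    return has_ipv4 and has_ipv6

# Illustrative subnet descriptions (shape follows EC2 DescribeSubnets)
subnets = [
    {"SubnetId": "subnet-aaaa", "CidrBlock": "10.0.0.0/24",
     "Ipv6CidrBlockAssociationSet": [{"Ipv6CidrBlock": "2600:1f18:abcd:1200::/64"}]},
    {"SubnetId": "subnet-bbbb", "CidrBlock": "10.0.1.0/24",
     "Ipv6CidrBlockAssociationSet": []},
]
print([s["SubnetId"] for s in subnets if not is_dual_stack(s)])  # ['subnet-bbbb']
```

Any subnet reported by this check would cause a dual-stack connector creation that uses it to fail.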

## Considerations for using dual-stack network type
<a name="dual-stack-considerations"></a>
+ IPv6 support is currently available only in dual-stack mode (IPv4 and IPv6), not as IPv6-only
+ Connectors with dual-stack enabled can connect over both IPv4 and IPv6 to your MSK cluster and to source or sink data systems
+ Network type cannot be modified after connector creation; to change network types, you must delete and recreate the connector
+ All subnets specified during connector creation must support dual-stack for connector creation to succeed with dual-stack network type
+ If you use dual-stack subnets but don't specify a network type, the connector defaults to IPv4-only for backward compatibility
+ Using dual-stack networking doesn't incur additional costs

# Create a connector
<a name="mkc-create-connector-intro"></a>

This procedure describes how to create a connector using the AWS Management Console.

**Creating a connector using the AWS Management Console**

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).

1. In the left pane, under **MSK Connect**, choose **Connectors**.

1. Choose **Create connector**.

1. You can choose between using an existing custom plugin to create the connector, or creating a new custom plugin first. For information about custom plugins and how to create them, see [Create custom plugins](msk-connect-plugins.md). This procedure assumes that you have a custom plugin that you want to use. In the list of custom plugins, select the box to the left of the plugin that you want, then choose **Next**.

1. Enter a name and, optionally, a description.

1. Choose the cluster that you want to connect to.

1. In the **Connector network settings** section, choose one of the following for network type:
   + **IPv4** (default) - For connectivity to destinations over IPv4 only
   + **Dual-stack** - For connectivity to destinations over both IPv4 and IPv6 (only available if your subnets have IPv4 and IPv6 CIDR blocks associated with them)

1. Specify the connector configuration. The configuration parameters that you need to specify depend on the type of connector that you want to create. However, some parameters are common to all connectors, for example, the `connector.class` and `tasks.max` parameters. The following is an example configuration for the [Confluent Amazon S3 Sink Connector](https://www.confluent.io/hub/confluentinc/kafka-connect-s3).

   ```
   connector.class=io.confluent.connect.s3.S3SinkConnector
   tasks.max=2
   topics=my-example-topic
   s3.region=us-east-1
   s3.bucket.name=amzn-s3-demo-bucket
   flush.size=1
   storage.class=io.confluent.connect.s3.storage.S3Storage
   format.class=io.confluent.connect.s3.format.json.JsonFormat
   partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
   key.converter=org.apache.kafka.connect.storage.StringConverter
   value.converter=org.apache.kafka.connect.storage.StringConverter
   schema.compatibility=NONE
   ```

1. Next, configure your connector capacity. You can choose between two capacity modes: provisioned and autoscaled. For information about these two options, see [Understand connector capacity](msk-connect-capacity.md).

1. (Optional) In the **Maximum Autoscaling Task Count** section, enter the maximum number of tasks that you want to allocate to the connector during autoscaling operations. The value must be at least equal to your maximum worker count. If you don't specify a value, the connector uses the standard calculation without any limit. For more information, see [Understand maximum autoscaling task count](msk-connect-max-autoscaling-task-count.md).

1. Choose either the default worker configuration or a custom worker configuration. For information about creating custom worker configurations, see [Understand MSK Connect workers](msk-connect-workers.md).

1. Next, you specify the service execution role. This must be an IAM role that MSK Connect can assume, and that grants the connector all the permissions that it needs to access the necessary AWS resources. Those permissions depend on the logic of the connector. For information about how to create this role, see [Understand service execution role](msk-connect-service-execution-role.md).

1. Choose **Next**, review the security information, then choose **Next** again.

1. Specify the logging options that you want, then choose **Next**. For information about logging, see [Logging for MSK Connect](msk-connect-logging.md).

1. On the **Review and create** page, review your connector configuration and choose **Create connector**.

To use the MSK Connect API to create a connector, see [CreateConnector](https://docs.aws.amazon.com/MSKC/latest/mskc/API_CreateConnector.html).

You can use the `UpdateConnector` API to modify the connector's configuration. For more information, see [Update a connector](mkc-update-connector.md).

# Update a connector
<a name="mkc-update-connector"></a>

This procedure describes how to update the configuration of an existing MSK Connect connector using the AWS Management Console.

**Updating connector configuration using the AWS Management Console**

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).

1. In the left pane, under **MSK Connect**, choose **Connectors**.

1. Select an existing connector.

1. Choose **Edit connector configuration**.

1. Update the connector configuration. You can't override `connector.class` using `UpdateConnector`. The following is an example configuration for the Confluent Amazon S3 Sink connector.

   ```
   connector.class=io.confluent.connect.s3.S3SinkConnector
   tasks.max=2
   topics=my-example-topic
   s3.region=us-east-1
   s3.bucket.name=amzn-s3-demo-bucket
   flush.size=1
   storage.class=io.confluent.connect.s3.storage.S3Storage
   format.class=io.confluent.connect.s3.format.json.JsonFormat
   partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
   key.converter=org.apache.kafka.connect.storage.StringConverter
   value.converter=org.apache.kafka.connect.storage.StringConverter
   schema.compatibility=NONE
   ```

1. Choose **Submit**.

1. You can then monitor the current state of the operation in the **Operations** tab of the connector. 

To use the MSK Connect API to update the configuration of a connector, see [UpdateConnector](https://docs.aws.amazon.com/MSKC/latest/mskc/API_UpdateConnector.html).

# Connecting from connectors
<a name="msk-connect-from-connectors"></a>

The following best practices can improve the performance of your connectivity to Amazon MSK Connect.

## Do not overlap IPs for Amazon VPC peering or Transit Gateway
<a name="CIDR-ip-ranges"></a>

If you are using Amazon VPC peering or Transit Gateway with Amazon MSK Connect, do not configure your connector to reach peered VPC resources through IPs in the following CIDR ranges:
+ `10.99.0.0/16`
+ `192.168.0.0/16`
+ `172.21.0.0/16`
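
A quick way to check a candidate CIDR against these reserved ranges is Python's standard `ipaddress` module. This is a sketch; the candidate CIDRs shown are illustrative:

```python
import ipaddress

# CIDR ranges that must not overlap the connector's target IPs
RESERVED = [ipaddress.ip_network(c) for c in
            ("10.99.0.0/16", "192.168.0.0/16", "172.21.0.0/16")]

def conflicts(cidr):
    """Return the reserved ranges that overlap the given CIDR, if any."""
    net = ipaddress.ip_network(cidr)
    return [str(r) for r in RESERVED if net.overlaps(r)]

print(conflicts("192.168.10.0/24"))  # ['192.168.0.0/16']
print(conflicts("10.1.0.0/16"))      # []
```

An empty result means the candidate range is safe to use with VPC peering or Transit Gateway.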