Step 4: Configure cqlsh COPY FROM
                settings
This section outlines how to determine the parameter values for cqlsh COPY
                FROM. The cqlsh COPY FROM command reads the CSV file that you
            prepared earlier and inserts the data into Amazon Keyspaces using CQL. The command divides up the
            rows and distributes the INSERT operations among a set of workers. Each
            worker establishes a connection with Amazon Keyspaces and sends INSERT requests along
            this channel. 
The cqlsh COPY command doesn’t have internal logic to distribute work
            evenly among its workers. However, you can configure it manually to make sure that the
            work is distributed evenly. Start by reviewing these key cqlsh parameters:
- 
                DELIMITER – If you used a delimiter other than a comma, you can set this parameter, which defaults to comma. 
- 
                INGESTRATE – The target number of rows that cqlsh COPY FROMattempts to process per second. If unset, it defaults to 100,000.
- 
                NUMPROCESSES – The number of child worker processes that cqlsh creates for COPY FROMtasks. The maximum for this setting is 16, the default isnum_cores - 1, wherenum_coresis the number of processing cores on the host running cqlsh.
- 
                MAXBATCHSIZE – The batch size determines the maximal number of rows inserted into the destination table in a single batch. If unset, cqlsh uses batches of 20 inserted rows. 
- 
                CHUNKSIZE – The size of the work unit that passes to the child worker. By default, it is set to 5,000. 
- 
                MAXATTEMPTS – The maximum number of times to retry a failed worker chunk. After the maximum attempt is reached, the failed records are written to a new CSV file that you can run again later after investigating the failure. 
Set INGESTRATE based on the number of WCUs that you provisioned to the
            target destination table. The INGESTRATE of the cqlsh COPY FROM
            command isn’t a limit—it’s a target average. This means it can (and often does)
            burst above the number you set. To allow for bursts and make sure that enough capacity
            is in place to handle the data load requests, set INGESTRATE to 90% of the
            table’s write capacity.
INGESTRATE = WCUs * .90
Next, set the NUMPROCESSES parameter to equal to one less than the number
            of cores on your system. To find out what the number of cores of your system is, you can
        run the following code.
python -c "import multiprocessing; print(multiprocessing.cpu_count())"
For this tutorial, we use the following value.
NUMPROCESSES = 4
Each process creates a worker, and each worker establishes a connection to Amazon Keyspaces. Amazon Keyspaces can support up to 3,000 CQL requests per second on every connection. This means that you have to make sure that each worker is processing fewer than 3,000 requests per second.
As with INGESTRATE, the workers often burst above the number you set and
            aren’t limited by clock seconds. Therefore, to account for bursts, set your cqlsh
            parameters to target each worker to process 2,500 requests per second. To calculate the
            amount of work distributed to a worker, use the following guideline.
- 
                Divide INGESTRATEbyNUMPROCESSES.
- 
                If INGESTRATE/NUMPROCESSES> 2,500, lower theINGESTRATEto make this formula true.
INGESTRATE / NUMPROCESSES <= 2,500
Before you configure the settings to optimize the upload of our sample data, let's
            review the cqlsh default settings and see how using them impacts the data
            upload process. Because cqlsh COPY FROM uses the CHUNKSIZE to
            create chunks of work (INSERT statements) to distribute to workers, the
            work is not automatically distributed evenly. Some workers might sit idle, depending on
            the INGESTRATE setting.
To distribute work evenly among the workers and keep each worker at the optimal 2,500
            requests per second rate, you must set CHUNKSIZE,
            MAXBATCHSIZE, and INGESTRATE by changing the input parameters.
            To optimize network traffic utilization during the data load, choose a value for
                MAXBATCHSIZE that is close to the maximum value of 30. By changing
                CHUNKSIZE to 100 and MAXBATCHSIZE to 25, the 10,000 rows
            are spread evenly among the four workers (10,000 / 2500 = 4).
The following code example illustrates this.
INGESTRATE = 10,000 NUMPROCESSES = 4 CHUNKSIZE = 100 MAXBATCHSIZE. = 25 Work Distribution: Connection 1 / Worker 1 : 2,500 Requests per second Connection 2 / Worker 2 : 2,500 Requests per second Connection 3 / Worker 3 : 2,500 Requests per second Connection 4 / Worker 4 : 2,500 Requests per second
To summarize, use the following formulas when setting cqlsh COPY FROM parameters:
- 
                INGESTRATE = write_capacity_units * .90 
- 
                NUMPROCESSES = num_cores -1 (default) 
- 
                INGESTRATE / NUMPROCESSES = 2,500 (This must be a true statement.) 
- 
                MAXBATCHSIZE = 30 (Defaults to 20. Amazon Keyspaces accepts batches up to 30.) 
- 
                CHUNKSIZE = (INGESTRATE / NUMPROCESSES) / MAXBATCHSIZE 
Now that you have calculated NUMPROCESSES, INGESTRATE, and
                CHUNKSIZE, you’re ready to load your data.