K-Means Hyperparameters
In the `CreateTrainingJob` request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the k-means training algorithm provided by Amazon SageMaker AI. For more information about how k-means clustering works, see How K-Means Clustering Works.
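The `HyperParameters` field of a `CreateTrainingJob` request accepts only string values, so numeric and list-valued settings have to be converted before they are sent. The following Python sketch shows one way to do that. `to_hyperparameters` is a hypothetical helper, not part of any AWS SDK, and the values passed to it (including the metric name `msd`) are illustrative assumptions.

```python
import json
from typing import Any, Dict


def to_hyperparameters(**params: Any) -> Dict[str, str]:
    """Hypothetical helper: convert typed values into the string-to-string
    map expected by the HyperParameters field of CreateTrainingJob."""
    hyperparameters: Dict[str, str] = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            # List-valued settings such as eval_metrics are sent as JSON text.
            hyperparameters[name] = json.dumps(list(value))
        else:
            # Integers, floats, and keywords are sent in their plain string form.
            hyperparameters[name] = str(value)
    return hyperparameters


hp = to_hyperparameters(feature_dim=784, k=10, mini_batch_size=5000, eval_metrics=["msd"])
print(hp)
# {'feature_dim': '784', 'k': '10', 'mini_batch_size': '5000', 'eval_metrics': '["msd"]'}
```

Serializing lists with `json.dumps` keeps a setting such as `eval_metrics` in the JSON-list form that the table describes, while every other value is sent as its ordinary string representation.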
| Parameter Name | Description | 
|---|---|
| feature_dim | The number of features in the input data. Required. Valid values: Positive integer |
| k | The number of required clusters. Required. Valid values: Positive integer |
| epochs | The number of passes done over the training data. Optional. Valid values: Positive integer. Default value: 1 |
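| eval_metrics | A JSON list of metric types used to report a score for the model. Allowed values are `msd` for mean squared error and `ssd` for sum of squared distances. If test data is provided, the score is reported for each of the metrics requested. Optional. Valid values: Either `["msd"]`, `["ssd"]`, or `["msd","ssd"]`. Default value: `["msd"]` |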
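| extra_center_factor | The algorithm creates K = `k` * `extra_center_factor` centers as it runs and reduces the number of centers from K to `k` when finalizing the model. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto` |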
| half_life_time_size | Used to determine the weight given to an observation when computing a cluster mean. This weight decays exponentially as more points are observed. When a point is first observed, it is assigned a weight of 1 when computing the cluster mean. The decay constant for the exponential decay function is chosen so that after observing `half_life_time_size` points, its weight is 1/2. If set to 0, there is no decay. Optional. Valid values: Non-negative integer. Default value: 0 |
| init_method | Method by which the algorithm chooses the initial cluster centers. The standard k-means approach chooses them at random. An alternative k-means++ method chooses the first cluster center at random. It then spreads out the remaining initial centers by selecting each one with a probability proportional to the square of a data point's distance from the nearest existing center. Optional. Valid values: Either `random` or `kmeans++`. Default value: `random` |
| local_lloyd_init_method | The initialization method for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Either `random` or `kmeans++`. Default value: `kmeans++` |
| local_lloyd_max_iter | The maximum number of iterations for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Positive integer. Default value: 300 |
| local_lloyd_num_trials | The number of times Lloyd's expectation-maximization (EM) procedure is run when building the final model containing `k` centers; the run with the least loss is kept. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto` |
| local_lloyd_tol | The tolerance for change in loss for early stopping of Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Float in the range [0, 1]. Default value: 0.0001 |
| mini_batch_size | The number of observations per mini-batch for the data iterator. Optional. Valid values: Positive integer. Default value: 5000 |
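SageMaker's k-means documentation names two metric types for `eval_metrics`, `msd` and `ssd`. The following Python sketch computes both for a set of points and centers, assuming that `ssd` is the sum of squared distances from each point to its nearest center and that `msd` is that sum divided by the number of points. `cluster_scores` is a hypothetical name, and this is not SageMaker's own scoring code.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_scores(points, centers):
    """Return (msd, ssd) under the assumed definitions above."""
    nearest = [min(sq_dist(p, c) for c in centers) for p in points]
    ssd = sum(nearest)           # sum of squared distances to the nearest center
    msd = ssd / len(nearest)     # mean of the same squared distances
    return msd, ssd


print(cluster_scores([(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0)]))  # (1.0, 2.0)
```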
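The decaying weight described for `half_life_time_size` can be sketched as a running cluster mean. This Python example assumes that each new observation multiplies every earlier weight by 0.5 ** (1 / half_life), so a point's weight drops from 1 to 1/2 after `half_life_time_size` further observations, and that a value of 0 disables decay. `DecayedMean` is a hypothetical class that illustrates the weighting rule; it is not SageMaker's internal implementation.

```python
class DecayedMean:
    """Running cluster mean in which older observations lose weight.

    Assumption: each update multiplies all earlier weights by
    0.5 ** (1 / half_life); half_life == 0 means no decay.
    """

    def __init__(self, dim: int, half_life: int) -> None:
        if half_life < 0:
            raise ValueError("half_life must be a non-negative integer")
        self.factor = 1.0 if half_life == 0 else 0.5 ** (1.0 / half_life)
        self.weighted_sum = [0.0] * dim
        self.total_weight = 0.0

    def update(self, point) -> None:
        # Age every earlier contribution by one step, then add the new point at weight 1.
        self.weighted_sum = [s * self.factor + x for s, x in zip(self.weighted_sum, point)]
        self.total_weight = self.total_weight * self.factor + 1.0

    def mean(self):
        return [s / self.total_weight for s in self.weighted_sum]


m = DecayedMean(dim=1, half_life=1)
m.update([0.0])
m.update([2.0])
# The first point now has weight 0.5 and the second weight 1.0,
# so the mean is (0.0 * 0.5 + 2.0 * 1.0) / 1.5, roughly 1.333.
print(m.mean())
```

Keeping a weighted sum and a total weight means each update costs O(dim) and no past points need to be stored.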
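The k-means++ method described under `init_method` can be sketched in a few lines of Python. The first center is drawn uniformly at random, and each later center is drawn with probability proportional to a point's squared distance from its nearest existing center. `kmeanspp_init` is a hypothetical name, and the uniform fallback when every point already coincides with a center is an assumption added so the sketch never divides by a zero total weight.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centers with the k-means++ rule."""
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1)]
print(kmeanspp_init(pts, k=2, rng=random.Random(0)))
```

Points that are already centers have weight 0 and are never picked again, which is what spreads the initial centers out.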
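The `local_lloyd_max_iter` and `local_lloyd_tol` settings bound Lloyd's expectation-maximization (EM) procedure. The following Python sketch shows the standard Lloyd iteration with both stopping rules. It assumes the loss is the sum of squared distances to the nearest center, that `tol` is compared against the relative drop in loss between iterations, and that an empty cluster keeps its previous center; SageMaker does not document these details here, so treat them as illustrative choices.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_loss(points: List[Point], centers: List[Point]) -> float:
    # Sum of squared distances from each point to its nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=300, tol=1e-4):
    """Lloyd iterations with max_iter / tol stopping (assumed semantics)."""
    centers = [tuple(c) for c in centers]
    prev = total_loss(points, centers)
    for _ in range(max_iter):
        # E-step: assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # M-step: move each center to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        centers = new_centers
        loss = total_loss(points, centers)
        # Stop early once the relative drop in loss is at most tol.
        if prev - loss <= tol * prev:
            break
        prev = loss
    return centers, total_loss(points, centers)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, loss = lloyd(pts, [(0.0, 0.0), (10.0, 0.0)])
print(centers, loss)  # [(0.0, 0.5), (10.0, 0.5)] 1.0
```

Running this function several times from different initial centers and keeping the result with the smallest loss corresponds to the repeated trials described for `local_lloyd_num_trials`.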