Troubleshooting
See the following sections to learn how to troubleshoot errors when using the training operator.
I can't install the training operator
If you can't install the training operator, make sure that you're using supported versions of the required components. For example, if you get an error that your HyperPod AMI release is incompatible with the training operator, update the AMI to the latest release.
Incompatible HyperPod task governance version
During installation, you might get an error message that the version of HyperPod task governance is incompatible. The training operator works only with version v1.3.0-eksbuild.1 or later. Update your HyperPod task governance add-on and try the installation again.
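If you want to check a version string before retrying the installation, the following minimal sketch compares it against the v1.3.0-eksbuild.1 minimum stated above. The function names and the assumed version format (vMAJOR.MINOR.PATCH-eksbuild.N) are illustrative assumptions, not part of the training operator or of any AWS tool. Obtain the installed version string from wherever you normally inspect your add-ons.

```python
import re
from typing import Tuple

# Minimum version stated in this section.
MIN_VERSION = "v1.3.0-eksbuild.1"

# Assumed format: vMAJOR.MINOR.PATCH-eksbuild.N (the leading "v" and the
# build suffix are treated as optional).
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-eksbuild\.(\d+))?$")


def parse_addon_version(version: str) -> Tuple[int, int, int, int]:
    """Turn 'v1.3.0-eksbuild.1' into (1, 3, 0, 1) so versions compare as tuples."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized add-on version: {version!r}")
    major, minor, patch, build = match.groups()
    # A missing build suffix is treated as build 0 (an assumption).
    return int(major), int(minor), int(patch), int(build or 0)


def is_supported(version: str, minimum: str = MIN_VERSION) -> bool:
    """Return True if `version` is at least `minimum`."""
    return parse_addon_version(version) >= parse_addon_version(minimum)


if __name__ == "__main__":
    for candidate in ("v1.2.0-eksbuild.1", "v1.3.0-eksbuild.1", "v1.10.0-eksbuild.2"):
        print(candidate, is_supported(candidate))
```

Comparing parsed integer tuples instead of raw strings avoids ordering mistakes such as treating v1.10.0 as older than v1.3.0.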
Missing permissions
While you're setting up the training operator or running jobs, you might receive errors that you're not authorized to run certain operations, such as DescribeClusterNode. To resolve these errors, make sure that you configured the required IAM permissions when you set up the Amazon EKS Pod Identity Agent.
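As a rough illustration of what such permissions can look like, the following sketch builds two IAM policy documents as Python dictionaries and prints them as JSON. It assumes that the denied operation is the SageMaker DescribeClusterNode action and that the role is assumed through EKS Pod Identity. The resource pattern and variable names are illustrative assumptions, and the operator might require additional actions, so compare the output with the permissions listed in the setup instructions rather than treating it as the complete policy.

```python
import json

# Permissions policy granting the operation named in the error message.
# The Resource pattern is an assumption; scope it to your own cluster ARN.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:DescribeClusterNode"],
            "Resource": "arn:aws:sagemaker:*:*:cluster/*",
        }
    ],
}

# Trust policy that allows the EKS Pod Identity service principal to
# assume the role and tag the session.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }
    ],
}

print(json.dumps(permissions_policy, indent=2))
print(json.dumps(trust_policy, indent=2))
```

If the role's attached policy lacks the action named in the error, or its trust policy lacks the Pod Identity service principal, authorization errors like the one described above are a likely result.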