Terjemahan disediakan oleh mesin penerjemah. Jika konten terjemahan yang diberikan bertentangan dengan versi bahasa Inggris aslinya, utamakan versi bahasa Inggris.
catatan
apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: JumpStartModel metadata: name:mistral-model namespace: ns-team-a spec: model: modelId: "huggingface-llm-mistral-7b-instruct" modelVersion: "3.19.0" metrics: enabled:true # Default: true (can be set to false to disable) replicas: 2 sageMakerEndpoint: name: "mistral-model-sm-endpoint" server: instanceType: "ml.g5.12xlarge" executionRole: "arn:aws:iam::123456789:role/SagemakerRole" tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/mistral-model/certs/
apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: JumpStartModel metadata: name:mistral-model namespace: ns-team-a spec: model: modelId: "huggingface-llm-mistral-7b-instruct" modelVersion: "3.19.0" metrics: enabled:true # Default: true (can be set to false to disable) replicas: 2 sageMakerEndpoint: name: "mistral-model-sm-endpoint" server: instanceType: "ml.g5.12xlarge" executionRole: "arn:aws:iam::123456789:role/SagemakerRole" tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/mistral-model/certs/ Deploy a custom inference endpoint Configure custom inference endpoints with detailed metrics settings using the following YAML: apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: InferenceEndpointConfig metadata: name: inferenceendpoint-deepseeks namespace: ns-team-a spec: modelName: deepseeks modelVersion: 1.0.1 metrics: enabled: true # Default: true (can be set to false to disable) metricsScrapeIntervalSeconds: 30 # Optional: if overriding the default 15s modelMetricsConfig: port: 8000 # Optional: if overriding, it defaults to the WorkerConfig.ModelInvocationPort.ContainerPort within the InferenceEndpointConfig spec 8080 path: "/custom-metrics" # Optional: if overriding the default "/metrics" endpointName: deepseek-sm-endpoint instanceType: ml.g5.12xlarge modelSourceConfig: modelSourceType: s3 s3Storage: bucketName: model-weights region: us-west-2 modelLocation: deepseek prefetchEnabled: true invocationEndpoint: invocations worker: resources: limits: nvidia.com/gpu: 1 requests: nvidia.com/gpu: 1 cpu: 25600m memory: 102Gi image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.32.0-lmi14.0.0-cu124 modelInvocationPort: containerPort: 8080 name: http modelVolumeMount: name: model-weights mountPath: /opt/ml/model environmentVariables: ... tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/inferenceendpoint-deepseeks4/certs/
catatan
-
Pilih klaster Anda.
-
-
-
-
-
Available in Grafana dashboard under "Model Container Metrics" panel Metric: tgi_queue_size{resource_name="customer-chat-llama"} Current value: 127 requests queued (indicates backlog) -
-
-
-
hyp observability view -
kubectl get jumpstartmodel -n <namespace> customer-chat-llama -o jsonpath='{.status.metricsStatus}' # Expected: enabled: true, state:Enabledkubectl get jumpstartmodel -n <namespace> customer-chat-llama -o jsonpath='{.status.metricsStatus}' # Expected: enabled: true, state:Enabled -
kubectl port-forward pod/customer-chat-llama-xxx 9113:9113 curl localhost:9113/metrics | grep model_invocations_total# Expected: model_invocations_total{...} metrics -
# Model Container kubectl logs customer-chat-llama-xxx -c customer-chat-llama# Look for: OOM errors, CUDA errors, model loading failures # Proxy/SideCar kubectl logs customer-chat-llama-xxx -c sidecar-reverse-proxy# Look for: DNS resolution issues, upstream connection failures # Metrics Exporter Sidecar kubectl logs customer-chat-llama-xxx -c otel-collector# Look for: Metrics collection issues, export failures
| Isu | Solusi | Tindakan |
|---|---|---|