-

Terjemahan disediakan oleh mesin penerjemah. Jika konten terjemahan yang diberikan bertentangan dengan versi bahasa Inggris aslinya, utamakan versi bahasa Inggris.

catatan

apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: JumpStartModel metadata: name:mistral-model namespace: ns-team-a spec: model: modelId: "huggingface-llm-mistral-7b-instruct" modelVersion: "3.19.0" metrics: enabled:true # Default: true (can be set to false to disable) replicas: 2 sageMakerEndpoint: name: "mistral-model-sm-endpoint" server: instanceType: "ml.g5.12xlarge" executionRole: "arn:aws:iam::123456789:role/SagemakerRole" tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/mistral-model/certs/

apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: JumpStartModel metadata: name:mistral-model namespace: ns-team-a spec: model: modelId: "huggingface-llm-mistral-7b-instruct" modelVersion: "3.19.0" metrics: enabled:true # Default: true (can be set to false to disable) replicas: 2 sageMakerEndpoint: name: "mistral-model-sm-endpoint" server: instanceType: "ml.g5.12xlarge" executionRole: "arn:aws:iam::123456789:role/SagemakerRole" tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/mistral-model/certs/ Deploy a custom inference endpoint Configure custom inference endpoints with detailed metrics settings using the following YAML: apiVersion: inference.sagemaker.aws.amazon.com/v1 kind: InferenceEndpointConfig metadata: name: inferenceendpoint-deepseeks namespace: ns-team-a spec: modelName: deepseeks modelVersion: 1.0.1 metrics: enabled: true # Default: true (can be set to false to disable) metricsScrapeIntervalSeconds: 30 # Optional: if overriding the default 15s modelMetricsConfig: port: 8000 # Optional: if overriding, it defaults to the WorkerConfig.ModelInvocationPort.ContainerPort within the InferenceEndpointConfig spec 8080 path: "/custom-metrics" # Optional: if overriding the default "/metrics" endpointName: deepseek-sm-endpoint instanceType: ml.g5.12xlarge modelSourceConfig: modelSourceType: s3 s3Storage: bucketName: model-weights region: us-west-2 modelLocation: deepseek prefetchEnabled: true invocationEndpoint: invocations worker: resources: limits: nvidia.com/gpu: 1 requests: nvidia.com/gpu: 1 cpu: 25600m memory: 102Gi image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.32.0-lmi14.0.0-cu124 modelInvocationPort: containerPort: 8080 name: http modelVolumeMount: name: model-weights mountPath: /opt/ml/model environmentVariables: ... tlsConfig: tlsCertificateOutputS3Uri: s3://hyperpod/inferenceendpoint-deepseeks4/certs/
catatan

  1. Pilih klaster Anda.

    Available in Grafana dashboard under "Model Container Metrics" panel Metric: tgi_queue_size{resource_name="customer-chat-llama"} Current value: 127 requests queued (indicates backlog)

  1. hyp observability view
  2. kubectl get jumpstartmodel -n <namespace> customer-chat-llama -o jsonpath='{.status.metricsStatus}' # Expected: enabled: true, state:Enabled
    kubectl get jumpstartmodel -n <namespace> customer-chat-llama -o jsonpath='{.status.metricsStatus}' # Expected: enabled: true, state:Enabled
  3. kubectl port-forward pod/customer-chat-llama-xxx 9113:9113 curl localhost:9113/metrics | grep model_invocations_total# Expected: model_invocations_total{...} metrics
  4. # Model Container kubectl logs customer-chat-llama-xxx -c customer-chat-llama# Look for: OOM errors, CUDA errors, model loading failures # Proxy/SideCar kubectl logs customer-chat-llama-xxx -c sidecar-reverse-proxy# Look for: DNS resolution issues, upstream connection failures # Metrics Exporter Sidecar kubectl logs customer-chat-llama-xxx -c otel-collector# Look for: Metrics collection issues, export failures

Isu Solusi Tindakan