DeepSeek models

DeepSeek's R1 model is a text-to-text model that you can use for inference through the Invoke API (InvokeModel and InvokeModelWithResponseStream) and the Converse API (Converse and ConverseStream).

When you make inference calls with DeepSeek models, you must include a prompt for the model. For general information about creating prompts for the DeepSeek models that Amazon Bedrock supports, see the DeepSeek prompt guide.

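Note

You can't remove request access from the Amazon Titan, Amazon Nova, DeepSeek-R1, Mistral AI, and Meta Llama 3 Instruct models. You can prevent users from making inference calls to these models by using an IAM policy and specifying the model ID. For more information, see Deny access for inference of foundation models.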
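
As a rough illustration of the IAM approach mentioned in the note, the following sketch builds a deny policy document as a Python dictionary. The helper name build_deny_policy, the statement ID, and the base model ID deepseek.r1-v1:0 (inferred from the inference profile ID used on this page) are assumptions for illustration. Check the actions and resource ARNs against your own account before relying on anything like this.

```python
import json


def build_deny_policy(model_id):
    """Return an IAM policy document that denies inference on one model.

    The Sid and the wildcard Region in the ARN are illustrative choices.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyModelInference",
                "Effect": "Deny",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": f"arn:aws:bedrock:*::foundation-model/{model_id}",
            }
        ],
    }


# Assumed base model ID behind the us.deepseek.r1-v1:0 inference profile.
policy = build_deny_policy("deepseek.r1-v1:0")
print(json.dumps(policy, indent=2))
```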

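This section describes the request parameters and response fields for DeepSeek models. Use this information to make inference calls to DeepSeek models with the InvokeModel operation. This section also includes Python code examples that show how to call DeepSeek models.

To use a model in an inference operation, you need the model ID for the model. Because this model is invoked through cross-Region inference, you must use the inference profile ID as the model ID. For example, for the US, use us.deepseek.r1-v1:0.

Model name: DeepSeek-R1
Text model

For more information about using DeepSeek models with the APIs, see DeepSeek models.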

DeepSeek request and response

Request body

DeepSeek has the following inference parameters for a text completion inference call.


{
    "prompt": string,
    "temperature": float, 
    "top_p": float,
    "max_tokens": int,
    "stop": string array
}

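Fields:

prompt - (string) Required. The text input for the prompt.
temperature – (float) A numeric value less than or equal to 1.
top_p – (float) A numeric value less than or equal to 1.
max_tokens – (int) The maximum number of tokens to generate, from a minimum of 1 to a maximum of 32,768.
stop – (string array) Stop sequences, with a maximum of 10 items.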
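
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; build_request_body is a hypothetical helper, not part of any AWS SDK, and it only enforces the limits stated in this section.

```python
import json


def build_request_body(prompt, temperature=None, top_p=None,
                       max_tokens=None, stop=None):
    """Build a JSON request body, checking the documented limits first."""
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    if temperature is not None:
        if temperature > 1:
            raise ValueError("temperature must be less than or equal to 1")
        body["temperature"] = temperature
    if top_p is not None:
        if top_p > 1:
            raise ValueError("top_p must be less than or equal to 1")
        body["top_p"] = top_p
    if max_tokens is not None:
        if not 1 <= max_tokens <= 32768:
            raise ValueError("max_tokens must be between 1 and 32,768")
        body["max_tokens"] = max_tokens
    if stop is not None:
        if len(stop) > 10:
            raise ValueError("stop accepts at most 10 items")
        body["stop"] = list(stop)
    return json.dumps(body)


# Only the parameters that are set end up in the serialized body.
body = build_request_body("Say hello.", temperature=0.5, max_tokens=512)
print(body)
```

Optional parameters that are left unset are omitted from the body, so the service defaults apply to them.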

Response body

DeepSeek has the following response parameters for a text completion inference call. This example is a text completion from DeepSeek and doesn't return a content reasoning block.


{
    "choices": [
        {
            "text": string,
            "stop_reason": string
        }
    ]
}

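Fields:

stop_reason – (string) The reason why the response stopped generating text. The value is stop or length.
- stop - (string) The model finished generating text for the input prompt.
- length – (string) The token length of the generated text exceeds the value of max_tokens in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_tokens. Increase the value of max_tokens and try the request again.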
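
One way to act on stop_reason is to flag truncated choices so the caller can retry with a larger max_tokens. The sketch below assumes the response shape shown above; summarize_choices is a hypothetical helper and the sample response is hand-written, not real model output.

```python
def summarize_choices(model_response):
    """Return (text, truncated) pairs for each choice in a response body."""
    results = []
    for choice in model_response.get("choices", []):
        # "length" means the output was cut off at max_tokens.
        truncated = choice.get("stop_reason") == "length"
        results.append((choice.get("text", ""), truncated))
    return results


# Hand-written sample in the documented response shape.
sample_response = {
    "choices": [
        {"text": "A partial answer", "stop_reason": "length"},
    ]
}

for text, truncated in summarize_choices(sample_response):
    if truncated:
        print("Response was cut off; consider a larger max_tokens and retry.")
    print(text)
```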

Example code

This example shows how to call the model.


# Use the API to send a text message to DeepSeek-R1.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the cross Region inference profile ID for DeepSeek-R1
model_id = "us.deepseek.r1-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in DeepSeek-R1's instruction format.
formatted_prompt = f"""
<｜begin▁of▁sentence｜><｜User｜>{prompt}<｜Assistant｜><think>\n
"""

body = json.dumps({
    "prompt": formatted_prompt,
    "max_tokens": 512,
    "temperature": 0.5,
    "top_p": 0.9,
})

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=body)

    # Read the response body.
    model_response = json.loads(response["body"].read())
    
    # Extract choices.
    choices = model_response["choices"]
    
    # Print choices.
    for index, choice in enumerate(choices):
        print(f"Choice {index + 1}\n----------")
        print(f"Text:\n{choice['text']}\n")
        print(f"Stop reason: {choice['stop_reason']}\n")
except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    raise SystemExit(1)
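
The example above embeds the prompt in DeepSeek-R1's instruction format and ends it with an opening <think> tag. The following sketch factors that formatting into a helper and adds a second helper that separates reasoning from the final answer. Both helper names are hypothetical, and the split assumes the model closes its reasoning with a </think> tag, which this page does not state. Unlike the triple-quoted string in the example, format_prompt adds no leading or trailing blank line.

```python
def format_prompt(prompt):
    """Wrap a user prompt in the instruction format used in the example."""
    return (
        "<｜begin▁of▁sentence｜><｜User｜>"
        f"{prompt}"
        "<｜Assistant｜><think>\n"
    )


def split_reasoning(text, closing_tag="</think>"):
    """Split completion text into (reasoning, answer).

    Assumes the model ends its reasoning with closing_tag. If the tag is
    missing, the whole text is returned as the answer.
    """
    reasoning, tag, answer = text.partition(closing_tag)
    if not tag:
        return "", text.strip()
    return reasoning.strip(), answer.strip()


print(repr(format_prompt("Hi")))
print(split_reasoning("The user greets me.</think>Hello!"))
```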

Converse

Request body - Use this request body example to call the Converse API.


{
    "modelId": string, # us.deepseek.r1-v1:0
    "system": [
        {
            "text": string
        }
    ],
    "messages": [
        {
            "role": string,
            "content": [
                {
                    "text": string
                }
            ]
        }
    ],
    "inferenceConfig": {
        "temperature": float,
        "topP": float,
        "maxTokens": int,
        "stopSequences": string array
    },
    "guardrailConfig": { 
        "guardrailIdentifier":"string",
        "guardrailVersion": "string",
        "trace": "string"
    }
}

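Fields:

system – (Optional) The system prompt for the request.
messages - (Required) The input messages.
- role - The role of the conversation turn. Valid values are user and assistant.
- content – (Required) The content of the conversation turn, as an array of objects. Each object contains a type field, in which you can specify one of the following values:
  - text – (Required) If you specify this type, you must include a text field and specify the text prompt as its value.
inferenceConfig
- temperature – (Optional) Values: minimum = 0, maximum = 1.
- topP – (Optional) Values: minimum = 0, maximum = 1.
- maxTokens – (Optional) The maximum number of tokens to generate before stopping. Values: minimum = 0, maximum = 32,768.
- stopSequences – (Optional) Custom text sequences that cause the model to stop generating output. Maximum = 10 items.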
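
To make the field list above concrete, the following sketch assembles a Converse request from plain values and checks the documented ranges. build_converse_request is a hypothetical helper, not an SDK function, and it covers a single user turn only.

```python
def build_converse_request(model_id, user_text, system_text=None,
                           temperature=None, top_p=None,
                           max_tokens=None, stop_sequences=None):
    """Assemble keyword arguments for a Converse call from plain values."""
    inference_config = {}
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        inference_config["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("topP must be between 0 and 1")
        inference_config["topP"] = top_p
    if max_tokens is not None:
        if not 0 <= max_tokens <= 32768:
            raise ValueError("maxTokens must be between 0 and 32,768")
        inference_config["maxTokens"] = max_tokens
    if stop_sequences is not None:
        if len(stop_sequences) > 10:
            raise ValueError("stopSequences accepts at most 10 items")
        inference_config["stopSequences"] = list(stop_sequences)

    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }
    # Optional sections are only included when they carry a value.
    if system_text:
        request["system"] = [{"text": system_text}]
    if inference_config:
        request["inferenceConfig"] = inference_config
    return request


request = build_converse_request(
    "us.deepseek.r1-v1:0",
    "Create a list of 3 pop songs.",
    system_text="Only return song names and the artist.",
    temperature=0.5,
    max_tokens=4096,
)
print(request)
```

With a Boto3 Bedrock Runtime client, a dictionary like this could be passed as bedrock_client.converse(**request).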

Response body - The Converse API returns a response body like the following example.


{
    "message": {
        "role" : "assistant",
        "content": [
            {
                "text": string
            },
            {
                "reasoningContent": {
                    "reasoningText": string
                }
            }
        ]
    },
    "stopReason": string,
    "usage": {
        "inputTokens": int,
        "outputTokens": int,
        "totalTokens": int
    },
    "metrics": {
        "latencyMs": int
    }
}

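Fields:

message - The response returned by the model.
role - The conversational role of the generated message. The value is always assistant.
content - The content generated by the model, returned as an array. There are two types of content:
- text - The text content of the response.
- reasoningContent – (Optional) The reasoning content from the model response.
  - reasoningText - The reasoning text from the model response.
stopReason - The reason why the model stopped generating the response.
- end_turn - The model reached a stopping point and ended its turn.
- max_tokens - The generated text exceeded the value of the maxTokens input field or exceeded the maximum number of tokens that the model supports.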
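
The following sketch pulls the answer text, the reasoning text, and a truncation flag out of a Converse response. It follows the shape that the Python example on this page reads, where the message sits under response['output']['message'] and reasoningText is a dictionary with a text key. extract_output is a hypothetical helper and the sample response is hand-written, not real model output.

```python
def extract_output(response):
    """Return (answer_text, reasoning_text, truncated) from a Converse response."""
    message = response["output"]["message"]
    answer_parts = []
    reasoning_parts = []
    for block in message["content"]:
        if "text" in block:
            answer_parts.append(block["text"])
        elif "reasoningContent" in block:
            reasoning = block["reasoningContent"].get("reasoningText", {})
            reasoning_parts.append(reasoning.get("text", ""))
    # max_tokens means the output hit the token limit before finishing.
    truncated = response.get("stopReason") == "max_tokens"
    return "".join(answer_parts), "".join(reasoning_parts), truncated


# Hand-written sample response.
sample = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "Pick pop songs."}}},
                {"text": "1. Song A"},
            ],
        }
    },
    "stopReason": "end_turn",
}

answer, reasoning, truncated = extract_output(sample)
print(answer)
print(reasoning)
print(truncated)
```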

Example code - The following example shows how to call the Converse API with DeepSeek.


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use the Converse API with DeepSeek-R1 (on demand).
"""

import logging
import boto3

from botocore.client import Config
from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)


def generate_conversation(bedrock_client,
                          model_id,
                          system_prompts,
                          messages):
    """
    Sends messages to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_prompts (JSON) : The system prompts for the model to use.
        messages (JSON) : The messages to send to the model.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Inference parameters to use.
    temperature = 0.5
    max_tokens = 4096

    # Base inference parameters to use.
    inference_config = {
        "temperature": temperature,
        "maxTokens": max_tokens,
    }

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
    )

    # Log token usage.
    token_usage = response['usage']
    logger.info("Input tokens: %s", token_usage['inputTokens'])
    logger.info("Output tokens: %s", token_usage['outputTokens'])
    logger.info("Total tokens: %s", token_usage['totalTokens'])
    logger.info("Stop reason: %s", response['stopReason'])

    return response

def main():
    """
    Entrypoint for DeepSeek-R1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "us.deepseek.r1-v1:0"

    # Setup the system prompts and messages to send to the model.
    system_prompts = [{"text": "You are an app that creates playlists for a radio station that plays rock and pop music. Only return song names and the artist."}]
    message_1 = {
        "role": "user",
        "content": [{"text": "Create a list of 3 pop songs."}]
    }
    message_2 = {
        "role": "user",
        "content": [{"text": "Make sure the songs are by artists from the United Kingdom."}]
    }
    messages = []

    try:
        # Configure timeout for long responses if needed
        custom_config = Config(connect_timeout=840, read_timeout=840)
        bedrock_client = boto3.client(service_name='bedrock-runtime', config=custom_config)

        # Start the conversation with the 1st message.
        messages.append(message_1)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        # Add the response message to the conversation.
        output_message = response['output']['message']
        
        # Remove reasoning content from the response
        output_message["content"] = [
            content for content in output_message["content"]
            if not content.get("reasoningContent")
        ]
        
        messages.append(output_message)

        # Continue the conversation with the 2nd message.
        messages.append(message_2)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        output_message = response['output']['message']
        messages.append(output_message)

        # Show the complete conversation.
        for message in messages:
            print(f"Role: {message['role']}")
            for content in message['content']:
                if content.get("text"):
                    print(f"Text: {content['text']}")
                if content.get("reasoningContent"):
                    reasoning_content = content['reasoningContent']
                    reasoning_text = reasoning_content.get('reasoningText', {})
                    print()
                    print(f"Reasoning Text: {reasoning_text.get('text')}")
            print()

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
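
The page also lists ConverseStream as a supported operation. The following sketch shows one way to accumulate streamed output. It assumes the event shapes of the Converse streaming API (contentBlockDelta events that carry either text or reasoningContent deltas, and a messageStop event that carries stopReason). collect_stream is a hypothetical helper, and the events below are hand-written stand-ins rather than real output.

```python
def collect_stream(events):
    """Accumulate text, reasoning, and stop reason from ConverseStream events."""
    text_parts, reasoning_parts, stop_reason = [], [], None
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            if "text" in delta:
                text_parts.append(delta["text"])
            elif "reasoningContent" in delta:
                reasoning_parts.append(delta["reasoningContent"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
    return "".join(text_parts), "".join(reasoning_parts), stop_reason


# Hand-written events standing in for the stream of a real response.
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"reasoningContent": {"text": "Think. "}}}},
    {"contentBlockDelta": {"delta": {"text": "Hello"}}},
    {"contentBlockDelta": {"delta": {"text": " world"}}},
    {"messageStop": {"stopReason": "end_turn"}},
]

print(collect_stream(fake_events))
```

With a real client, the events would come from the stream field of the response to bedrock_client.converse_stream, called with the same arguments as converse.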
