# Prepare and run a training job with SageMaker Profiler

Setting up a training job to run with SageMaker Profiler consists of two steps: adapting the training script and configuring the SageMaker training job launcher.

**Topics**
+ [Step 1: Adapt your training script using the SageMaker Profiler Python modules](#profiler-prepare-training-script)
+ [Step 2: Create a SageMaker AI framework estimator and activate SageMaker Profiler](#profiler-profilerconfig)
+ [(Optional) Install the SageMaker Profiler Python package](#profiler-install-python-package)

## Step 1: Adapt your training script using the SageMaker Profiler Python modules

To start capturing kernel runs on GPUs while the training job is running, modify your training script using the SageMaker Profiler Python modules. Import the library and add the `start_profiling()` and `stop_profiling()` methods to define the beginning and the end of profiling. You can also use optional custom annotations to add markers in the training script and visualize hardware activity during particular operations in each step.

Note that the annotators extract operations from GPUs. For profiling operations on CPUs, you don't need to add any annotations. CPU profiling is also activated when you specify the profiling configuration, which is covered in [Step 2: Create a SageMaker AI framework estimator and activate SageMaker Profiler](#profiler-profilerconfig).

**Note**
Profiling an entire training job is not the most efficient use of resources. We recommend profiling at most 300 steps of a training job.

**Important**
The release on [December 14, 2023](profiler-release-notes.md#profiler-release-notes-20231214) includes a breaking change. The SageMaker Profiler Python package name changes from `smppy` to `smprof`. This takes effect in the [SageMaker AI Framework Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-containers-sm-support-only) for TensorFlow v2.12 and later.
If you use one of the earlier versions of the [SageMaker AI Framework Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-containers-sm-support-only), such as TensorFlow v2.11.0, the SageMaker Profiler Python package is still available as `smppy`. If you are not sure which version or package name to use, replace the import statement of the SageMaker Profiler package with the following code snippet.
```
try:
    import smprof
except ImportError:
    # backward compatibility for TF 2.11 and PT 1.13.1 images
    import smppy as smprof
```

**Approach 1.** Use the context manager `smprof.annotate` to annotate full functions

You can wrap full functions with the `smprof.annotate()` context manager. This wrapper is recommended if you want to profile by function instead of by code line. The following example script shows how to use the context manager to wrap the training loop and the full functions in each iteration.

```
import time

import smprof

# args, world_size, sampler, trainloader, net, criterion, and optimizer
# are assumed to be defined earlier in the training script.

SMProf = smprof.SMProfiler.instance()
config = smprof.Config()
config.profiler = {
    "EnableCuda": "1",
}
SMProf.configure(config)
SMProf.start_profiling()

for epoch in range(args.epochs):
    if world_size > 1:
        sampler.set_epoch(epoch)
    tstart = time.perf_counter()
    for i, data in enumerate(trainloader, 0):
        with smprof.annotate("step_" + str(i)):
            inputs, labels = data
            inputs = inputs.to("cuda", non_blocking=True)
            labels = labels.to("cuda", non_blocking=True)

            optimizer.zero_grad()

            with smprof.annotate("Forward"):
                outputs = net(inputs)
            with smprof.annotate("Loss"):
                loss = criterion(outputs, labels)
            with smprof.annotate("Backward"):
                loss.backward()
            with smprof.annotate("Optimizer"):
                optimizer.step()

SMProf.stop_profiling()
```

**Approach 2.** Use `smprof.annotation_begin()` and `smprof.annotation_end()` to annotate specific code lines in functions

You can also define annotations to profile specific code lines. This lets you set the exact start and end points of profiling at the level of individual code lines rather than whole functions. For example, in the following script, `step_annotator` is started at the beginning of each iteration and ended at the end of the iteration. Meanwhile, more detailed annotators are defined for each operation and wrap the target operations throughout each iteration.
```
import time

import smprof

# args, world_size, sampler, trainloader, net, criterion, and optimizer
# are assumed to be defined earlier in the training script.

SMProf = smprof.SMProfiler.instance()
config = smprof.Config()
config.profiler = {
    "EnableCuda": "1",
}
SMProf.configure(config)
SMProf.start_profiling()

for epoch in range(args.epochs):
    if world_size > 1:
        sampler.set_epoch(epoch)
    tstart = time.perf_counter()
    for i, data in enumerate(trainloader, 0):
        step_annotator = smprof.annotation_begin("step_" + str(i))

        inputs, labels = data
        inputs = inputs.to("cuda", non_blocking=True)
        labels = labels.to("cuda", non_blocking=True)
        optimizer.zero_grad()

        forward_annotator = smprof.annotation_begin("Forward")
        outputs = net(inputs)
        smprof.annotation_end(forward_annotator)

        loss_annotator = smprof.annotation_begin("Loss")
        loss = criterion(outputs, labels)
        smprof.annotation_end(loss_annotator)

        backward_annotator = smprof.annotation_begin("Backward")
        loss.backward()
        smprof.annotation_end(backward_annotator)

        optimizer_annotator = smprof.annotation_begin("Optimizer")
        optimizer.step()
        smprof.annotation_end(optimizer_annotator)

        smprof.annotation_end(step_annotator)

SMProf.stop_profiling()
```

After annotating the script and setting up the profiler initiation modules, save the script so that you can submit it with the SageMaker training job launcher in Step 2. The sample launcher assumes that the training script is named `train_with_profiler_demo.py`.

## Step 2: Create a SageMaker AI framework estimator and activate SageMaker Profiler

The following procedure shows how to prepare a SageMaker AI framework estimator for training using the SageMaker Python SDK.

1. Set up a `profiler_config` object using the `ProfilerConfig` and `Profiler` modules as follows.

   ```
   from sagemaker import ProfilerConfig, Profiler

   profiler_config = ProfilerConfig(
       profile_params=Profiler(cpu_profiling_duration=3600)
   )
   ```

   The following describes the `Profiler` module and its argument.
   + `Profiler`: The module for activating SageMaker Profiler with the training job.
   + `cpu_profiling_duration` (int): The duration, in seconds, for profiling on CPUs. The default is 3600 seconds.

1. Create a SageMaker AI framework estimator with the `profiler_config` object created in the previous step. The following code shows an example of creating a PyTorch estimator.
   To create a TensorFlow estimator, import `sagemaker.tensorflow.TensorFlow` instead and specify one of the [TensorFlow versions](profiler-support.md#profiler-support-frameworks-tensorflow) supported by SageMaker Profiler. For more information about supported frameworks and instance types, see [SageMaker AI framework images pre-installed with SageMaker Profiler](profiler-support.md#profiler-support-frameworks).

   ```
   import sagemaker
   from sagemaker.pytorch import PyTorch

   # source_dir, output_path, and hyperparameters are assumed to be
   # defined earlier in the launcher script.
   estimator = PyTorch(
       framework_version="2.0.0",
       role=sagemaker.get_execution_role(),
       entry_point="train_with_profiler_demo.py",  # your training job entry point
       source_dir=source_dir,  # source directory for your training script
       output_path=output_path,
       base_job_name="sagemaker-profiler-demo",
       hyperparameters=hyperparameters,  # if any
       instance_count=1,  # Recommended to test with < 8
       instance_type="ml.p4d.24xlarge",
       profiler_config=profiler_config
   )
   ```

1. Start the training job by running the `fit` method. With `wait=False`, you can silence the training job logs and let the job run in the background.

   ```
   estimator.fit(wait=False)
   ```

While the training job is running, or after it has completed, you can go to the next topic, [Open the SageMaker Profiler UI application](profiler-access-smprofiler-ui.md), to explore and visualize the saved profiles.

To directly access the profile data saved in the Amazon S3 bucket, use the following script to retrieve the S3 URI.
```
import posixpath

# This is an ad-hoc function to get the S3 URI
# to where the profile output data is saved
def get_detailed_profiler_output_uri(estimator):
    config_name = None
    for processing in estimator.profiler_rule_configs:
        params = processing.get("RuleParameters", dict())
        rule = params.get("rule_to_invoke", "")
        if rule == "DetailedProfilerProcessing":
            config_name = processing.get("RuleConfigurationName")
            break
    if config_name is None:
        raise ValueError("No DetailedProfilerProcessing rule configuration found")
    # S3 URIs always use forward slashes, so join with posixpath
    return posixpath.join(
        estimator.output_path,
        estimator.latest_training_job.name,
        "rule-output",
        config_name,
    )

print(
    "Profiler output S3 bucket: ",
    get_detailed_profiler_output_uri(estimator)
)
```

## (Optional) Install the SageMaker Profiler Python package

To use SageMaker Profiler on PyTorch or TensorFlow framework images that are not listed in [SageMaker AI framework images pre-installed with SageMaker Profiler](profiler-support.md#profiler-support-frameworks), or on your own custom Docker container for training, you can install SageMaker Profiler by using one of the [SageMaker Profiler Python package binary files](profiler-support.md#profiler-python-package).

**Option 1: Install the SageMaker Profiler package while launching a training job**

To use SageMaker Profiler for training jobs that use PyTorch or TensorFlow images not listed in [SageMaker AI framework images pre-installed with SageMaker Profiler](profiler-support.md#profiler-support-frameworks), create a `requirements.txt` file and place it under the path that you specified for the `source_dir` parameter of the SageMaker AI framework estimator in [Step 2](#profiler-profilerconfig). For more information about setting up a `requirements.txt` file in general, see [Using third-party libraries](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#using-third-party-libraries) in the *SageMaker Python SDK documentation*. In the `requirements.txt` file, add one of the S3 bucket paths for the [SageMaker Profiler Python package binary files](profiler-support.md#profiler-python-package).
```
# requirements.txt
https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl
```

**Option 2: Install the SageMaker Profiler package in your custom Docker container**

If you use a custom Docker container for training, add one of the [SageMaker Profiler Python package binary files](profiler-support.md#profiler-python-package) to your Dockerfile.

```
# Install the smprof package version compatible with your CUDA version
RUN pip install https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl
```

For general guidance on running a custom Docker container for training on SageMaker AI, see [Adapting your own training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html).
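
The wheel file name in the examples above encodes a Python tag (`cp39`), an ABI tag, and a platform tag. Before adding a wheel URL to `requirements.txt` or a Dockerfile, it can help to confirm that the Python tag matches the interpreter in the target image. The following is a minimal sketch of such a check using only the standard library. The helper names `wheel_tags` and `matches_running_python` are hypothetical and are not part of `smprof` or the SageMaker Python SDK, and the parsing assumes a simple file name with a single CPython tag and no build tag.

```python
import sys


def wheel_tags(wheel_url):
    """Return the (python, abi, platform) tags from a wheel URL or file name.

    Hypothetical helper: assumes the name ends with
    ...-{python tag}-{abi tag}-{platform tag}.whl
    """
    filename = wheel_url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the compatibility tags.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected wheel file name: {filename}")
    _, python_tag, abi_tag, platform_tag = parts
    return python_tag, abi_tag, platform_tag


def matches_running_python(wheel_url):
    """Check whether the wheel's Python tag matches the running CPython version."""
    python_tag, _, _ = wheel_tags(wheel_url)
    current = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag == current


WHEEL_URL = (
    "https://smppy.s3.amazonaws.com/tensorflow/cu112/"
    "smprof-0.3.332-cp39-cp39-linux_x86_64.whl"
)

print(wheel_tags(WHEEL_URL))  # ('cp39', 'cp39', 'linux_x86_64')
print(matches_running_python(WHEEL_URL))
```

The check is only meaningful when run with the Python interpreter of the image that will install the wheel. It does not verify CUDA compatibility, which is indicated by the URL path segment (`cu112` in this example).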