[NVIDIA PyProf] Measuring FLOPs

What is PyProf?

PyProf is a profiling tool made by NVIDIA. It profiles PyTorch models and their GPU performance.

1. Installing from GitHub

Clone the repository:

git clone https://github.com/NVIDIA/PyProf.git

Move into the PyProf directory, then install PyProf:

cd PyProf
pip install .

Check that the installation succeeded:

pip list | grep pyprof

The output should look like this:

pyprof            3.10.0
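The same check can be done from Python. This is a minimal sketch that assumes the distribution is registered under the name `pyprof`, as the `pip list` output above suggests; `installed_version` is a helper name made up for this post, not part of PyProf.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyprof" is assumed to be the distribution name shown by `pip list`.
    version = installed_version("pyprof")
    print("pyprof", version if version else "is not installed")
```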

2. Using PyProf with PyTorch

Overview

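PyProf takes a relatively simple approach to computing FLOPs and bandwidth.

For example, for matrices $A_{M\times K}$ and $B_{K\times N}$, the FLOP count of the matrix multiplication is computed as $2\times M \times N\times K$, and the bandwidth as $M \times K + N \times K + M\times N$.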
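To make the formula concrete, here is a small sketch that evaluates it for one matrix shape. The function name `matmul_cost` and the `bytes_per_element` parameter are assumptions added for illustration: the bandwidth expression above counts elements, so converting it to bytes requires choosing an element size (4 bytes for FP32 here).

```python
def matmul_cost(m: int, k: int, n: int, bytes_per_element: int = 4):
    """Estimate the cost of multiplying A (m x k) by B (k x n).

    Returns (flops, elements_moved, bytes_moved).
    """
    # Each of the m*n outputs needs k multiply-adds; one FMA counts as 2 FLOPs.
    flops = 2 * m * n * k
    # Read A and B once, write the m x n result once.
    elements = m * k + n * k + m * n
    # Assumption: every element has the same size (4 bytes = FP32 by default).
    return flops, elements, elements * bytes_per_element


flops, elements, nbytes = matmul_cost(m=128, k=256, n=64)
print(f"FLOPs={flops}, elements={elements}, bytes={nbytes}")
```

For M=128, K=256, N=64 this gives 4,194,304 FLOPs and 57,344 elements moved, which is 229,376 bytes in FP32.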

Components and Flow

1. Import PyProf

2. Profile PyTorch Model: profile the model with NVProf or Nsight Systems to obtain a SQL database

3. parse.py: extract information from the SQL database

4. prof.py: compute FLOPs and bytes from the extracted information
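Steps 2 to 4 map onto three shell commands, which are covered one by one later in this post. As a rough sketch, they can be collected into a small Python driver. The names `build_steps` and `run_steps` are made up for this example; the command-line arguments are the ones used in this post, and `net.py` stands for your own training script.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file that receives stdout, or None if stdout is not redirected)
Step = Tuple[List[str], Optional[str]]


def build_steps(script: str = "net.py", name: str = "net",
                columns: str = "idx,dir,kernel,params,flops") -> List[Step]:
    """Build the profile -> parse -> prof commands without running them."""
    return [
        # Step 2: profile with Nsight Systems and export a SQLite database.
        (["nsys", "profile", "-f", "true", "-o", name,
          "-c", "cudaProfilerApi", "--stop-on-range-end", "true",
          "--export", "sqlite", "python", script], None),
        # Step 3: parse the SQLite database into a dict file.
        (["python", "-m", "pyprof.parse", f"{name}.sqlite"], f"{name}.dict"),
        # Step 4: compute the requested columns and write them as CSV.
        (["python", "-m", "pyprof.prof", "--csv", "-c", columns,
          f"{name}.dict"], f"{name}.csv"),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


for argv, out_path in build_steps():
    print(" ".join(argv), f"> {out_path}" if out_path else "")
```

Calling `run_steps(build_steps())` would execute the pipeline; it needs `nsys`, PyProf, and a GPU, so the sketch only prints the commands.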

1. Enable Profiler in PyTorch Network

Import the profiler:

import torch  # needed for torch.autograd.profiler.emit_nvtx() in the loop below
import torch.cuda.profiler as profiler
import pyprof
pyprof.init()

Run the training/inference loop inside PyTorch's NVTX context manager:

iters = 500
iter_to_capture = 100

# Define network, loss function, optimizer etc.

# PyTorch NVTX context manager
with torch.autograd.profiler.emit_nvtx():

    for iter in range(iters):

        if iter == iter_to_capture:
            profiler.start()

        output = net(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

        if iter == iter_to_capture:
            profiler.stop()

2. Profile the PyTorch Network

After modifying the PyTorch script to import pyprof, profile it with NVProf or Nsight Systems. Both profilers produce a SQLite database containing the profile results.

We will use Nsight Systems here.

Because the script above calls profiler.start() and profiler.stop(), the command needs the options -c cudaProfilerApi --stop-on-range-end true.

nsys profile -f true -o net -c cudaProfilerApi --stop-on-range-end true --export sqlite python net.py

3. Parse the SQL file

Convert the generated sqlite file into a dict file:

python -m pyprof.parse net.sqlite > net.dict

4. Run the Prof Script

Generate a csv file from the dict file.

Several options are available:

python -m pyprof.prof --csv -c idx,dir,kernel,params,flops net.dict > net.csv
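Once net.csv exists, the per-kernel values can be added up to get a total for the captured iteration. The sketch below is an assumption-heavy example: it assumes the CSV starts with a header row, and the header name `"FLOPs"` and the sample rows are hypothetical, so check the first line of your own net.csv and pass the actual column name. `total_of_column` is a helper written for this post, not part of PyProf.

```python
import csv
import io
from typing import TextIO


def total_of_column(csv_file: TextIO, column: str) -> float:
    """Sum one numeric column of a CSV that has a header row.

    Empty or non-numeric cells are skipped.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    total = 0.0
    for row in reader:
        value = (row.get(column) or "").strip()
        try:
            total += float(value)
        except ValueError:
            continue  # blank or non-numeric cell
    return total


# Made-up sample data; the header names are assumptions, not real PyProf output.
sample = io.StringIO(
    "Idx,Kernel,FLOPs\n"
    "1,gemm_kernel,4194304\n"
    "2,relu_kernel,\n"
    "3,gemm_kernel,1000\n"
)
print(total_of_column(sample, "FLOPs"))
```

For a real run, replace the sample with `open("net.csv", newline="")` and the column name found in the file's header.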

Options for prof.py

Command	Description
file	Input file for prof.py, generated by parse.py
-c	Columns to print; see the column option table below
--csv	Print CSV output. Use either --csv or -w, not both
-w	Width of the columnated output. Use either --csv or -w, not both

Column Options

Option	Description
idx	Index
seq	PyTorch Sequence Id
altseq	PyTorch Alternate Sequence Id
tid	Thread Id
layer	User annotated NVTX string (can be nested)
trace	Function Call Trace
dir	Direction
sub	Sub Sequence Id
mod	Module
op	Operation
kernel	Kernel Name
params	Parameters
sil	Silicon Time (in ns)
tc	Tensor Core Usage
device	GPU Device Id
stream	Stream Id
grid	Grid Dimensions
block	Block Dimensions
flops	Floating point ops (FMA = 2 FLOPs)
bytes	Number of bytes in and out of DRAM

The default options are “idx,dir,sub,mod,op,kernel,params,sil”.
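A typo in the column list is easy to make, so it can help to check the names against the table above before building the `-c` argument. This is a small sketch; `column_arg` is a hypothetical helper written for this post, and the valid names and default are copied from the table and the sentence above.

```python
from typing import Iterable, Optional

# Column names copied from the column option table above.
VALID_COLUMNS = (
    "idx", "seq", "altseq", "tid", "layer", "trace", "dir", "sub", "mod", "op",
    "kernel", "params", "sil", "tc", "device", "stream", "grid", "block",
    "flops", "bytes",
)
DEFAULT_COLUMNS = "idx,dir,sub,mod,op,kernel,params,sil"


def column_arg(columns: Optional[Iterable[str]] = None) -> str:
    """Return a comma-separated value for -c, rejecting unknown column names."""
    columns = list(columns or [])
    if not columns:
        return DEFAULT_COLUMNS
    unknown = [c for c in columns if c not in VALID_COLUMNS]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")
    return ",".join(columns)


print(column_arg(["idx", "dir", "kernel", "params", "flops", "bytes"]))
```

The returned string can be passed directly after `-c` in the prof command.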

See the official documentation for details:

https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html



뀰 블로그
