[2022_CVPR_VideoINR] VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

Paper Summary

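Most high-quality videos are stored at a low resolution and frame rate because storing them at full fidelity consumes a lot of resources. Recent work addresses this with Space-Time Video Super-Resolution (STVSR), a framework that unifies temporal interpolation and spatial super-resolution. However, most STVSR methods support only a fixed upsampling ratio, which limits their use. In response, this paper proposes Video Implicit Neural Representation (VideoINR) and applies it to STVSR. The learned INR can decode a video at an arbitrary resolution and frame rate.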

1. Paper Bibliography

Title

- VideoINR: Learning video implicit neural representation for continuous space-time super-resolution

Authors

- Chen, Zeyuan, et al.

Publication / Conference

- Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022

2. Problems & Motivations

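The resolution and frame rate at which most videos are stored are inevitably limited, because storage is very expensive. When such videos are shown to viewers again (for example, rebroadcast on TV), the low-resolution, low-frame-rate video needs to be restored to high resolution and a high frame rate.

One way to do this is Space-Time Video Super-Resolution (STVSR), which increases the spatial resolution and the frame rate of the input video at the same time.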

Space
- spatial information
- Video Super-Resolution (VSR)
- Increases: resolution (how finely the display can render detail)
Time
- temporal information
- Video Frame Interpolation (VFI)
- Increases: frame rate (how many frames the display shows per second)

However, most STVSR methods are limited to super-resolution at fixed ratios (e.g., 2x, 4x).

In contrast, this paper proposes VideoINR, which supports super-resolution at arbitrary scales.

3. Method

The goal of VideoINR is to find a continuous representation of a video that maps any space-time coordinate $(x_s, x_t)$ to an RGB value.

The representation is parameterized by multi-layer perceptrons (MLPs) and can be written as follows.

$s = f(x_s, x_t)$

$f$ : video representation
$x_s$ : 2D spatial coordinate
$x_t$ : temporal coordinate
$s$ : predicted RGB value
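
Because $f$ accepts continuous coordinates, choosing an output resolution and frame rate amounts to choosing where to query it. The following is a minimal sketch of building such queries. The normalization (space in $[-1, 1]$, time in $(0, 1)$ between two input frames) and the helper names `pixel_centers` and `make_queries` are assumptions made for illustration, not details taken from the paper.

```python
def pixel_centers(n, lo=-1.0, hi=1.0):
    """Return the centers of n equally sized cells covering [lo, hi]."""
    step = (hi - lo) / n
    return [lo + step * (i + 0.5) for i in range(n)]


def make_queries(height, width, num_frames):
    """Build (x_s, x_t) queries for an arbitrary output size and frame count.

    Assumption: the two input frames sit at t = 0 and t = 1, and the
    num_frames new frames are evenly spaced strictly between them.
    """
    ys = pixel_centers(height)
    xs = pixel_centers(width)
    ts = [k / (num_frames + 1) for k in range(1, num_frames + 1)]
    # Each query pairs a 2D spatial coordinate with a temporal coordinate.
    return [((y, x), t) for t in ts for y in ys for x in xs]


# Any output size works, including non-integer scale factors of the input.
queries = make_queries(height=3, width=5, num_frames=2)
```

Each element of `queries` would then be passed to the learned $f$ to obtain one RGB value, so the same model can serve any target grid.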

3.1 Continuous Spatial Representation

SpatialINR: predict the continuous feature of the query coordinate

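- SpatialINR maps an input spatial coordinate into a continuous feature domain.

1) Sample the feature vector nearest to the query spatial coordinate $x_s$ to obtain $z^*$.

2) Compute the difference between the query coordinate $x_s$ and the spatial coordinate $v^*$ of $z^*$ to obtain the relative position.

3) Concatenate the results of steps 1 and 2.

4) Feed the result into the function $f_s$ to produce the continuous feature for the query coordinate $x_s$.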

$F_s(x_s) = f_s(z^*, x_s-v^*)$

$F_s$ : continuous feature domain defined by SpatialINR
$z^*$ : feature vector nearest to the query coordinate $x_s$
$v^*$ : spatial coordinate of the feature vector $z^*$
$x_s-v^*$ : relative position information between query coordinate and feature vector
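
The steps above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the feature map is a nested list, coordinates are assumed to be normalized to $[-1, 1]$ with feature vectors located at cell centers, and $f_s$ is passed in as an arbitrary callable (in the paper it is an MLP). The helper names are hypothetical.

```python
def cell_center(index, n):
    """Center coordinate of cell `index` when [-1, 1] is split into n cells."""
    return -1.0 + (2.0 / n) * (index + 0.5)


def nearest_index(coord, n):
    """Index of the cell whose center is nearest to `coord` in [-1, 1]."""
    idx = int((coord + 1.0) / 2.0 * n)
    return min(max(idx, 0), n - 1)  # clamp so coord == 1.0 stays in range


def spatial_inr(features, x_s, f_s):
    """Evaluate F_s(x_s) = f_s(z*, x_s - v*) on an H x W grid of feature vectors."""
    h, w = len(features), len(features[0])
    y, x = x_s
    i, j = nearest_index(y, h), nearest_index(x, w)
    z_star = features[i][j]                         # step 1: nearest feature
    v_star = (cell_center(i, h), cell_center(j, w))
    rel = [y - v_star[0], x - v_star[1]]            # step 2: relative position
    return f_s(list(z_star) + rel)                  # steps 3 and 4: concat, then f_s


# Toy 2 x 2 feature map with 1-D features; the identity stands in for f_s.
features = [[[1.0], [2.0]], [[3.0], [4.0]]]
out = spatial_inr(features, (-0.4, 0.6), lambda v: v)
```

With the identity in place of $f_s$, `out` is simply the concatenated input, which makes it easy to inspect which feature vector was selected and what offset was attached to it.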

3.2 Continuous Temporal Representation

TemporalINR: generate continuous motion flow of the query coordinate

- TemporalINR produces a continuous motion flow field, which serves as the continuous temporal representation.

- Given a space-time coordinate $(x_s,x_t)$ and two consecutive input frames $I_0, I_1$, TemporalINR outputs a continuous motion flow.

$M(x_s,x_t) = f_t(x_s,x_t,I_0,I_1)$

$(x_s, x_t)$ : space-time coordinate
$I_0, I_1$ : two consecutive input frames
$M$ : continuous motion flow field
$f_t$ : function for TemporalINR

- Since SpatialINR has already encoded the information of $I_0, I_1$ at position $x_s$ as a continuous feature, the equation can be rewritten as follows.

$M(x_s,x_t) = f_t(x_t, F_s(x_s))$

$F_s(x_s)$ : feature domain defined by SpatialINR
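
To make the rewritten form $M(x_s,x_t) = f_t(x_t, F_s(x_s))$ concrete, here is a small sketch in which $f_t$ is a tiny MLP that takes the temporal coordinate concatenated with a feature vector and returns a 2D flow. The layer sizes, the ReLU activation, and the random (untrained) weights are all assumptions for illustration, and `make_mlp`, `mlp_forward`, and `temporal_inr` are hypothetical names.

```python
import random


def make_mlp(sizes, seed=0):
    """Create untrained dense layers as (weights, biases) pairs."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, vec):
    """Apply the layers in order, with ReLU on every layer except the last."""
    for k, (weights, biases) in enumerate(layers):
        vec = [sum(w * v for w, v in zip(row, vec)) + b
               for row, b in zip(weights, biases)]
        if k < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec


def temporal_inr(layers, x_t, feature):
    """M(x_s, x_t) = f_t(x_t, F_s(x_s)): returns a 2D flow vector (dy, dx)."""
    return mlp_forward(layers, [x_t] + list(feature))


feature = [0.1, 0.2, 0.3, 0.4]           # stands in for F_s(x_s)
f_t = make_mlp([1 + len(feature), 8, 2])  # input: x_t plus the feature
flow = temporal_inr(f_t, 0.5, feature)
```

Note that $x_s$ does not appear as a direct input here: it enters only through the feature $F_s(x_s)$, which is the point of the rewritten equation.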

3.3 Space-Time Continuous Representation

The two continuous representations obtained above must now be combined into a single space-time continuous representation.

- The space-time feature is obtained by warping the feature domain.

- Warping the query coordinate $x_s$ gives $x'_s$.

$x'_s = x_s + M(x_s,x_t)$

$x'_s$ : warped coordinate used to query the continuous feature
$M(x_s,x_t)$ : motion flow vector at $(x_s, x_t)$

- The new coordinate $x'_s$ is used to query a new continuous 2D feature.

- This feature carries information about both space $x_s$ and time $x_t$.

$F_{st}(x_s,x_t) = F_s(x'_s) = F_s(x_s + M(x_s,x_t))$

$(x_s,x_t)$ : coordinate for continuous space-time representation
$F_{st}(x_s,x_t)$ : continuous space-time feature

- In the actual implementation, flows and warped features are generated in both directions and concatenated.
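
The warping step can be sketched as a composition of two callables. This is only an illustration of $F_{st}(x_s,x_t) = F_s(x_s + M(x_s,x_t))$ in a single direction: `spatial_feature` and `motion_flow` are hypothetical stand-ins for the learned SpatialINR and TemporalINR, and the toy functions below are chosen so that the result is easy to check by hand.

```python
def space_time_feature(spatial_feature, motion_flow, x_s, x_t):
    """Query the spatial feature domain at the flow-warped coordinate."""
    dy, dx = motion_flow(x_s, x_t)        # M(x_s, x_t)
    warped = (x_s[0] + dy, x_s[1] + dx)   # x'_s = x_s + M(x_s, x_t)
    return spatial_feature(warped)        # F_s(x'_s)


def toy_feature(p):
    """Toy F_s: the feature is just the coordinate itself."""
    return [p[0], p[1]]


def toy_flow(x_s, x_t):
    """Toy M: uniform horizontal motion that grows linearly with time."""
    return (0.0, 0.2 * x_t)


feat = space_time_feature(toy_feature, toy_flow, (0.0, 0.1), 0.5)
```

A bidirectional variant, as mentioned in the last bullet, would call `space_time_feature` once per direction and concatenate the two results; the exact pairing of flows and features is not specified in this summary.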

3.4 Feature Decoding

Finally, the features must be decoded into RGB values.

- To enrich the input information, features at several different scales are generated and concatenated with the input frames for decoding.

4. Experiments

Datasets

Training

- Adobe240

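The videos are split into subsets (100 / 16 / 17 for train / val / test).
The videos are converted into sequences for training. Each sequence consists of about 3,000 frames.
The sequences serve as the high-resolution targets during training, and the low-resolution inputs are produced with MATLAB's imresize.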
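
As a rough illustration of how a low-resolution input can be derived from a high-resolution frame, the sketch below averages non-overlapping blocks of a grayscale image. This is a deliberately simple stand-in and is not equivalent to MATLAB's imresize, which by default uses bicubic interpolation with antialiasing; `box_downsample` is a hypothetical helper, not part of the VideoINR code.

```python
def box_downsample(image, factor):
    """Downsample a 2D grayscale image by averaging factor x factor blocks."""
    h, w = len(image), len(image[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [image[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4 x 4 "high-resolution" frame reduced to 2 x 2.
hr = [[0, 0, 4, 4],
      [0, 0, 4, 4],
      [8, 8, 2, 2],
      [8, 8, 2, 2]]
lr = box_downsample(hr, 2)
```

During training, `hr` would play the role of the ground truth and `lr` the role of the network input.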

Test

- Vid4, Adobe240, GoPro

Results

Google Scholar Link

https://scholar.google.co.kr/scholar?hl=ko&as_sdt=0%2C5&q=VideoINR%3A+Learning+Video+Implicit+Neural+Representation+for+Continuous+Space-Time+Super-Resolution&btnG=


GitHub

https://github.com/Picsart-AI-Research/VideoINR-Continuous-Space-Time-Super-Resolution

