[BasicVSR++] BasicVSR++: Improving video super-resolution with enhanced propagation and alignment

Paper Summary

1. Paper Bibliography

Title

- BasicVSR++: Improving video super-resolution with enhanced propagation and alignment

Authors

- Chan et al.

Publication / Conference

- arXiv preprint arXiv:2104.13371 (2021) / accepted to CVPR 2022

Year

- 2022

2. Problems & Motivations

BasicVSR

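- In their previous paper [2], the authors summarized the VSR pipeline into four components (propagation, alignment, aggregation, and upsampling) and proposed BasicVSR, a network built on these components.

- BasicVSR uses bidirectional propagation to gather information from the whole video and uses optical flow to warp features.

- However, because this is a basic design, its ability to exploit that information is limited, e.g., for fine details and for occluded or complex regions.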

[2] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. BasicVSR: The search for essential components in video super-resolution and beyond. In CVPR, 2021.

[3] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Understanding deformable alignment in video super-resolution. In AAAI, 2021.

3. Proposed Solutions

Summary of the solutions proposed in the paper

3.1 Second-Order Grid Propagation

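1) Grid propagation: repeating the propagation

- The intermediate features (of the current frame) revisit information from other frames in both the backward and forward directions, which refines the features.

- Whereas BasicVSR propagated the features only once, grid propagation repeatedly extracts information from the whole sequence, improving the expressiveness of the features.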
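The repeated propagation can be sketched with toy features. This is a minimal sketch, not the authors' code: it assumes four branches that alternate backward and forward, and a hypothetical `refine` function (a simple weighted sum) standing in for the residual blocks. Alignment and the second-order connection are deliberately left out so that only the "propagate the whole sequence again, starting from the previous branch's output" structure remains.

```python
import numpy as np

def propagate(feats, direction, refine):
    """One propagation branch over the whole sequence, carrying a hidden state."""
    n = len(feats)
    order = range(n - 1, -1, -1) if direction == "backward" else range(n)
    hidden = np.zeros_like(feats[0])
    out = [None] * n
    for i in order:
        hidden = refine(feats[i], hidden)  # fuse this frame's feature with the propagated state
        out[i] = hidden
    return out

def grid_propagation(g, num_branches=4, refine=None):
    """Repeat propagation with alternating direction; branch j consumes branch j-1's output."""
    # Hypothetical stand-in for the residual blocks: a simple weighted sum.
    refine = refine or (lambda cur, prev: cur + 0.5 * prev)
    feats, directions = list(g), []
    for j in range(num_branches):
        direction = "backward" if j % 2 == 0 else "forward"
        feats = propagate(feats, direction, refine)
        directions.append(direction)
    return feats, directions

g = [np.ones(1) for _ in range(3)]  # toy per-frame features g_i
feats, directions = grid_propagation(g)
```

After one backward and one forward branch, every frame's feature already depends on every other frame; further branches revisit the sequence again with refined features.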

2) Second-order connection

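- In addition to the feature from the previous timestep (i-1), the feature from two timesteps earlier (i-2) is also propagated to the current timestep. Information can thus be aggregated from more diverse spatiotemporal locations, which improves robustness and the results in occluded and fine regions.

- The combination of these two designs is called second-order grid propagation.

- For example, let x_i be the input image, from which the feature g_i is extracted by several residual blocks, and let f_i^j be the feature computed at the i-th timestep in the j-th propagation branch.

- To compute f_i^j, f_{i-1}^j and f_{i-2}^j are first aligned with flow-guided deformable alignment.

- The aligned features are then concatenated and passed through residual blocks to produce f_i^j.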
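The per-branch update can be sketched as follows. This is a simplified sketch under stated assumptions: `align` is a hypothetical stand-in that just averages f_{i-1}^j and f_{i-2}^j (no motion compensation), `residual_blocks` is a hypothetical single 1x1 linear layer with ReLU, missing neighbours at the start of the sequence are assumed to be zero, and the update is assumed to take the form f_i^j = \hat{f}_i^j + R(c(f_i^{j-1}, \hat{f}_i^j)) with f_i^0 = g_i, where c is channel concatenation.

```python
import numpy as np

def residual_blocks(x, weight):
    """Hypothetical stand-in for R(.): one 1x1 linear channel mix followed by ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)

def align(g_i, f_prev1, f_prev2):
    """Hypothetical stand-in for flow-guided deformable alignment: no motion, just averaging."""
    return 0.5 * (f_prev1 + f_prev2)

def second_order_branch(g, prev_branch, weight, backward=False):
    """Compute f_i^j for every frame i, given f_i^{j-1} = prev_branch[i] (f_i^0 = g_i)."""
    n = len(g)
    order = range(n - 1, -1, -1) if backward else range(n)
    zeros = np.zeros_like(g[0])
    out, history = [None] * n, []  # history: this branch's outputs in processing order
    for i in order:
        f1 = history[-1] if len(history) >= 1 else zeros  # first-order neighbour (i-1)
        f2 = history[-2] if len(history) >= 2 else zeros  # second-order neighbour (i-2)
        f_hat = align(g[i], f1, f2)
        # concatenate along channels, pass through R, then add the skip connection
        f = f_hat + residual_blocks(np.concatenate([prev_branch[i], f_hat], axis=0), weight)
        out[i] = f
        history.append(f)
    return out

g = [np.ones((1, 1, 1)) for _ in range(3)]   # toy g_i with C=1, H=W=1
weight = np.array([[1.0, 0.0]])              # hypothetical: R(c(a, b)) = relu(a)
branch1 = second_order_branch(g, g, weight, backward=True)  # f^1 from f^0 = g
branch2 = second_order_branch(g, branch1, weight)           # f^2, forward direction
```

Chaining branches with alternating directions, as in the usage lines, combines the grid propagation and the second-order connection.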

3.2 Flow-Guided Deformable Alignment

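- Deformable convolution has been used for alignment in VSR before [33, 35], but it is difficult and unstable to train, which can cause offset overflow and degrade the final performance.

- To keep the benefit of offset diversity while overcoming this instability, the authors propose using optical flow to guide deformable alignment. (This is motivated by the strong relation between deformable alignment and flow-based alignment shown in [3].)

- At the i-th timestep, we are given the feature g_i extracted from the i-th LR image, the feature f_{i-1} computed at the previous timestep, and the optical flow s_{i->i-1} from frame i to frame i-1.

- First, f_{i-1} is warped with s_{i->i-1}.

- The warped feature \bar{f}_{i-1}, together with g_i, is used to compute the DCN offsets o_{i->i-1} and the modulation masks m_{i->i-1}. Instead of predicting the DCN offsets directly, the network predicts the residue to the optical flow, i.e., the offset is the flow plus a learned residue.

- Finally, the DCN is applied to the unwarped feature f_{i-1} to produce the output feature \hat{f}_i at the i-th timestep.

- The formulation above handles only a single feature, so it cannot be applied directly to second-order propagation. One could run it twice (once for f_{i-1} and once for f_{i-2}), but this doubles the computation, and aligning the features separately ignores the complementary information between them.

- Therefore, the two features are aligned simultaneously: the warped features and the flows are concatenated to compute the offsets (and masks) for both features, and the DCN is applied to the concatenated unwarped features f_{i-1} and f_{i-2}.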
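The joint flow-guided alignment can be sketched in NumPy. This is a heavily simplified sketch, not the paper's module: the DCN is reduced to one bilinear sample per pixel (a 1x1 "kernel" with a single offset group and no learned weights), `offset_head` and `mask_head` are hypothetical callables standing in for the small conv networks that predict the flow residue and masks, and bounding the residue with `max_residue * tanh(...)` (max_residue = 10) is an assumption. What it does show faithfully is that offsets are flow plus a residue and that sampling is applied to the unwarped features.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feat (C, H, W) at float pixel coords x, y (each H x W); zero outside the image."""
    _, h, w = feat.shape
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    out = np.zeros(feat.shape[:1] + x.shape)
    for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
        xi, yi = x0 + dx, y0 + dy
        wgt = (1 - np.abs(x - xi)) * (1 - np.abs(y - yi))  # bilinear weight of this corner
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out += feat[:, np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)] * (wgt * valid)
    return out

def pixel_grid(h, w):
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return xs.astype(float), ys.astype(float)

def flow_warp(feat, flow):
    """Backward-warp feat with flow (2, H, W): channel 0 = dx, channel 1 = dy."""
    xs, ys = pixel_grid(*feat.shape[1:])
    return bilinear_sample(feat, xs + flow[0], ys + flow[1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_order_fgda(g_i, feats, flows, offset_head, mask_head, max_residue=10.0):
    """Jointly align feats = [f_{i-1}, f_{i-2}] using flows = [s_{i->i-1}, s_{i->i-2}]."""
    warped = [flow_warp(f, s) for f, s in zip(feats, flows)]  # \bar{f}_{i-p}
    cond = np.concatenate([g_i, *warped, *flows], axis=0)     # shared condition for both features
    residue = max_residue * np.tanh(offset_head(cond))        # (2 * len(feats), H, W)
    masks = sigmoid(mask_head(cond))                          # (len(feats), H, W)
    xs, ys = pixel_grid(*g_i.shape[1:])
    aligned = []
    for p, (f, s) in enumerate(zip(feats, flows)):
        offset = s + residue[2 * p:2 * p + 2]                 # offset = flow + learned residue
        sampled = bilinear_sample(f, xs + offset[0], ys + offset[1])  # sample the UNWARPED feature
        aligned.append(sampled * masks[p])                    # modulation mask
    return np.concatenate(aligned, axis=0)
```

With heads that output zeros, the residue is zero and every mask is 0.5, so the result is half of the plain flow-warped features: the optical flow acts as the starting point that the learned residue only corrects.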

[3] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Understanding deformable alignment in video super-resolution. In AAAI, 2021.

[33] Hua Wang, Dewei Su, Longcun Jin, and Chuangchuang Liu. Deformable non-local network for video super-resolution. IEEE Access, 2019.

[35] Xintao Wang, Kelvin C.K. Chan, Ke Yu, Chao Dong, and Chen Change Loy. EDVR: Video restoration with enhanced deformable convolutional networks. In CVPRW, 2019.

4. Input Format

- Patch size of the input frames: 64 x 64
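Because the 64 x 64 patch is taken from the LR input frames, the matching ground-truth patch for x4 SR is 256 x 256, and it has to be cropped at the same location in every frame of the clip. A minimal sketch of such a paired crop, assuming frames stored as (T, H, W) arrays; `paired_random_crop` is a hypothetical helper name, not taken from the official code.

```python
import numpy as np

def paired_random_crop(lr, hr, lr_patch=64, scale=4, rng=None):
    """Crop the same region from every frame of an LR clip (T, H, W) and its HR clip."""
    rng = rng if rng is not None else np.random.default_rng()
    _, h, w = lr.shape
    if hr.shape[1:] != (h * scale, w * scale):
        raise ValueError("HR frames must be exactly `scale` times the LR frames")
    # one random position shared by all frames, so temporal alignment is preserved
    top = int(rng.integers(0, h - lr_patch + 1))
    left = int(rng.integers(0, w - lr_patch + 1))
    hr_patch = lr_patch * scale
    lr_crop = lr[:, top:top + lr_patch, left:left + lr_patch]
    hr_crop = hr[:, top * scale:top * scale + hr_patch, left * scale:left * scale + hr_patch]
    return lr_crop, hr_crop
```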

5. Temporal Modeling Framework

Base framework (2D CNN, 3D CNN, RNN, etc.)

- RNN

Architectural contributions, if any?

- Proposes an RNN structure that gathers information from multiple timesteps through second-order grid propagation. It recovers fine details and is more robust in occluded and complex regions.

6. Frame Alignment Method

Implicit or Explicit

- Explicit

Details

- Computes optical flow and uses it to guide deformable convolution (flow-guided deformable alignment)

7. Upsampling Method

- PixelShuffle
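PixelShuffle rearranges channels into space: a (C * r^2, H, W) tensor becomes (C, H * r, W * r). A minimal NumPy sketch that follows the channel ordering of PyTorch's `nn.PixelShuffle`. In a real network each shuffle is preceded by a convolution that expands the channels; a x4 output can be obtained with two x2 stages (assumed here, convolutions omitted).

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r), PyTorch channel ordering."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # index [c, i, j, h, w] corresponds to x[c*r*r + i*r + j, h, w]
    y = x.reshape(c, r, r, h, w)
    # reorder to [c, h, i, w, j] so that row = h*r + i and col = w*r + j
    return y.transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

up = pixel_shuffle(np.zeros((256, 16, 16)), 2)  # -> (64, 32, 32)
```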

8. Other Details

Number of parameters

- 6.4M

Training data

- REDS, Vimeo-90K

Test data

- Vid4, REDS4, UDM10, Vimeo-90K-T

- 4x downsampling with two degradations: bicubic (BI) and blur downsampling (BD)
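The BD degradation is commonly implemented as a Gaussian blur followed by keeping every 4th pixel. A minimal NumPy sketch for a single-channel frame, assuming sigma = 1.6 (the value commonly used for BD in VSR benchmarks), a 13-tap kernel and reflect padding; the kernel size and padding are assumptions. BI would instead use MATLAB-style bicubic resizing, which is not reproduced here.

```python
import numpy as np

def gaussian_kernel_1d(sigma=1.6, size=13):
    """Normalised 1-D Gaussian; the 2-D Gaussian kernel is separable."""
    ax = np.arange(size) - (size - 1) / 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_downsample(img, scale=4, sigma=1.6, size=13):
    """BD sketch for one (H, W) channel: Gaussian blur, then keep every `scale`-th pixel."""
    k = gaussian_kernel_1d(sigma, size)
    padded = np.pad(img, size // 2, mode="reflect")
    # separable blur: rows first, then columns ("valid" restores the original H, W)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return blurred[::scale, ::scale]

lr = blur_downsample(np.random.rand(256, 256))  # -> (64, 64)
```

For color frames the same function would be applied to each channel.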

Paper Analysis

1. Criticisms of previously reviewed papers that this paper addresses, if any

2. Critique of this paper

Google Scholar Link

https://scholar.google.co.kr/scholar?hl=ko&as_sdt=0%2C5&q=basicvsr%2B%2B&btnG=


GitHub

https://github.com/ckkelvinchan/BasicVSR_PlusPlus
