Figure: 256x256 results on datasets that are difficult to generate. The results are reported to be on par with StyleGAN2.

The ultimate goal of CIPS is to build a model that generates each pixel independently.

To do that, dropping convolutions is essential. The paper can be summarized as follows: to still obtain high-quality images without them, it adds a positional encoding and reaches SoTA.

Paper: https://arxiv.org/abs/2011.13775

Github: https://github.com/saic-mdal/CIPS

Introduction

CIPS is a model that generates images with an MLP, without spatial convolutions or self-attention.

Since typical generative models are built on spatial convolutions, reaching SoTA without convolutions seemed unthinkable. CIPS nevertheless achieved SoTA on LSUN Church and other datasets, and was presented as an Oral at CVPR 2021.

[Figure]

Instead of using spatial convolutions, which impose spatial constraints, CIPS takes pixel coordinates as input and generates each pixel independently.

Feeding pixel coordinates as an extra input was already used in CoordConv to introduce a spatial-relational bias. The difference is that CoordConv still uses spatial convolutions, so each output inevitably sees information from neighboring pixels and the pixels remain dependent on one another.

Pixel independence brings many advantages. For example, it extends naturally to images such as cylindrical panoramas, and on memory-constrained hardware an image can still be generated without trouble through sequential synthesis.

In the figure above, the image looks fairly mirror-symmetric around the 1/4 and 3/4 points horizontally. Still, given how much research goes into producing high-resolution images, it is a noteworthy result.

Method

So what is a good way to feed in the pixel coordinates?

Recent work such as NeRF uses Fourier features to encode the coordinates that are fed to an MLP.

Judging by the highly accurate results, Fourier features are clearly effective.

However, Fourier features had not yet been applied to image generation, and CIPS can be read as the work that does so.


Figure: CIPS generator architecture. The pixel coordinates (x, y) are encoded and a weight-modulated MLP produces the image. Skip connections also exist, although the figure omits them.

The generator produces an image of fixed size HxW by regressing one pixel value for each coordinate. It really does generate every pixel independently. The generator takes StyleGAN2 as its baseline; the differences are that it uses an MLP instead of convolutions and a coordinate encoding (positional encoding) instead of a constant input.

When generating an image, the random vector z is shared by all pixels, while (x, y) changes with the pixel position. Evaluating every coordinate yields the whole image.
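The per-pixel independence can be sketched in a few lines of plain Python. This is a toy illustration, not the CIPS network: `toy_pixel_generator` is a hypothetical stand-in for the modulated MLP, and rescaling pixel indices to [-1, 1] is an assumption made here. The only point is that each output pixel depends on its own (x, y) and the shared z, never on its neighbors.

```python
import math
import random

def toy_pixel_generator(x, y, z):
    """Hypothetical stand-in for the CIPS MLP: maps one coordinate plus the
    shared latent z to a single grey value in [-1, 1]."""
    return math.tanh(z[0] * math.sin(3.0 * x) + z[1] * math.cos(3.0 * y))

def synthesize(height, width, z):
    """Evaluate the generator once per pixel; no pixel reads another pixel."""
    image = []
    for row in range(height):
        y = 2.0 * row / (height - 1) - 1.0      # row index -> [-1, 1] (assumed range)
        line = []
        for col in range(width):
            x = 2.0 * col / (width - 1) - 1.0   # column index -> [-1, 1]
            line.append(toy_pixel_generator(x, y, z))
        image.append(line)
    return image

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(2)]   # one latent for the whole image
image = synthesize(4, 6, z)
```

Because no pixel reads another, any single pixel can be recomputed in isolation and will match the value in the full image, which is what makes sequential or partial synthesis possible.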

ModFC follows the same idea as ModConv in StyleGAN2. In short, the FC weight B in Figure 2 is modulated through w to obtain B̂, and the formula is as follows.
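For reference, this is the modulation and demodulation step as I understand it from the paper, written with the symbols defined just below. The index n (the number of input features of the layer) is notation assumed here.

```latex
\hat{B}_{ij} = \frac{s_j \, B_{ij}}{\sqrt{\epsilon + \sum_{k=1}^{n} \left( s_k \, B_{ik} \right)^2}}
```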

s = the result of mapping the style vector w through A

ε = a very small value that keeps the denominator from becoming zero

This formula maps the base weight B to B̂. In addition, a skip connection is added after every two ModFC layers.
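A minimal sketch of such a modulated fully connected layer, assuming the StyleGAN2-style rule (scale each input channel of B by s, then normalize each output row). The function names and toy values are hypothetical, and skip connections are left out.

```python
import math

def modulate(B, s, eps=1e-8):
    """Scale column j of B by s[j], then normalise each output row to unit length."""
    B_hat = []
    for row in B:
        scaled = [s_j * b_ij for s_j, b_ij in zip(s, row)]      # modulation
        norm = math.sqrt(eps + sum(v * v for v in scaled))       # demodulation factor
        B_hat.append([v / norm for v in scaled])
    return B_hat

def fully_connected(B_hat, x):
    """Plain matrix-vector product with the modulated weights."""
    return [sum(w * x_j for w, x_j in zip(row, x)) for row in B_hat]

B = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 3 outputs, 2 inputs (toy values)
s = [2.0, 0.5]                               # style scales, one per input channel
B_hat = modulate(B, s)
y = fully_connected(B_hat, [1.0, 1.0])
```

After demodulation every row of `B_hat` has (almost exactly) unit norm, so the style changes the direction of each weight row without changing its overall scale.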

Next, let's look at the positional encoding, which is where CIPS differs from StyleGAN2.

Positional encodings for MLPs appear in two earlier papers: SIREN (Implicit Neural Representations with Periodic Activation Functions) and Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. The former applies a dedicated weight initialization and sine activations to every layer, while the latter uses a periodic activation only in the first layer. CIPS mixes the two and uses a sine activation only in the first layer.

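Fourier features = $e_{fo}(x, y) = \sin[B_{fo}(x', y')^T]$: the input coordinates go through a convolution, then sin is applied. Since each pixel is processed on its own, the convolution acts as a per-pixel linear map.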
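A small sketch of this encoding in plain Python. Two things are assumptions here: (x', y') are taken to be pixel indices rescaled to [-1, 1], and `B_fo` is filled with random placeholder values, whereas in CIPS it is learned. The feature count is arbitrary.

```python
import math
import random

def normalize(index, size):
    """Map a pixel index in [0, size - 1] to [-1, 1] (assumed convention)."""
    return 2.0 * index / (size - 1) - 1.0

def fourier_features(x, y, B_fo):
    """e_fo(x, y) = sin(B_fo [x, y]^T); B_fo holds one (bx, by) row per feature."""
    return [math.sin(bx * x + by * y) for bx, by in B_fo]

random.seed(0)
n_features = 8   # illustrative size only
B_fo = [(random.gauss(0.0, 3.0), random.gauss(0.0, 3.0)) for _ in range(n_features)]

xp, yp = normalize(10, 256), normalize(200, 256)
e_fo = fourier_features(xp, yp, B_fo)
```

Each feature is a sinusoid over the image plane whose frequency and orientation are set by one row of `B_fo`, which is consistent with the wave-like patterns discussed later.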

However, the authors report that with Fourier features alone, the output images showed multiple wave-like artifacts and high-quality images could not be obtained.

So a coordinate embedding $e_{co}^{(x,y)}$ is learned for each coordinate.

These coordinate embeddings are constants: their values are adjusted only during training.

This plays the same role as the 4x4 constant input given to the generator in StyleGAN2.

As a result, the positional encoding concatenates the Fourier features and the coordinate embeddings, as in the following formula.

Positional encoding = $e(x, y) = \mathrm{concat}[e_{fo}(x, y), e_{co}^{(x,y)}]$
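The concatenation can be sketched as follows. The grid size, feature sizes, and random initial values are all hypothetical; in CIPS both `B_fo` and the embedding table would be learned, and the [-1, 1] coordinate range is again an assumption.

```python
import math
import random

random.seed(0)
H, W = 4, 4          # tiny grid for illustration
n_fo, n_co = 3, 2    # feature sizes chosen arbitrarily

# Frequency matrix for the Fourier part (random placeholder for learned weights).
B_fo = [(random.gauss(0.0, 3.0), random.gauss(0.0, 3.0)) for _ in range(n_fo)]
# One embedding vector per pixel, analogous to a learned constant input.
e_co_table = [[[random.gauss(0.0, 1.0) for _ in range(n_co)]
               for _ in range(W)] for _ in range(H)]

def positional_encoding(row, col):
    """e(x, y) = concat[e_fo(x, y), e_co at (x, y)] for one pixel."""
    x = 2.0 * col / (W - 1) - 1.0
    y = 2.0 * row / (H - 1) - 1.0
    e_fo = [math.sin(bx * x + by * y) for bx, by in B_fo]   # computed from coordinates
    e_co = e_co_table[row][col]                             # looked up per pixel
    return e_fo + e_co

e = positional_encoding(1, 2)
```

The Fourier part is a function that can be evaluated at any coordinate, while the embedding part is a lookup table that only has entries at the grid positions it was trained on.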

Experiments


Table 1. FID comparison at 256^2 resolution; comparable to StyleGAN2.


Table 2. Precision improves, while recall drops slightly.

According to the results tables, the FID score is comparable to StyleGAN2. However, this is at 256 resolution, and it is unclear how CIPS would fare at 1024.

Effect of the CIPS modules


Figure: Results for the mean style vector (left: CIPS, right: CIPS-NE, the variant without coordinate embeddings)

Table 3. FID depending on whether each CIPS module is used

The next experiment checks whether each CIPS module is actually effective.

In Table 3, + means the module is used and - means it is removed. The exception is the last row, Sine Activation, where - means sine is used only in the first layer and + means it is used in every layer.

In this experiment, the presence or absence of the coordinate embedding makes the largest difference in performance, which is also visible in the figure above.

Characteristics of the positional encoding

Figure: Spectrum magnitude; the output in (b) contains more high-frequency components than (a).

Figure: PCA plot (3 components rendered as an RGB image); (b) contains fine detail and keypoints.


Figure: Left: original. Center: coordinate embeddings set to zero. Right: Fourier features set to zero.

Next, let's look at how Fourier features and coordinate embeddings differ.

The three figures above show that the coordinate embeddings carry more fine-grained information.

Put simply, Fourier features are determined by the input coordinates, whereas coordinate embeddings are constants, so they learn features common to the training domain regardless of the individual pixels.

Spectral analysis

Figure: Spectral analysis of models trained on FFHQ; CIPS-NE is closest to the real data.

CIPS operates on pixel coordinates and does no upscaling.

The spectrum is therefore compared with StyleGAN2, which uses convolutional upsampling. The magnitude spectrum of StyleGAN2 shows dots laid out like a grid in the high-frequency region, whereas the CIPS spectrum is clean.
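To make the notion of a magnitude spectrum concrete, here is a naive 2D DFT in plain Python applied to a toy checkerboard. It is only an illustration of how a strictly periodic pattern in pixel space appears as an isolated point in the spectrum; it does not reproduce the paper's analysis, and a real experiment would use an FFT library on generated images.

```python
import cmath

def magnitude_spectrum(image):
    """Naive 2D DFT magnitude of a small grey image given as a list of rows."""
    H, W = len(image), len(image[0])
    spectrum = []
    for u in range(H):
        row = []
        for v in range(W):
            total = 0j
            for r in range(H):
                for c in range(W):
                    angle = -2.0 * cmath.pi * (u * r / H + v * c / W)
                    total += image[r][c] * cmath.exp(1j * angle)
            row.append(abs(total))
        spectrum.append(row)
    return spectrum

# A checkerboard alternates every pixel, i.e. it sits at the highest frequency.
N = 8
checkerboard = [[(-1.0) ** (r + c) for c in range(N)] for r in range(N)]
spec = magnitude_spectrum(checkerboard)
peak = max((spec[u][v], u, v) for u in range(N) for v in range(N))
```

For this input all the energy lands in the single bin (4, 4), the highest representable frequency, and every other bin is numerically zero.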

That CIPS-NE is closer to the real data than CIPS-base is quite surprising, however, considering the difference in FID score.

The paper does not explain this in detail, and it seems the authors could not pin it down either.

They only conclude that skip connections appear to hinder the generation of natural-looking images.

Foveated rendering

Figure: Foveated synthesis. Only a subset of the pixels is synthesized and the rest is filled in by bicubic interpolation (left to right: 5%, 25%, 50%, 100%).


Figure: Left: an image generated at 256^2 and upsampled with Lanczos. Right: generated by feeding a 1024^2 coordinate grid to a model trained at 256^2.

Foveated rendering exploits the fact that each pixel is generated independently. In Figure 8, pixels are sampled from a Gaussian with a standard deviation of 0.4 centered on the image, so only part of the image is generated rather than all of it, which reduces the computational cost.

The retina of the human eye resolves only the central region well and the periphery poorly. Games use a similar technique to lower cost: only the part within our field of view is shown and rendering of the rest is skipped.
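The sampling step can be sketched as below. I am assuming that the 0.4 standard deviation is measured in coordinates normalized to [-1, 1]; the function name and the rejection loop are my own choices, and the interpolation of the remaining pixels is left out.

```python
import random

def foveated_mask(height, width, fraction, std=0.4, seed=0):
    """Pick roughly `fraction` of the pixels, drawn from a Gaussian centred on
    the image (std given in normalised [-1, 1] coordinates, an assumption)."""
    rng = random.Random(seed)
    target = int(fraction * height * width)
    chosen = set()
    # Rejection loop: fine for small fractions, slow as fraction approaches 1.
    while len(chosen) < target:
        x, y = rng.gauss(0.0, std), rng.gauss(0.0, std)
        if -1.0 <= x <= 1.0 and -1.0 <= y <= 1.0:
            col = round((x + 1.0) / 2.0 * (width - 1))
            row = round((y + 1.0) / 2.0 * (height - 1))
            chosen.add((row, col))
    return chosen

pixels = foveated_mask(32, 32, fraction=0.25)
```

Only the coordinates in `pixels` would be passed to the generator, so the synthesis cost scales with the number of selected pixels rather than with the full image size.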

The second figure above (Lanczos upsampling versus a 1024^2 grid) was produced simply by sampling the coordinate grid more densely to obtain a high-resolution result.

The image generated from the denser grid is visibly sharper than the upsampled one.
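Sampling the grid more densely amounts to covering the same coordinate range with more points. A minimal sketch, assuming coordinates normalized to [-1, 1]:

```python
def coordinate_grid(size):
    """Return `size` evenly spaced coordinates covering [-1, 1]."""
    return [2.0 * i / (size - 1) - 1.0 for i in range(size)]

train_axis = coordinate_grid(256)    # spacing used during training
dense_axis = coordinate_grid(1024)   # same range, four times as many samples

# Both grids span the same square; only the spacing changes.
train_step = train_axis[1] - train_axis[0]
dense_step = dense_axis[1] - dense_axis[0]
```

The Fourier part of the encoding can be evaluated at these in-between coordinates directly. The per-pixel coordinate embeddings only exist at training positions, so they would need some form of interpolation, which this write-up does not cover.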

I am curious about the FID of this 1024^2 result, but the paper does not report it.

Interpolation


Figure: Latent interpolation.
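As a reminder of what latent interpolation means in general, here is a generic linear interpolation between two latent vectors. Whether the figure interpolates in z or in the style space w, and with which scheme, is not stated here, so treat this purely as an illustration.

```python
import random

def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

random.seed(0)
z0 = [random.gauss(0.0, 1.0) for _ in range(4)]
z1 = [random.gauss(0.0, 1.0) for _ in range(4)]
steps = [lerp(z0, z1, k / 4) for k in range(5)]   # 5 latents from z0 to z1
```

Each latent in `steps` would be fed to the generator with the same coordinate grid, giving a sequence of images that morphs from the first sample to the second.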

Panorama synthesis


Figure: Panorama synthesis. The model is trained after conversion to a cylindrical coordinate system, and the result is generated by sampling the coordinate grid densely.

I personally found the panorama synthesis experiment fascinating.

Other existing coordinate-based models were trained on panorama photos from the start. CIPS uses no panorama data at all; it simply trains on ordinary photos converted to a cylindrical coordinate system, and still produces good results.
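One common way to parameterize a cylinder, shown only as a sketch: I have not checked how the official code does it, so the function and the (cos, sin, height) layout below are assumptions. The useful property is that the horizontal coordinate wraps around, so the left and right edges of a panorama meet seamlessly.

```python
import math

def cylindrical_coords(col, row, width, height):
    """Map a pixel to (cos(theta), sin(theta), h): theta wraps around the
    cylinder horizontally, h is the normalised vertical position."""
    theta = 2.0 * math.pi * col / width        # `width` columns cover one full turn
    h = 2.0 * row / (height - 1) - 1.0
    return (math.cos(theta), math.sin(theta), h)

W, H = 512, 64
first = cylindrical_coords(0, 10, W, H)
wrapped = cylindrical_coords(W, 10, W, H)      # one step past the last column
```

Column `W` lands on the same point of the cylinder as column 0, so a generator fed these coordinates sees no seam at the image border.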

Style interpolation also works smoothly.

Typical artifacts


From the way the authors describe their results, I assumed there were no artifact problems, but artifacts seem to appear fairly often in CIPS outputs.

Wave-like patterns, which arise from taking the sine when computing the Fourier features, are common. The authors also attribute these artifacts to the use of LeakyReLU, which splits the coordinate space into multiple parts.

The StyleGAN2 authors attributed the artifacts in StyleGAN to AdaIN: it leads the network to learn a distorted signal by producing a very strong signal in one part of the image. They replaced it with ModConv and succeeded in removing the artifacts.

Yet ModFC, which follows the same idea, does not seem to have solved the problem here.

My own guess at the cause is as follows. When StyleGAN2 upsamples, it spreads the signal evenly through an FIR filter.

(In fact, this signal spreading was already applied in StyleGAN, before ModConv, and artifacts still occurred there.) CIPS does nothing of the sort: each pixel carries only its own signal, so no spreading takes place. My guess is that a wrong signal learned at a particular location therefore shows up as an artifact.

The authors say that because CIPS uses neither information from other pixels nor upsampling, nothing protects the generator from such artifacts. Not using other pixels' information probably matches my guess above, but I honestly do not understand why the lack of upsampling would be a cause.

Conclusion

  • Proposes CIPS, a model that can produce high-quality images by generating pixels independently
  • Results on par with StyleGAN2 despite using no spatial convolution, attention, or upsampling
  • Results closer to the spectral distribution of real data
  • Generating from input coordinates opens up a range of possibilities (e.g. panorama generation)

+

  • The panorama results were very impressive
  • I wonder what would come out if the coordinate grid were fed values outside [0, 1] (e.g. [-0.5, 1.5])
  • StyleGAN2's strength is generating high-resolution images at near-photorealistic quality, so it is a pity that there is no quantitative comparison at 1024 resolution and FID is compared only at 256
  • The analysis of the spectral experiment is thin; why CIPS-NE gives better results remains an open question
  • The analysis of what causes the artifacts is not clear