What is NST?
- Style transfer takes two images, a content image and a style image, and changes only the style to match the style image while keeping the main shapes similar to the content image.
- Also known as Style Transfer, image-to-image translation, or texture transfer, the task is to generate a new image X that renders an image P in the style of another image A.
Method
- NST takes a content image and a style image as input and creates a new output image.
- A pretrained model extracts and stores a feature map for the content image and a feature map for the style image.
- The output pixels are then optimized so that the output's feature map becomes similar in content to the content feature map and similar in style to the style feature map.
If the content proportion is larger, the output preserves more of the content image's shape.
If the style proportion is larger, the output no longer keeps that shape and leans more toward the style.
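The trade-off described above can be sketched as a weighted sum of two loss terms. This is a minimal illustration, not the code used in these notes: the helper names (`mse`, `total_loss`, `content_share`), the mean-squared-error form of each term, and the example numbers are all assumptions.

```python
import numpy as np

def mse(a, b):
    # Assumed loss form: mean squared difference between two arrays.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.mean((a - b) ** 2))

def total_loss(c_loss, s_loss, content_weight, style_weight):
    # Weighted sum of the content term and the style term.
    return content_weight * c_loss + style_weight * s_loss

def content_share(c_loss, s_loss, content_weight, style_weight):
    # Fraction of the total loss that comes from the content term.
    total = total_loss(c_loss, s_loss, content_weight, style_weight)
    return content_weight * c_loss / total

# Hypothetical loss values, used only to show how the weight ratio shifts the balance.
c_loss, s_loss = 4.0, 4.0
content_heavy = content_share(c_loss, s_loss, content_weight=1.0, style_weight=0.1)
style_heavy = content_share(c_loss, s_loss, content_weight=0.1, style_weight=1.0)
```

With the content weight ten times larger, the content term makes up most of the total loss, so the optimizer mainly works to preserve the content shape; reversing the ratio does the opposite.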
Results
Application
- The goal is to use style transfer to create natural-looking composite images, like a carefully made Photoshop composite.
- In the NST model, the content image and the style image are set to the same image. A cut-out object is then pasted onto the content image like a sticker to attempt a smooth composite.
- One cut-out object is pasted onto the photo, and the result is run through two models that use different processing methods.
- OpenCV is used to detect an object in an image and save it as an image, so that it can be used like a sticker.
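The sticker step can be sketched as a simple alpha blend. The notes do not say how the object was cut out or pasted, so this example assumes the sticker is already available as an RGBA array whose alpha channel marks the object; `paste_sticker` and its parameters are hypothetical names.

```python
import numpy as np

def paste_sticker(background, sticker_rgba, top, left):
    """Alpha-blend an RGBA sticker onto an RGB background at (top, left)."""
    h, w = sticker_rgba.shape[:2]
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("sticker must fit inside the background")
    result = background.astype(np.float32)  # astype returns a new array
    region = result[top:top + h, left:left + w]  # view into result
    rgb = sticker_rgba[..., :3].astype(np.float32)
    alpha = sticker_rgba[..., 3:4].astype(np.float32) / 255.0
    # Opaque sticker pixels replace the background; transparent ones keep it.
    region[...] = alpha * rgb + (1.0 - alpha) * region
    return result.round().astype(np.uint8)

# Tiny synthetic example: a 2x2 opaque white sticker on a black 4x4 image.
background = np.zeros((4, 4, 3), dtype=np.uint8)
sticker = np.full((2, 2, 4), 255, dtype=np.uint8)
composite = paste_sticker(background, sticker, top=1, left=1)
```

The composite produced this way would then serve as the content image (and, in this experiment, also the style image) for the NST model.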
Using the VGG19 model
- How strongly the style transfers varies greatly with the loss values, so the weights were adjusted by trial and error during the runs.
content_weight = 2.5e-8
style_weight = 1e-6
- The goal, however, was not a painting-like style but a natural composite of real photographs.
- Experiments were run with the Magenta model and the VGG19 model.
With this model, the results did not look like real photos and felt more like famous paintings.
Left side result
The image was min-max normalized into a float32 tensor with values in the range 0-1, then resized and converted into a 4D tensor. The maximum image size was set to 1024.
Right side result
The image was normalized in the same way, cropped to a square using a bounding box, normalized to the range 0-1, and resized. The maximum image size was set to 256. An avg_pool with kernel (3,3) and strides (1,1) was applied once more to the style image to reduce its size before running.
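The preprocessing described above can be sketched with NumPy. This is an illustration under assumptions rather than the original code: the helper names are hypothetical, the input is a random array standing in for a photo, and the actual resize call is left out (only the target shape is computed).

```python
import numpy as np

def min_max_normalize(image):
    # Scale pixel values to the range 0-1 as float32.
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)  # constant image: avoid division by zero
    return (image - lo) / (hi - lo)

def scaled_shape(height, width, max_dim):
    # Target shape whose longer side equals max_dim, keeping the aspect ratio.
    scale = max_dim / max(height, width)
    return int(round(height * scale)), int(round(width * scale))

def to_batch(image):
    # Add a leading batch axis: (H, W, C) -> (1, H, W, C), a 4D tensor.
    return image[np.newaxis, ...]

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in image
batch = to_batch(min_max_normalize(photo))
target = scaled_shape(300, 400, max_dim=1024)
```

For the right-side setup, the same helpers would be called with `max_dim=256` after cropping the image to a square.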
- The image is received as an array and returned as a tensor in the format the VGG19 model expects.
- The loss value for each image is computed using a Gram matrix.
- The Gram matrix contains information about the means of the feature maps and the correlations between them. It is obtained by taking the outer product of the feature vector with itself at each location and averaging over all locations.
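The Gram matrix computation described above can be sketched in NumPy as follows. The function name and the `(batch, height, width, channels)` layout are assumptions; the notes do not show the original implementation.

```python
import numpy as np

def gram_matrix(features):
    """Average outer product of the channel vector over all spatial locations.

    features: array of shape (batch, height, width, channels).
    Returns an array of shape (batch, channels, channels).
    """
    features = np.asarray(features, dtype=np.float32)
    _, height, width, _ = features.shape
    # Sum over locations (i, j) of the outer product of each channel vector with itself.
    outer_sum = np.einsum("bijc,bijd->bcd", features, features)
    # Divide by the number of locations to get the average.
    return outer_sum / np.float32(height * width)

# One location with channel vector [1, 2]: the Gram matrix is its outer product.
example = np.array([[[[1.0, 2.0]]]])
gram = gram_matrix(example)  # [[[1, 2], [2, 4]]]
```

Because the result is averaged over locations, it discards where features occur and keeps only which channels tend to fire together, which is why it is used to describe style rather than content.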