DQN (Deep Q-Network)
DQN learns Q-values with deep neural networks such as a CNN.
Samples (experiences) obtained at each time step are stored, and mini-batches are built by drawing these stored samples at random to update the network.
In conventional RL, once learning heads in a bad direction it tends to keep going in that bad direction, because consecutive samples are strongly correlated.
Drawing samples at random breaks the correlation between them and resolves this problem.
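As a concrete illustration, here is a minimal sketch of that kind of replay memory. The class name `ReplayBuffer`, the capacity, and the batch size are assumptions for illustration, not part of any original DQN code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores one transition per time step and samples random mini-batches."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples are discarded first

    def store(self, state, action, reward, next_state, done):
        # Save the experience obtained at this time step.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive samples.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Typical usage inside a training loop (hypothetical):
#   call buffer.store(s, a, r, s_next, done) after every step,
#   and once len(buffer) >= 32, update the Q-network on buffer.sample(32).
```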
A3C (Asynchronous Advantage Actor-Critic)
Actor-learner = each agent that collects samples.
- All actor-learners share the same neural network model.
- Each actor-learner runs in its own copy of the environment.
- Like DQN, this design was created to break the correlation between training samples.
- Each actor-learner asynchronously collects samples over a number of time steps, uses them to update the global neural network, and then synchronizes its own parameters with the newly trained global model; this cycle repeats (a minimal sketch follows below).
- Because multiple agents work in parallel, training time is shortened and learning performance is excellent.
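The asynchronous update loop can be illustrated with a small, framework-free sketch. Everything here (the worker function, the parameter vector, the learning rate, and the random stand-in for the gradient) is an assumption for illustration, not an actual A3C implementation.

```python
import threading
import numpy as np

global_params = np.zeros(8)   # parameters of the shared global neural network
lock = threading.Lock()       # guards the asynchronous updates

def actor_learner(worker_id, steps=100, lr=0.01):
    local_params = global_params.copy()          # local copy of the global model
    for t in range(steps):
        # Each worker interacts with its own environment copy and computes a
        # gradient from the samples it collected (random stand-in here).
        grad = np.random.randn(*local_params.shape)
        with lock:
            global_params[:] -= lr * grad        # asynchronous update of the global model
            local_params = global_params.copy()  # synchronize back to this worker

# Several actor-learners run in parallel against the same global parameters.
workers = [threading.Thread(target=actor_learner, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print("global parameters after training:", global_params)
```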
Actor = decides the action given a state.
Critic = evaluates the value of that state.
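A tiny numerical sketch of this division of labor is below; the state dimension, the weight matrices, and the softmax policy are assumed purely for illustration.

```python
import numpy as np

state_dim, n_actions = 4, 2
W_actor = np.random.randn(n_actions, state_dim) * 0.1   # parameters of the policy head
W_critic = np.random.randn(1, state_dim) * 0.1          # parameters of the value head

def actor(state):
    # Actor: given a state, decide the action (here via a softmax distribution over actions).
    logits = W_actor @ state
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def critic(state):
    # Critic: evaluate the value of that state with a scalar estimate.
    return (W_critic @ state).item()

s = np.random.randn(state_dim)
print("action probabilities:", actor(s))
print("state value:", critic(s))
```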