Introduction
- Classification and prediction with DQN, a reinforcement learning (RL) algorithm.
- How this differs from conventional deep learning
Conventional deep learning trains a network to increase the model's accuracy directly,
whereas DQN is a reinforcement learning method in which the model outputs Q-values and the agent chooses an action based on them.
- RL is usually applied to problems such as games and robotics, not to classification or prediction.
We therefore analyzed DQN code written for games and refactored it for classification.
Data
The experiments use the Titanic dataset from Kaggle.
It has 12 attributes and 891 records.
We drop 5 attributes and use the remaining 7.
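A minimal preprocessing sketch using only the Python standard library. The notes do not say which 7 attributes were kept, so this assumes they are `Survived` (the label) plus `Pclass`, `Sex`, `Age`, `SibSp`, `Parch` and `Fare`, dropping `PassengerId`, `Name`, `Ticket`, `Cabin` and `Embarked`. Encoding `Sex` as 0/1 and filling missing `Age` values with the median are also assumptions, and `load_titanic` is a hypothetical helper.

```python
import csv
import io
from statistics import median

# Assumed choice of kept attributes (not stated in the notes).
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
LABEL = "Survived"

def load_titanic(csv_text):
    """Parse Kaggle train.csv text into (features, labels) lists."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Age has missing values in the Kaggle file; fill them with the median age.
    ages = [float(r["Age"]) for r in rows if r["Age"]]
    age_fill = median(ages) if ages else 0.0
    X, y = [], []
    for r in rows:
        values = []
        for col in FEATURES:
            if col == "Sex":
                values.append(1.0 if r[col] == "female" else 0.0)
            elif col == "Age" and not r[col]:
                values.append(age_fill)
            else:
                values.append(float(r[col]))
        X.append(values)
        y.append(int(r[LABEL]))
    return X, y
```

Usage, assuming the Kaggle file is saved locally as `train.csv`: `with open("train.csv", newline="") as f: X, y = load_titanic(f.read())`.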
Only DNN
For a fair comparison, the data are preprocessed and split into Train : Validation = 7:3.
The baseline is a simple DNN trained under the same conditions.
Results of the plain DNN using only the 7 attributes:
Train accuracy: 91.493%
Validation accuracy: 74.254%
Test accuracy: 62.20%
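The 7:3 split and the accuracy figures can be sketched as follows. This is a minimal version under assumptions: the shuffle seed and the helper names `train_val_split` and `accuracy` are hypothetical, and the notes do not say how the authors actually split the data.

```python
import random

def train_val_split(X, y, train_ratio=0.7, seed=42):
    """Shuffle indices, then cut them into train/validation at train_ratio."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_ratio)  # 891 rows -> 623 train, 268 validation
    train_idx, val_idx = idx[:cut], idx[cut:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])

def accuracy(predictions, labels):
    """Percentage of predictions that match the labels."""
    return 100.0 * sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

If scikit-learn is available, `train_test_split(X, y, test_size=0.3, random_state=42)` from `sklearn.model_selection` does the same job and returns the pieces in the order X_train, X_val, y_train, y_val.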
With DQN
We train a Deep Q-Network (DQN) whose Q-function is a DNN.
For each input, the DNN outputs two Q-values, one for each action, i.e. one for each of the two classes.
The agent takes the action with the larger Q-value or, with the preset probability epsilon, a random action.
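The epsilon-greedy choice described above can be sketched in a few lines. This assumes the Q-values arrive as a plain Python list with one entry per class; the function name `select_action` and the optional `rng` parameter are hypothetical.

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over the DNN's Q-values (one per class).

    With probability epsilon, explore with a uniformly random action;
    otherwise exploit the action with the largest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Index of the largest Q-value (argmax without numpy).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is always greedy.
print(select_action([0.2, 0.7], epsilon=0.0))  # 1
```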
The agent takes this action and receives a reward of +100 if the prediction is correct and -100 if it is wrong.
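One way to reuse game-oriented DQN code is to wrap the labeled data in a small game-style environment that hands out this reward. The sketch assumes each episode is a single prediction on one randomly drawn sample; `ClassificationEnv` and its `reset`/`step` methods are hypothetical, loosely modeled on the reset/step pattern common in game environments.

```python
import random

class ClassificationEnv:
    """Hypothetical game-style wrapper: one labeled sample per episode."""

    def __init__(self, X, y, seed=0):
        self.X, self.y = X, y
        self.rng = random.Random(seed)
        self.index = None

    def reset(self):
        """Start an episode: pick a random sample and return its features as the state."""
        self.index = self.rng.randrange(len(self.X))
        return self.X[self.index]

    def step(self, action):
        """Treat the action as the predicted class: +100 if correct, -100 if wrong."""
        answer = self.y[self.index]
        reward = 100 if action == answer else -100
        done = True  # assumption: the episode ends after a single prediction
        return reward, answer, done

env = ClassificationEnv(X=[[3.0, 0.0], [1.0, 1.0]], y=[0, 1])
state = env.reset()
reward, answer, done = env.step(action=1)
```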
Each step is stored in the replay memory as [Environment, Action, Correct Answer, Reward], where Environment is the input state.
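A minimal replay memory for these entries might look like this. The notes only mention a size threshold, so the fixed `capacity` (with the oldest entries evicted first) is an assumption, as are the class name `ReplayMemory` and its methods.

```python
import random
from collections import deque

class ReplayMemory:
    """Holds [state, action, answer, reward] entries.

    `capacity` is an assumption: once full, the oldest entries are dropped.
    """

    def __init__(self, capacity=2000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, answer, reward):
        self.buffer.append([state, action, answer, reward])

    def ready(self, threshold):
        """True once the memory holds more entries than `threshold`."""
        return len(self.buffer) > threshold

    def sample(self, batch_size):
        """Draw `batch_size` distinct entries uniformly at random."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(state=[float(i)], action=i % 2, answer=1,
                reward=100 if i % 2 == 1 else -100)
print(len(memory))  # 3: the two oldest entries were evicted
```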
Once the replay memory holds more entries than a preset threshold, the model is updated by training on randomly sampled entries.
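Training from the replay memory could look like the sketch below. Two substitutions keep it short: a linear Q-function (`LinearQ`, hypothetical) stands in for the DNN, and because a stored entry [Environment, Action, Correct Answer, Reward] has no next state, each entry is treated as a one-step episode whose target for the taken action is simply the reward. The untaken action's target is set to the current prediction, a common pattern in Keras-style DQN code, so it contributes no error. `replay_update`, `threshold` and `batch_size` are illustrative names.

```python
import numpy as np

class LinearQ:
    """Stand-in for the DNN: Q(s) = s @ W + b, one output column per action."""

    def __init__(self, n_features, n_actions, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(n_features, n_actions))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def predict(self, states):
        return states @ self.W + self.b

    def train_on_batch(self, states, targets):
        """One gradient step on squared error (summed over actions, averaged
        over the batch); returns the loss measured before the step."""
        error = self.predict(states) - targets
        n = len(states)
        self.W -= self.lr * (2.0 / n) * (states.T @ error)
        self.b -= self.lr * 2.0 * error.mean(axis=0)
        return float((error ** 2).sum(axis=1).mean())

def replay_update(model, memory, threshold, batch_size, rng):
    """Fit the model on a random minibatch once memory exceeds `threshold`."""
    if len(memory) <= threshold:
        return None  # not enough stored experience yet
    idx = rng.choice(len(memory), size=batch_size, replace=False)
    batch = [memory[i] for i in idx]
    states = np.array([e[0] for e in batch], dtype=float)
    actions = np.array([e[1] for e in batch], dtype=int)
    rewards = np.array([e[3] for e in batch], dtype=float)
    # Start from current predictions (zero error for untaken actions),
    # then overwrite the taken action's target with its reward.
    targets = model.predict(states)
    targets[np.arange(len(batch)), actions] = rewards
    return model.train_on_batch(states, targets)

rng = np.random.default_rng(0)
memory = [  # [state, action, answer, reward] with toy 2-feature states
    [[1.0, 0.0], 0, 0, 100.0],
    [[0.0, 1.0], 1, 0, -100.0],
    [[1.0, 1.0], 1, 1, 100.0],
    [[0.5, 0.2], 0, 1, -100.0],
]
model = LinearQ(n_features=2, n_actions=2)
losses = [replay_update(model, memory, threshold=3, batch_size=4, rng=rng)
          for _ in range(300)]
print(losses[0] > losses[-1])  # True: the loss shrinks on this toy memory
```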
- Results with DQN:
Train accuracy: 92.616%
Validation accuracy: 79.85%
Test accuracy: 73.444%
Result
Next
We previously crawled 4,000 images labeled fresh_orange and rotten_orange,
and have code that classifies them with an Inception model at about 95% test accuracy.
We plan to apply the DQN and DDQN algorithms to this data and test whether accuracy improves further, comparing:
- the pre-trained model alone
- the DQN algorithm built on the pre-trained model
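DDQN differs from DQN only in how the bootstrap target is computed. A minimal NumPy sketch under the standard definitions (discount `gamma`, a separate target network); the function names are hypothetical. If, as in the Titanic setup, each stored entry has no next state (every step is terminal), both targets reduce to the reward, so DDQN would only change results if the orange task is framed with multi-step episodes.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """DQN target: r + gamma * max_a' Q_target(s', a'), no bootstrap at terminal steps."""
    bootstrap = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * bootstrap

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: the online network selects a', the target network evaluates it."""
    best_next = next_q_online.argmax(axis=1)
    bootstrap = next_q_target[np.arange(len(rewards)), best_next]
    return rewards + gamma * (1.0 - dones) * bootstrap

# Toy batch of two transitions; the second is terminal (done = 1).
rewards = np.array([100.0, -100.0])
next_q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
next_q_target = np.array([[5.0, 1.0], [2.0, 4.0]])
dones = np.array([0.0, 1.0])
print(dqn_targets(rewards, next_q_target, dones, gamma=0.9))  # targets 104.5 and -100.0
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9))  # targets 100.9 and -100.0
```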