[NPLM] A Neural Probabilistic Language Model: Paper Review
·
Artificial_Intelligence🤖/Natural Language Processing
A Neural Probabilistic Language Model. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Janvin. March 1, 2003. NPLM proposed a neural-network-based technique for embedding words as vectors, and is said to have laid the groundwork for what later became Word2Vec. In brief, a count-based n-gram model: assigns probability 0 to any sentence containing an n-gram that does not appear in the training data; struggles to capture long-range dependencies within a sentence, because it is hard to set n to 5 or higher; does not take similarity between words or sentences into account. Before neural nets, data sparsity was handled with smoothing (adding a small constant so that no probability comes out as 0) or backoff. long-te..
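The zero-probability problem and the "add a small constant" smoothing described in the NPLM excerpt can be sketched with a toy bigram model. This is a minimal illustration, not the paper's method: the corpus, the function name `bigram_prob`, and the smoothing constant `k` are all assumptions made up for the example.

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))

# Count bigrams and how often each word appears as a bigram context.
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])


def bigram_prob(prev, word, k=0.0):
    """Estimate P(word | prev) with add-k smoothing; k=0 is the raw count ratio."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * len(vocab)
    return numerator / denominator


# "cat mat" never occurs in the corpus, so the raw estimate is exactly 0.
unseen = bigram_prob("cat", "mat")
# Adding a constant (here k=1) to every count keeps the estimate above 0.
smoothed = bigram_prob("cat", "mat", k=1.0)

print(unseen, smoothed)
```

Smoothing removes the zeros, but it still treats "cat" and "mat" as unrelated symbols, which is the gap that learned word vectors are meant to close.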
JSON
·
Artificial_Intelligence🤖/Natural Language Processing
JSON stands for JavaScript Object Notation. JSON is a simple data format; it is nothing more than a way of representing data as attribute-value (key-value) pairs. Why use JSON: to read the data held in a JSON file and assign it to objects or variables for use. Structure of JSON: 1. Object: a set of name/value pairs, delimited by {}. ex) { "이름" : "홍길동" } (name: Hong Gil-dong) 2. Array. ex) [ 10, "array", 32 ] Overall structure:
{ "이름": "홍길동", → string
"나이": 25, → number (integer)
"특기": ["농구", "도술"], → a list can be expressed
"가족관계": {"아버지": "홍판서", "어머니": "춘섬"}, → a nested object can be expr..
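The purpose stated in the JSON excerpt, reading JSON data and assigning it to objects or variables, can be sketched with Python's standard `json` module. The document below reuses only the fields visible in the excerpt and is closed right after them; whatever followed in the original post is unknown, so treat the sample text as an assumption.

```python
import json

# Sample document built from the fields shown in the excerpt (assumed complete here).
text = (
    '{"이름": "홍길동", "나이": 25, '
    '"특기": ["농구", "도술"], '
    '"가족관계": {"아버지": "홍판서", "어머니": "춘섬"}}'
)

# json.loads maps JSON objects to dict, arrays to list, numbers to int/float.
person = json.loads(text)

name = person["이름"]        # JSON string -> str
age = person["나이"]         # JSON number -> int
skills = person["특기"]      # JSON array  -> list
family = person["가족관계"]  # JSON object -> dict (a nested object, not an array)

# ensure_ascii=False keeps the Korean text readable when serializing back.
round_trip = json.dumps(person, ensure_ascii=False)
print(type(skills).__name__, type(family).__name__)
```

Note that the nested `가족관계` value comes back as a `dict`, which is how an object nested inside another object is represented after parsing.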
(NLP)Embedding
·
Artificial_Intelligence🤖/Natural Language Processing
What is a dense representation? It is a sparsely represented word re-expressed as a real-valued vector of a chosen length. This process is called word embedding, and the resulting dense representation is called an embedding vector. What embedding means in Natural Language Processing: the process, and the result, of converting the natural language people use into numeric vectors that a machine can understand. Roles of embeddings: Computing relatedness between words/sentences (representative embedding technique: Word2Vec). To make computation easy, each word is converted into a vector that reflects its characteristics relative to all other words, which makes it possible to compute similarity between words. With embeddings, the vector space can also be visualized geometrically. Encoding semantic/syntactic information: arithmetic is possible. Adding and subtracting vectors can reveal semantic and syntactic relationships between words. Word embeddings..
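The two roles named in the embedding excerpt, similarity computation and vector arithmetic, can be sketched as follows. The 3-dimensional vectors are hand-picked, hypothetical values chosen so the arithmetic works out; real embeddings such as Word2Vec are learned from data and have far more dimensions.

```python
import math

# Hypothetical hand-made embeddings (illustration only, not learned vectors).
embeddings = {
    "king": [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man": [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Vector arithmetic: king - man + woman, computed element by element.
analogy = [
    k - m + w
    for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])
]

# Find the word whose vector points in the most similar direction.
nearest = max(embeddings, key=lambda word: cosine(analogy, embeddings[word]))
print(nearest)
```

With these toy values the result of the arithmetic lands on the `queen` vector; with learned embeddings the match is only approximate, and the input words are usually excluded from the search.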
Graph
·
Artificial_Intelligence🤖/Natural Language Processing
A data structure that gathers nodes and the edges connecting those nodes into one. It can express relationships between connected objects. Graph terminology. Vertex: the notion of a position (also called a node). Edge: a relationship between positions, that is, a line connecting nodes (also called a link or branch). Adjacency: if vertex x and vertex y are connected by an edge, the two vertices x and y are said to be adjacent. Adjacent vertex: a vertex directly connected by an edge. Incident: an edge connecting two vertices x and y is said to be incident to both of them. Degree of a vertex: in an undirected graph, the number of vertices adjacent to a given vertex. The vertices in an undirected graph ..
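The terms in the graph excerpt (vertex, edge, adjacency, degree) can be made concrete with an adjacency-set representation of a small undirected graph. The vertex names, the edge list, and the helper names `are_adjacent` and `degree` are hypothetical choices for this sketch.

```python
# Hypothetical undirected graph given as a list of edges between vertices.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Adjacency sets: each vertex maps to the set of vertices it shares an edge with.
adjacency = {}
for x, y in edges:
    adjacency.setdefault(x, set()).add(y)
    adjacency.setdefault(y, set()).add(x)  # undirected: record both directions


def are_adjacent(x, y):
    """Two vertices are adjacent when an edge connects them directly."""
    return y in adjacency.get(x, set())


def degree(vertex):
    """Degree in an undirected simple graph: the number of adjacent vertices."""
    return len(adjacency.get(vertex, set()))


degrees = {vertex: degree(vertex) for vertex in sorted(adjacency)}
print(degrees)
```

In this example every edge contributes to the degree of both of its endpoints, so the degrees add up to twice the number of edges.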
[Reuters] single-label, multiclass classification AI
·
Artificial_Intelligence🤖/Natural Language Processing
# Start: tackling a single-label, multiclass classification problem with the Reuters news dataset. Goal: text classification. Classify 11,228 articles into 46 news categories. Each topic has at least 10 samples.
from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
# Load the Reuters dataset and unpack each part into its own variable
# Print how many samples ended up in the training set and the test set
print(len(train_data))
print(len(test_data))
print(train_data[1])  # stored as a nested (2-D-like) array: each sample is a list of word indices
print(train_labe..
[IMDB] Sentiment Analysis AI
·
Artificial_Intelligence🤖/Natural Language Processing
# IMDB is the Internet Movie Database (abbreviated IMDB). The data we will use is review data from the movie site IMDB. Each record consists of the review text and a label: 1 if the review is positive, 0 if it is negative. The dataset can be imported from Keras and used right away. We will use it to build a machine learning model. The goal is to practice text classification, and sentiment classification in particular. Here, sentiment analysis means using machine learning to analyze the subjective information contained in text (opinions, sentiment, evaluations, attitudes, and so on). # Loading the input data # Load IMDB from the Keras datasets...
Liky
List of posts in the 'Artificial_Intelligence🤖/Natural Language Processing' category (Page 4)