Training VS. Test

Class Variation

Intra-class Variation VS. Inter-class Variation

Class variation describes how spread out (how much variance) the classes of a dataset are.

There are two main kinds of class variation: inter-class variation (between different classes) and intra-class variation (within a single class).

The two terms sound alike, which makes them easy to confuse. It helps to remember that an intranet is the closed, internal network of an organization: "intra" means inside, so intra-class variation is the variation inside one class.

Examples


Example 1




Example 2

The two examples above plot the data as points, with color indicating the class, to show how the classes are distributed.
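As a rough illustration of the two terms, the sketch below measures intra-class variation as the average distance from each point to its own class centroid and inter-class variation as the distance between two class centroids. These particular definitions, the function names, and the toy `cats` and `dogs` points are assumptions made for this example; other measures of spread would serve equally well.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Mean point of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def intra_class_variation(points):
    """Average distance from each point to its own class centroid."""
    c = centroid(points)
    return sum(dist(p, c) for p in points) / len(points)


def inter_class_variation(class_a, class_b):
    """Distance between the centroids of two classes."""
    return dist(centroid(class_a), centroid(class_b))


# Hypothetical toy data: two tight clusters that lie far apart.
cats = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
dogs = [(9.0, 9.0), (9.0, 11.0), (11.0, 9.0), (11.0, 11.0)]

intra_cats = intra_class_variation(cats)
intra_dogs = intra_class_variation(dogs)
inter = inter_class_variation(cats, dogs)
print(intra_cats, intra_dogs, inter)
```

Here the inter-class value is much larger than either intra-class value, which is the situation that makes classes easy to separate.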

Distance Metric to Compare Vectors (Data)

L1 Distance (Manhattan distance) VS. L2 Distance (Euclidean distance)

L1 Distance and L2 Distance are two common ways to measure the distance, or similarity, between two vectors.

L1 Distance, also called Manhattan distance, is the sum of the absolute differences between the corresponding elements of two vectors.

Mathematically, the L1 Distance between two vectors p and q is defined as follows.

L1 Distance (Manhattan distance):

\[\begin{equation} d_{1}(\mathbf{p},\mathbf{q}) = \sum_{i=1}^{n} |p_{i}-q_{i}| \end{equation}\]

The name is said to come from the distance a taxi must drive to get from one point to another along a grid-like street system, where the grid lines meet at right angles.


L2 Distance, also called Euclidean distance, is the square root of the sum of the squared differences between the corresponding elements of two vectors. It measures the straight-line distance between two points, which is the shortest path between them in Euclidean space. Mathematically, the L2 Distance between two vectors p and q is defined as follows.

L2 Distance (Euclidean distance):

\[\begin{equation} d_{2}(\mathbf{p},\mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_{i}-q_{i})^2} \end{equation}\]

In general, L2 Distance is used more often than L1 Distance. Because it squares each element-wise difference, large differences dominate the result, whereas L1 Distance weights every absolute difference linearly.
L1 Distance is therefore useful when the element-wise differences vary widely in magnitude and a few large ones should not dominate. Note that neither distance retains the sign of a difference.
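The two definitions above can be sketched directly in plain Python. The function names `l1_distance` and `l2_distance` and the sample vectors are illustrative choices, not something taken from the lecture.

```python
from math import sqrt


def l1_distance(p, q):
    """Manhattan distance: sum of absolute element-wise differences."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance: square root of the sum of squared differences."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


p = (1.0, 2.0)
q = (4.0, 6.0)

d1 = l1_distance(p, q)  # |1 - 4| + |2 - 6| = 3 + 4 = 7
d2 = l2_distance(p, q)  # sqrt(3^2 + 4^2) = sqrt(25) = 5
print(d1, d2)
```

For these two points the taxi route along the grid covers 7 units, while the straight line covers 5.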

Examples

The figures below visualize L1 Distance and L2 Distance.


Example 1




Example 2




Example 3

Regression Loss Functions

L1 Loss (Mean Absolute Error) VS. L2 Loss (Mean Squared Error)

L1 Loss, also called mean absolute error (MAE), and L2 Loss, also called mean squared error (MSE), are loss functions used in regression tasks to measure the difference, or error, between the predicted and actual values of the target variable.


L1 Loss (Mean Absolute Error):

\[\begin{equation} L_{1}=\frac{1}{n}\sum_{i=1}^{n}|y_{i}-\hat{y}_{i}| \end{equation}\]

L2 Loss (Mean Squared Error):

\[\begin{equation} L_{2}=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2 \end{equation}\]


L1 Loss averages the absolute differences between the predicted and actual values, whereas L2 Loss averages the squared differences.

The purpose of these loss functions is to minimize the difference, or error, between the predicted and actual values during training.
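A minimal sketch of both losses in plain Python follows. The function names and the sample values in `y_true` and `y_pred` are made up for illustration.

```python
def l1_loss(y_true, y_pred):
    """Mean absolute error (MAE)."""
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n


def l2_loss(y_true, y_pred):
    """Mean squared error (MSE)."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical targets and predictions; the errors are 1, 0, -2 and 4.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 3.0]

mae = l1_loss(y_true, y_pred)  # (1 + 0 + 2 + 4) / 4 = 1.75
mse = l2_loss(y_true, y_pred)  # (1 + 0 + 4 + 16) / 4 = 5.25
print(mae, mse)
```

The single largest error (4) accounts for 4 of the 7 units summed by L1 Loss but 16 of the 21 units summed by L2 Loss, which shows how squaring makes L2 Loss more sensitive to large errors.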

Key Differences Between L@ Loss and L@ Distance

L@ Loss VS. L@ Distance

L1 Loss, L2 Loss, L1 Distance, and L2 Distance share similar mathematical formulas, but they are used for different purposes.

The L@ losses (L1 Loss and L2 Loss) are used in regression tasks to measure the difference, or error, between predicted and actual values, whereas the L@ distances (L1 Distance and L2 Distance) are used to measure the distance, or similarity, between two vectors.
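The similarity of the formulas can be made explicit. Treating the actual values and the predictions as vectors \(\mathbf{y}\) and \(\hat{\mathbf{y}}\) in place of \(\mathbf{p}\) and \(\mathbf{q}\) (an identification made here for illustration), and comparing the definitions term by term, each loss is the corresponding distance rescaled:

\[\begin{equation} L_{1}=\frac{1}{n}\,d_{1}(\mathbf{y},\hat{\mathbf{y}}), \qquad L_{2}=\frac{1}{n}\,d_{2}(\mathbf{y},\hat{\mathbf{y}})^{2} \end{equation}\]

So L1 Loss is the L1 Distance averaged over the n elements, while L2 Loss is the squared L2 Distance averaged over the n elements. The square root in the distance is absent from the loss.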

Split Data

Why split the dataset?

The dataset is split into a train set, a validation set, and a test set so that the performance of a deep learning model can be evaluated on data it has not seen before.

The train set is used to train the model, the validation set is used to tune the model's hyperparameters, and the test set is used to evaluate the model's final performance on unseen data.
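One simple way to produce the three sets is to shuffle the samples once and cut the shuffled list into three slices. The sketch below assumes that approach; the helper name `split_dataset`, the 60/20/20 ratios, and the fixed seed are illustrative choices rather than anything prescribed by the lecture.

```python
import random


def split_dataset(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into train / validation / test."""
    shuffled = list(samples)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split

    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 sample indices.
data = list(range(100))
train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))
```

Every sample lands in exactly one of the three sets, so the test set stays unseen while the model is trained and tuned.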

References

  1. Artificial Intelligence Applications (ICE4104), Prof. Sungeun Hong (홍성은), Department of Information and Communication Engineering, Inha University