Optimization

Epoch vs Batch vs Iteration
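
In this terminology, an epoch is one full pass over the training set, a batch (mini-batch) is the subset of samples used for a single parameter update, and an iteration is one such update, so iterations per epoch = ceil(N / batch size). For example, 1,000 training samples with a batch size of 100 give 10 iterations per epoch.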

Choosing the Batch Size and Learning Rate

If we know a baseline model's Batch Size and Learning Rate, how should the Learning Rate be adjusted when the Batch Size is increased or decreased?
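
A common heuristic is the linear scaling rule: if the batch size is multiplied by k, multiply the learning rate by k as well (square-root scaling is another option, and very large batches usually also need warmup). A minimal sketch of the rule; the helper name is illustrative, not from any library:

    def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
        # Linear scaling rule: the learning rate grows in proportion
        # to the batch size. This is a heuristic, not a guarantee.
        return base_lr * (new_batch_size / base_batch_size)

    # Example: baseline lr=0.1 at batch size 256, new batch size 1024 -> lr=0.4
    scaled_lr = scale_learning_rate(0.1, 256, 1024)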

Gradient Descent
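
In standard notation, gradient descent repeatedly steps against the gradient of the loss, with learning rate \eta:

    \theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)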

Scalar Derivative vs Gradient
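
The difference is only in dimensionality: for a scalar function of one variable the derivative is a single number, while for a function of several variables the gradient collects the partial derivatives into a vector pointing in the direction of steepest ascent:

    f : \mathbb{R} \to \mathbb{R}, \quad f'(x) = \frac{df}{dx}

    f : \mathbb{R}^n \to \mathbb{R}, \quad \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)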

Stochastic Gradient Descent
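
Unlike full-batch gradient descent, SGD estimates the gradient from a randomly sampled example or mini-batch B rather than the whole dataset:

    \theta_{t+1} = \theta_t - \eta \, \frac{1}{|B|} \sum_{i \in B} \nabla_\theta \ell(\theta_t; x_i, y_i)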

Mini-batch SGD: Loop

  1. Sample a batch of data
  2. Forward prop it through the graph (network) and get the loss
  3. Backprop to calculate the gradients
  4. Update the parameters using the gradient
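
A minimal PyTorch-style sketch of this loop, assuming model, dataloader, loss_fn, and optimizer are defined elsewhere:

    import torch

    def train_one_epoch(model, dataloader, loss_fn, optimizer):
        model.train()
        for inputs, targets in dataloader:    # 1. sample a batch of data
            outputs = model(inputs)           # 2. forward prop through the network
            loss = loss_fn(outputs, targets)  #    and get the loss
            optimizer.zero_grad()
            loss.backward()                   # 3. backprop to calculate the gradients
            optimizer.step()                  # 4. update the parameters using the gradients

    # e.g. optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)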

Momentum
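
A common formulation (sign and placement of the coefficients vary between textbooks and frameworks) keeps a velocity v that accumulates past gradients, with momentum coefficient \mu (typically 0.9):

    v_{t+1} = \mu v_t - \eta \, \nabla_\theta L(\theta_t)

    \theta_{t+1} = \theta_t + v_{t+1}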

Learning Rate Scheduling

Learning Rate Decay
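
Two common decay schedules, step decay and cosine decay, as a minimal sketch (the function names are illustrative, not from a particular library):

    import math

    def step_decay(base_lr, epoch, drop_every=30, factor=0.1):
        # Multiply the learning rate by `factor` every `drop_every` epochs.
        return base_lr * (factor ** (epoch // drop_every))

    def cosine_decay(base_lr, epoch, total_epochs):
        # Smoothly anneal the learning rate from base_lr toward 0.
        return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))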

Warmup
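
A minimal sketch of linear warmup: for the first warmup_epochs the learning rate ramps up from near 0 to base_lr, after which a decay schedule such as the ones above takes over (names are illustrative):

    def warmup_lr(base_lr, epoch, warmup_epochs=5):
        # Linearly increase the learning rate over the first few epochs.
        if epoch < warmup_epochs:
            return base_lr * (epoch + 1) / warmup_epochs
        return base_lr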
