Distilling the Knowledge in Neural Network
jinyeah 2022. 5. 10. 13:06
- One of the earliest papers on Knowledge Distillation (2014 NIPS workshop)
- A model compression technique for bringing computation to edge devices: the goal is to train a small, compact model that mimics the performance of a large, cumbersome model
- Supervised learning (ground-truth labels are available)
- Response-based knowledge (the teacher's soft targets) + offline distillation (a pre-trained teacher model)
Knowledge
Conventional approach
- Uses hard targets obtained by one-hot encoding: the class with the highest softmax probability is set to 1 and all other classes to 0
- Problem: the standard softmax function maps the largest probability close to 1 and the rest close to 0, so the information about how similar the incorrect classes are to each other is largely lost
Proposed method
- Soft targets: add a temperature hyperparameter (T) to the softmax function (a small sketch follows this list)
- When T = 1, it is identical to the standard softmax function
- A larger T yields a softer (more uniform) probability distribution
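A minimal sketch of the temperature-scaled softmax, q_i = exp(z_i / T) / Σ_j exp(z_j / T). The function name and the example logits below are my own illustration, not values from the paper:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    scaled = np.asarray(logits, dtype=float) / T
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [8.0, 3.0, 1.0]
print(softmax_with_temperature(logits, T=1))  # near one-hot: ~[0.99, 0.007, 0.001]
print(softmax_with_temperature(logits, T=5))  # softer:       ~[0.62, 0.23, 0.15]
```

With a higher temperature the relative ordering of the classes is preserved, but the smaller probabilities carry visibly more information for the student to learn from.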
Distillation
The knowledge learned by the teacher model is transferred to the student model through two loss terms; the overall objective is a weighted sum of both (a combined sketch follows this list).
- distillation loss: the Kullback-Leibler divergence between the teacher's soft labels and the student's soft predictions (both computed with temperature T)
- student loss: the cross-entropy between the student's hard predictions (T = 1) and the ground-truth hard labels
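A minimal PyTorch sketch of the combined objective, assuming raw `teacher_logits` and `student_logits` and a weighting factor `alpha`; the function name, `alpha`, and the default T are illustrative choices, not values prescribed by the paper:

```python
import torch
import torch.nn.functional as F

def knowledge_distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Distillation loss: KL divergence between the teacher's and the student's
    # temperature-softened distributions. The T**2 factor keeps gradient
    # magnitudes comparable across different temperatures, as noted in the paper.
    soft_labels = F.softmax(teacher_logits / T, dim=1)
    soft_predictions = F.log_softmax(student_logits / T, dim=1)
    distillation_loss = F.kl_div(soft_predictions, soft_labels,
                                 reduction="batchmean") * (T ** 2)

    # Student loss: ordinary cross-entropy against the ground-truth hard labels (T = 1).
    student_loss = F.cross_entropy(student_logits, labels)

    # Weighted sum of the two terms.
    return alpha * distillation_loss + (1.0 - alpha) * student_loss
```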
What is Kullback-Leibler Divergence?
2022.05.10 - [Deep Learning/Basic] - Objective function
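As a quick reminder before following the link above, a minimal sketch of the discrete KL divergence, KL(P‖Q) = Σ_i P(i) log(P(i) / Q(i)); the example distributions are made up:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)); p and q are probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

teacher = [0.62, 0.23, 0.15]   # softened teacher distribution (hypothetical)
student = [0.80, 0.15, 0.05]   # softened student distribution (hypothetical)
print(kl_divergence(teacher, student))  # 0 only when the two distributions match
```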
References
- https://towardsdatascience.com/distilling-knowledge-in-neural-network-d8991faa2cdc
- https://intellabs.github.io/distiller/knowledge_distillation.html
- https://dsbook.tistory.com/324