Integrating Learning and Planning
Let's look at the simulation-based Model-Based RL methods that were used in AlphaGo.
Outline
1. Introduction
- What is a Model?
- Model-Free RL vs Model-Based RL
- Advantages of Model-Based RL
2. Model-Based RL
- Model Learning
- Planning with a Model
3. Integrated Architectures
- Dyna
4. Simulation-Based Search
- Simple Monte-Carlo Search
- Monte-Carlo Tree Search (MCTS)
- Temporal-Difference Search
Introduction
What is a Model?
- A representation of the MDP: the transition probabilities and the reward function
Model-Free RL vs Model-Based RL
| Model-Free RL | Model-Based RL |
|---|---|
| No model; learn value function (and/or policy) from experience | Learn a model from experience; plan value function (and/or policy) from the model |
Advantages of Model-Based RL
- Can efficiently learn the model by supervised learning methods
- Can reason about model uncertainty
Model-Based RL
Model Learning
- Estimate the model from real experience
- This is a supervised learning problem (a table-lookup sketch follows below)
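In the tabular case, model learning can be as simple as counting transitions and averaging rewards. Below is a minimal sketch of such a table-lookup model; the class and method names are illustrative, not from the lecture.

```python
from collections import defaultdict

class TableLookupModel:
    """Estimate transition probabilities and expected rewards by counting real transitions."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visit count}
        self.reward_sum = defaultdict(float)                  # (s, a) -> cumulative reward
        self.visits = defaultdict(int)                        # (s, a) -> total visit count

    def update(self, s, a, r, s_next):
        # one real transition (s, a, r, s') observed in the environment
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def expected_reward(self, s, a):
        return self.reward_sum[(s, a)] / self.visits[(s, a)]
```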
Planning with a model (Sample-Based Planning)
- Use the model only to generate sample (simulated) experience
- Apply model-free RL (e.g., Monte-Carlo control, Sarsa, Q-learning) to the samples, as in the sketch below
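A minimal sketch of sample-based planning on top of the table-lookup model above, applying one-step Q-learning to transitions sampled from the model. `Q` is assumed to be a nested defaultdict of floats; the function name is illustrative.

```python
import random
from collections import defaultdict

def sample_based_planning(model, Q, n_updates=10_000, alpha=0.1, gamma=0.9):
    """Plan by applying model-free Q-learning to transitions sampled from the learned model."""
    seen = list(model.visits.keys())                            # (state, action) pairs seen in real experience
    for _ in range(n_updates):
        s, a = random.choice(seen)                              # sample a previously visited state-action pair
        probs = model.transition_probs(s, a)
        s_next = random.choices(list(probs), weights=list(probs.values()))[0]
        r = model.expected_reward(s, a)
        best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])    # one-step Q-learning on the sample
    return Q

# usage: Q = defaultdict(lambda: defaultdict(float)); sample_based_planning(model, Q)
```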
Integrated Architectures
Dyna
- Learn a model from real experience
- Learn and plan the value function (and/or policy) from both real and simulated experience (a Dyna-Q-style sketch follows below)
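A rough Dyna-Q-style sketch that interleaves direct RL on the real transition, model learning, and planning. It reuses the `TableLookupModel` and `sample_based_planning` sketches above and assumes a classic Gym-style `env.step` returning `(s', r, done, info)`; all names and hyperparameters are illustrative.

```python
import random

def dyna_q_step(env, model, Q, s, actions, n_planning=5, alpha=0.1, gamma=0.9, epsilon=0.1):
    """One Dyna-Q step: act in the real environment, learn directly and via the model."""
    # (a) epsilon-greedy action selection in the real state s
    if random.random() < epsilon or not Q[s]:
        a = random.choice(actions)
    else:
        a = max(Q[s], key=Q[s].get)
    s_next, r, done, _ = env.step(a)

    # (b) direct RL: Q-learning update from the real transition
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

    # (c) model learning from the same real transition
    model.update(s, a, r, s_next)

    # (d) planning: extra updates from transitions simulated by the model
    sample_based_planning(model, Q, n_updates=n_planning, alpha=alpha, gamma=gamma)
    return s_next, done
```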
Simulation-Based Search
- Plan by simulating whole episodes with the model, rather than just sampling individual transitions from it
- Forward-search paradigm using sample-based planning
- Simulate episodes of experience from the current state ("now") with the model
- Apply model-free RL to the simulated episodes
Simple Monte-Carlo Search
- Evaluates actions by plain Monte-Carlo simulation with a fixed simulation policy
- Forward-simulates episodes for every action that can be taken from the current state
- Computes each Q-value as the mean of the simulated returns and selects the action with the maximum Q-value (see the sketch below)
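A minimal sketch, assuming a hypothetical `rollout(state, action)` function that uses the model and a fixed (e.g., random) simulation policy to return the list of rewards from one simulated episode.

```python
def simple_mc_search(rollout, state, actions, n_sims=100, gamma=0.9):
    """For each candidate action, average returns over simulated rollouts and act greedily."""
    q = {}
    for a in actions:
        returns = []
        for _ in range(n_sims):
            rewards = rollout(state, a)                           # simulated episode from the model
            returns.append(sum(gamma**t * r for t, r in enumerate(rewards)))
        q[a] = sum(returns) / len(returns)                        # Monte-Carlo estimate of Q(state, a)
    return max(q, key=q.get)                                      # action with the highest simulated Q-value
```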
Monte-Carlo Tree Search (MCTS)
- Simulates from the current state while building a search tree
- Applies Monte-Carlo control to the sub-MDP starting from now (a UCT-style sketch follows below)
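A compact UCT-style sketch of the idea. `simulate_step(s, a) -> (s', r, done)` is a hypothetical one-step model simulator, and the exploration constant and depth limits are illustrative, not from the lecture.

```python
import math
import random
from collections import defaultdict

def mcts(simulate_step, root_state, actions, n_sims=1000, max_depth=20, gamma=1.0, c=1.4):
    """UCT-style MCTS: grow a search tree from the current state using simulated
    experience, then return the most-visited root action."""
    N = defaultdict(int)      # visit count per (state, action)
    W = defaultdict(float)    # total return per (state, action)
    Ns = defaultdict(int)     # visit count per state

    def ucb(s, a):
        if N[(s, a)] == 0:
            return float("inf")                               # try unvisited actions first
        return W[(s, a)] / N[(s, a)] + c * math.sqrt(math.log(Ns[s]) / N[(s, a)])

    for _ in range(n_sims):
        s, path, done = root_state, [], False
        # selection/expansion: descend the tree by UCB until an unvisited edge or a terminal state
        for _ in range(max_depth):
            a = max(actions, key=lambda act: ucb(s, act))
            expanding = N[(s, a)] == 0
            s_next, r, done = simulate_step(s, a)
            path.append((s, a, r))
            s = s_next
            if expanding or done:
                break
        # rollout: estimate the leaf value with a random default policy
        g, discount, depth = 0.0, 1.0, 0
        while not done and depth < max_depth:
            s, r, done = simulate_step(s, random.choice(actions))
            g += discount * r
            discount *= gamma
            depth += 1
        # backup: propagate the simulated return through the visited tree edges
        for (ps, pa, pr) in reversed(path):
            g = pr + gamma * g
            N[(ps, pa)] += 1
            Ns[ps] += 1
            W[(ps, pa)] += g

    return max(actions, key=lambda a: N[(root_state, a)])     # most-visited root action
```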
Temporal-Difference Search
- Uses TD instead of MC (bootstrapping)
- Applies Sarsa to the sub-MDP starting from now (see the sketch below)
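A small Sarsa-based search sketch under the same hypothetical `simulate_step(s, a) -> (s', r, done)` model interface; it bootstraps from the current Q-estimate instead of waiting for full Monte-Carlo returns.

```python
import random
from collections import defaultdict

def td_search(simulate_step, root_state, actions, n_episodes=500, max_depth=20,
              alpha=0.1, gamma=0.9, epsilon=0.1):
    """TD search: apply Sarsa to episodes simulated from the current state with the model."""
    Q = defaultdict(float)                                    # Q[(s, a)] over the sub-MDP from now

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(n_episodes):
        s, a = root_state, epsilon_greedy(root_state)
        for _ in range(max_depth):
            s_next, r, done = simulate_step(s, a)             # simulated transition from the model
            a_next = epsilon_greedy(s_next)
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])         # bootstrapped (Sarsa) update
            if done:
                break
            s, a = s_next, a_next
    return max(actions, key=lambda a: Q[(root_state, a)])     # greedy action at the root
```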
References
RL Course by David Silver - Lecture 8: Integrating Learning and Planning