Integrating Learning and Planning

Let's Run Jinyeah
jinyeah 2021. 8. 31. 18:48

In this post, we look at the simulation-based, model-based RL methods used in AlphaGo.

Outline

1. Introduction

What is a Model?
Model-Free RL vs Model-Based RL
Advantages of Model-Based RL

2. Model-Based RL

Model Learning
Planning with a Model

3. Integrated Architectures

Dyna

4. Simulation-Based Search

Simple Monte-Carlo Search
Monte-Carlo Tree Search (MCTS)
Temporal-Difference Search

Introduction

What is a Model?

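A model is a representation of an MDP: its state transition probabilities P and its reward function R. Assuming the state and action spaces are known, a model M = <P, R> predicts the next state and the reward that follow a given state and action.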
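A tabular model can be pictured as two lookup tables, one for P and one for R. The sketch below is a minimal illustration under assumed names: `TabularModel`, `step`, and the two-state example are hypothetical and do not come from the lecture. It also assumes rewards are deterministic given (s, a), so `step` returns the expected reward together with a sampled next state.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TabularModel:
    """Hypothetical tabular model M = <P, R> (state and action sets assumed known).

    P[(s, a)] maps each next state s' to Pr[S' = s' | S = s, A = a].
    R[(s, a)] is the expected reward E[R | S = s, A = a].
    """
    P: dict = field(default_factory=dict)
    R: dict = field(default_factory=dict)

    def step(self, s, a, rng=random):
        """Return (reward, next_state): the expected reward and a sampled next state."""
        next_states = list(self.P[(s, a)].keys())
        probs = list(self.P[(s, a)].values())
        s_next = rng.choices(next_states, weights=probs, k=1)[0]
        return self.R[(s, a)], s_next


# Hypothetical two-state MDP: action 0 stays put, action 1 usually switches state.
model = TabularModel(
    P={("A", 0): {"A": 1.0}, ("A", 1): {"B": 0.9, "A": 0.1},
       ("B", 0): {"B": 1.0}, ("B", 1): {"A": 1.0}},
    R={("A", 0): 0.0, ("A", 1): 1.0, ("B", 0): 0.5, ("B", 1): 0.0},
)
r, s_next = model.step("A", 1, random.Random(0))
```

Planning only needs this kind of `step` query. The planner never has to know whether the tables were written by hand or learned.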

Model-Free RL vs Model-Based RL

Model-Free RL	Model-Based RL
No model	Learn a model from experience
Learn value function (and/or policy) from experience	Plan value function (and/or policy) from model

Advantages of Model-Based RL

A model can be learned efficiently with supervised learning methods.
The agent can reason about model uncertainty.

Model-Based RL

Model Learning

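Estimate the model from real experience {S1, A1, R2, ..., ST}.
This is a supervised learning problem. Learning s, a → r is a regression problem, and learning s, a → s' is a density estimation problem.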
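The simplest instance is a table lookup model, which works by counting. It records the visit count N(s, a), estimates P̂(s' | s, a) as the fraction of visits to (s, a) that were followed by s', and estimates R̂(s, a) as the mean reward observed after (s, a). Below is a minimal Python sketch. The function name and the example transitions are hypothetical.

```python
from collections import defaultdict


def learn_table_lookup_model(transitions):
    """Estimate P-hat and R-hat from real transitions (s, a, r, s_next) by counting."""
    n_sa = defaultdict(int)                          # N(s, a)
    n_sas = defaultdict(lambda: defaultdict(int))    # N(s, a, s')
    reward_sum = defaultdict(float)                  # sum of rewards observed after (s, a)
    for s, a, r, s_next in transitions:
        n_sa[(s, a)] += 1
        n_sas[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    # P-hat(s' | s, a) = N(s, a, s') / N(s, a);  R-hat(s, a) = mean observed reward
    P_hat = {sa: {s2: c / n_sa[sa] for s2, c in nexts.items()} for sa, nexts in n_sas.items()}
    R_hat = {sa: reward_sum[sa] / n for sa, n in n_sa.items()}
    return P_hat, R_hat


# Hypothetical real experience as (s, a, r, s') tuples.
experience = [("A", 1, 1.0, "B"), ("A", 1, 0.0, "A"), ("A", 1, 1.0, "B"), ("B", 0, 0.5, "B")]
P_hat, R_hat = learn_table_lookup_model(experience)
```

Here (A, 1) was seen three times, twice followed by B. That gives P̂(B | A, 1) = 2/3 and R̂(A, 1) = 2/3.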

Planning with a model (Sample-Based Planning)

Use the model only to generate samples (simulated experience).
Apply model-free RL (e.g., Monte-Carlo control, Sarsa, or Q-learning) to those samples.
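Here is a minimal sketch of sample-based planning. It assumes a hypothetical learned model of a 4-state corridor: states 0 to 3, state 3 is terminal, and entering it gives reward 1. Q-learning serves as the model-free method in this sketch, though Sarsa or Monte-Carlo control would fit the description just as well. Only `sample_from_model` touches the model, and all names and parameter values are illustrative.

```python
import random
from collections import defaultdict

# Hypothetical learned model of a corridor: states 0..3, state 3 is terminal.
# Action 0 moves left, action 1 moves right; entering state 3 gives reward 1.
TERMINAL = 3
P_hat = {(s, a): {(max(s - 1, 0) if a == 0 else s + 1): 1.0} for s in range(3) for a in (0, 1)}
R_hat = {(s, a): 1.0 if (a == 1 and s + 1 == TERMINAL) else 0.0 for s in range(3) for a in (0, 1)}


def sample_from_model(s, a, rng):
    """Use the model only to generate a sample (r, s')."""
    nexts, probs = zip(*P_hat[(s, a)].items())
    return R_hat[(s, a)], rng.choices(nexts, weights=probs, k=1)[0]


def plan_with_q_learning(n_samples=5000, alpha=0.1, gamma=0.9, seed=0):
    """Sample-based planning: apply model-free Q-learning to simulated transitions."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(n_samples):
        s, a = rng.choice(list(P_hat))            # pick any modelled (s, a) pair
        r, s_next = sample_from_model(s, a, rng)  # simulated, not real, experience
        target = r if s_next == TERMINAL else r + gamma * max(Q[(s_next, b)] for b in (0, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q


Q = plan_with_q_learning()
greedy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in range(3)}
```

No real interaction happens during planning. The agent only queries the model, at uniformly random (s, a) pairs.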

Integrated Architectures

Dyna

Learn a model from real experience
Learn and plan the value function (and/or policy) from both real and simulated experience
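A minimal sketch of the tabular Dyna-Q algorithm, which is the Dyna variant shown in the lecture. It assumes a deterministic environment, so the model simply memorizes the last (r, s') seen for each (s, a). The corridor environment `env_step` and all parameter values are hypothetical. Every real step feeds both a direct Q-learning update and the model, and then `n_planning` extra updates are drawn from the model.

```python
import random
from collections import defaultdict

TERMINAL, ACTIONS = 5, (0, 1)


def env_step(s, a):
    """Hypothetical real environment: deterministic corridor 0..5, reward 1 on reaching 5."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return (1.0 if s_next == TERMINAL else 0.0), s_next


def dyna_q(episodes=50, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}  # (s, a) -> (r, s'); valid because the environment is assumed deterministic

    def greedy(s):
        # Break ties randomly so that early episodes still explore.
        best = max(Q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

    def q_update(s, a, r, s_next):
        target = r if s_next == TERMINAL else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            r, s_next = env_step(s, a)       # real experience
            q_update(s, a, r, s_next)        # direct RL
            model[(s, a)] = (r, s_next)      # model learning
            for _ in range(n_planning):      # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps_next = model[(ps, pa)]
                q_update(ps, pa, pr, ps_next)
            s = s_next
    return Q


Q = dyna_q()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(TERMINAL)]
```

Setting `n_planning=0` turns this back into plain Q-learning. The planning loop is the only place where simulated experience enters.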

Simulation-Based Search

Plan by simulating episodes forward from the current state, rather than sampling arbitrary transitions from the model.
Forward search paradigm using sample-based planning
1. Simulate episodes of experience from now with the model
2. Apply model-free RL to simulated episodes

Simple Monte-Carlo Search

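Uses a model and a fixed simulation policy π, and evaluates actions by Monte-Carlo evaluation.
For each action a available in the current (real) state s_t, simulate K episodes forward from s_t with the model, following π after the first action.
Estimate Q(s_t, a) as the mean return of those episodes, and select the action with the maximum Q-value.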
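A minimal sketch of simple Monte-Carlo search. It assumes a hypothetical corridor model (states 0 to 5, reward 1 on reaching state 5) and a uniformly random simulation policy. The values of K, gamma, the seed, and `max_steps` are arbitrary choices.

```python
import random

TERMINAL, ACTIONS = 5, (0, 1)


def model_step(s, a):
    """Hypothetical model: corridor 0..5, left/right, reward 1 on reaching state 5."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return (1.0 if s_next == TERMINAL else 0.0), s_next


def simulate(s, a, policy, gamma, rng, max_steps=100):
    """Return of one simulated episode: take action a first, then follow the simulation policy."""
    G, discount = 0.0, 1.0
    for _ in range(max_steps):
        r, s = model_step(s, a)
        G += discount * r
        discount *= gamma
        if s == TERMINAL:
            break
        a = policy(s, rng)
    return G


def simple_mc_search(s_t, K=200, gamma=0.9, seed=0):
    rng = random.Random(seed)
    random_policy = lambda s, rng: rng.choice(ACTIONS)   # fixed simulation policy pi
    # Q(s_t, a) = mean return of K simulated episodes starting with action a
    Q = {a: sum(simulate(s_t, a, random_policy, gamma, rng) for _ in range(K)) / K
         for a in ACTIONS}
    return max(Q, key=Q.get), Q


best_action, Q = simple_mc_search(s_t=3)
```

The simulation policy never changes. Only the root action choice benefits from the simulations.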

Monte-Carlo Tree Search (MCTS)

Simulate from the current state, using a search tree that is built up from the simulations.
Applies MC control to the sub-MDP starting from the current state.
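A minimal MCTS sketch, following the two-phase view from the lecture. Inside the search tree, a tree policy that improves over time picks the actions. Outside the tree, a fixed default policy (here, uniform random) rolls out to the end of the episode. The return is then backed up to update the mean values Q(s, a). Two choices here are assumptions for illustration: the UCB1 tree policy (the UCT variant), where ε-greedy on Q would also fit, and the toy game, in which only the action sequence `TARGET` earns a reward.

```python
import math
import random

ACTIONS, DEPTH = (0, 1), 3
TARGET = (1, 0, 1)  # hypothetical toy game: reward 1 only for this action sequence


def is_terminal(state):
    return len(state) == DEPTH


def reward(state):
    return 1.0 if state == TARGET else 0.0


def mcts(root=(), n_simulations=500, c=1.4, seed=0):
    rng = random.Random(seed)
    N = {root: 0}     # N(s): visit count of each node in the search tree
    N_sa, Q = {}, {}  # N(s, a) and mean return Q(s, a) for edges in the tree
    for _ in range(n_simulations):
        state, path = root, []
        # 1) In-tree phase: the tree policy (UCB1) picks actions while inside the tree.
        while state in N and not is_terminal(state):
            untried = [a for a in ACTIONS if (state, a) not in N_sa]
            if untried:
                a = rng.choice(untried)
            else:
                a = max(ACTIONS, key=lambda b: Q[(state, b)]
                        + c * math.sqrt(math.log(N[state]) / N_sa[(state, b)]))
            path.append((state, a))
            state = state + (a,)
        # 2) Expansion: add the first new state reached to the tree.
        if state not in N:
            N[state] = 0
        # 3) Out-of-tree phase: the default policy (uniform random) rolls out to the end.
        while not is_terminal(state):
            state = state + (rng.choice(ACTIONS),)
        G = reward(state)
        # 4) Backup: Monte-Carlo update of the mean return along the in-tree path.
        for s, a in path:
            N[s] += 1
            N_sa[(s, a)] = N_sa.get((s, a), 0) + 1
            Q[(s, a)] = Q.get((s, a), 0.0) + (G - Q.get((s, a), 0.0)) / N_sa[(s, a)]
    # Act with the most-visited root action.
    return max(ACTIONS, key=lambda a: N_sa.get((root, a), 0)), Q


best_action, Q = mcts()
```

Unlike simple Monte-Carlo search, the policy used inside the tree improves as Q is updated. This is what makes MCTS Monte-Carlo control rather than only evaluation.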

Temporal-Difference Search

Uses TD (bootstrapping) instead of MC returns.
Applies Sarsa to the sub-MDP starting from the current state.
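A minimal TD search sketch. Every simulated episode starts from the current state s_t, and actions are chosen ε-greedily from Q. Each simulated step applies a Sarsa update ΔQ(S, A) = α(R + γQ(S', A') − Q(S, A)), so values change after every step instead of at the end of an episode. The corridor model and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

TERMINAL, ACTIONS = 5, (0, 1)


def model_step(s, a):
    """Hypothetical model: corridor 0..5, left/right, reward 1 on reaching state 5."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return (1.0 if s_next == TERMINAL else 0.0), s_next


def td_search(s_t, n_episodes=300, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    rng = random.Random(seed)
    Q = defaultdict(float)

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(Q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

    for _ in range(n_episodes):
        s = s_t                     # every simulation starts from the current real state
        a = eps_greedy(s)
        for _ in range(max_steps):
            r, s_next = model_step(s, a)
            if s_next == TERMINAL:
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            a_next = eps_greedy(s_next)
            # Sarsa: bootstrap from Q(S', A') instead of waiting for the full return.
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    best_action = max(ACTIONS, key=lambda a: Q[(s_t, a)])
    return best_action, Q


best_action, Q = td_search(s_t=2)
```

The table Q here is only a lookup table over states reached by the simulations. The lecture notes that Q may also be represented with function approximation.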

Reference

RL Course by David Silver - Lecture 8: Integrating Learning and Planning
