DiDi: A Reinforcement Learning Agent

August 20, 2018 ~ TelescopeUser

Reinforcement Learning in Daily Life

[ Author: DiDi & GU Zhan (Sam) ]
[ Tags: MTech IS, AI, Reinforcement learning, Agent, Markov decision process ]

[ Question ]
Could you identify the core elements of a typical reinforcement learning model below?

Basic reinforcement learning is modelled as a Markov decision process:

A set of environment and agent states, S;
A set of actions, A, of the agent;
A probability transition function from state s to state s’ under action a;
An immediate reward after transition from s to s’ with action a;
Rules that describe what the agent observes.

https://en.wikipedia.org/wiki/Reinforcement_learning
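
For readers who think in code, the five elements listed above can be gathered into one small container. This is only a minimal sketch: the class name `MDP`, its field names, the helper `expected_reward`, and the toy light-switch example are all made up for illustration and do not come from any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = Hashable


@dataclass
class MDP:
    """Hypothetical container for the five elements listed above."""
    states: List[State]                                        # S
    actions: List[Action]                                      # A
    transition: Callable[[State, Action], Dict[State, float]]  # P(s' | s, a)
    reward: Callable[[State, Action, State], float]            # R(s, a, s')
    observe: Callable[[State], object]                         # what the agent sees


def expected_reward(mdp: MDP, state: State, action: Action) -> float:
    """Average the immediate reward over all possible next states."""
    return sum(
        prob * mdp.reward(state, action, nxt)
        for nxt, prob in mdp.transition(state, action).items()
    )


# Toy example (an assumption, unrelated to the post): a light switch.
toy = MDP(
    states=["off", "on"],
    actions=["toggle"],
    transition=lambda s, a: {("on" if s == "off" else "off"): 1.0},
    reward=lambda s, a, nxt: 1.0 if nxt == "on" else 0.0,
    observe=lambda s: s,  # fully observable: the agent sees the state itself
)
```

With this toy model, `expected_reward(toy, "off", "toggle")` averages the reward over the single possible next state, "on".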

[ Solution ]

A set of environment and agent states, S;
- Agent: the boy, named DiDi. The states are the whereabouts of DiDi and his scooter.
- Environment: the campus DiDi and his scooter are in, with physical obstacles such as metal chains.

A set of actions, A, of the agent;
- DiDi, the agent, can take different actions such as walk, pull, drag, and so on.

A probability transition function from state s to state s’ under action a;
DiDi’s actions can lead to different states, e.g.:
- DiDi is on the left side of the metal chain & the scooter is on the right side of the metal chain;
- DiDi is on the right side of the metal chain & the scooter is on the right side of the metal chain;
- DiDi is on the left side of the metal chain & the scooter is on the left side of the metal chain;
- And so on.
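
The states and transitions above could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the encoding of a state as a `(didi_side, scooter_side)` pair, the function names, and especially the success probabilities, which the post does not give.

```python
import random

# Hypothetical encoding: a state is (didi_side, scooter_side).
SIDES = ("left", "right")
ACTIONS = ("walk", "pull", "drag")

# Assumed chance that the scooter gets past the chain; not from the post.
SUCCESS = {"pull": 0.5, "drag": 0.75}


def transition_probs(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    didi, scooter = state
    if action == "walk":
        # Walking moves only DiDi to the other side of the chain.
        other = "right" if didi == "left" else "left"
        return {(other, scooter): 1.0}
    if didi == scooter:
        # The scooter is already beside DiDi, so pulling changes nothing.
        return {state: 1.0}
    p = SUCCESS[action]
    # Either the scooter comes over to DiDi's side, or it stays blocked.
    return {(didi, didi): p, state: 1.0 - p}


def sample_next_state(state, action, rng=random):
    """Draw one next state according to transition_probs."""
    probs = transition_probs(state, action)
    return rng.choices(list(probs), weights=list(probs.values()))[0]
```

For example, `transition_probs(("left", "right"), "drag")` gives the scooter an assumed 75% chance of ending up on the left side with DiDi, and a 25% chance of staying stuck behind the chain.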

An immediate reward after transition from s to s’ with action a;
- If DiDi and his scooter are both on the left side of the metal chain, then the reward is that DiDi can happily continue his scooter journey.
- If the scooter is on the right side, blocked by the metal chain, then the reward is “none”, and DiDi feels helpless.
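
The two reward cases above can be written as a small function. This is a sketch under assumptions: states are encoded as `(didi_side, scooter_side)` pairs, and the numeric values 1.0 and 0.0 are stand-ins for "happily moving on" and "none"; the post itself gives no numbers.

```python
def reward(state, action, next_state):
    """Immediate reward for the transition state -> next_state under action.

    The values 1.0 and 0.0 are illustrative assumptions.
    """
    didi_side, scooter_side = next_state
    if didi_side == "left" and scooter_side == "left":
        # Both are past the chain: DiDi can ride on.
        return 1.0
    # Any other outcome earns nothing. This covers the case where the
    # scooter is still on the right side, blocked by the chain.
    return 0.0
```

Note that the reward depends only on where the transition ends, which matches the description above: what matters is whether DiDi and the scooter finish on the left side together.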

Rules that describe what the agent observes.
- What DiDi can see and feel, e.g. being aware of where his scooter is, whether the scooter is blocked by the metal chain, and so on.
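
As a rough illustration, the observation rule can be a function that turns the full state into what DiDi perceives. The `(didi_side, scooter_side)` state encoding and the dictionary keys are hypothetical names chosen for this sketch.

```python
def observe(state):
    """Return what DiDi perceives in `state` (hypothetical field names)."""
    didi_side, scooter_side = state
    return {
        # DiDi is aware of where his scooter is.
        "scooter_side": scooter_side,
        # As in the reward description, a scooter on the right side
        # counts as blocked by the metal chain.
        "scooter_blocked": scooter_side == "right",
        # Extra assumption: DiDi can tell whether the scooter is beside him.
        "scooter_beside_didi": didi_side == scooter_side,
    }
```

Here the observation happens to reveal the scooter's position completely. In a harder problem the agent might see only part of the state.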

[ The End ]

Published by TelescopeUser
