Reinforcement Learning in Daily Life
[ Author: DiDi & GU Zhan (Sam) ]
[ Tags: MTech IS, AI, Reinforcement learning, Agent, Markov decision process ]
[ Question ]
Could you identify the core elements of a typical reinforcement learning model below?
Basic reinforcement learning is modelled as a Markov decision process:
- A set of environment and agent states, S;
- A set of actions, A, of the agent;
- A probability transition function from state s to state s’ under action a;
- An immediate reward after transition from s to s’ with action a;
- Rules that describe what the agent observes.
https://en.wikipedia.org/wiki/Reinforcement_learning
[ Solution ]
- A set of environment and agent states, S;
  - Agent: a boy named DiDi. The states are the whereabouts of DiDi and his scooter.
  - Environment: the campus that DiDi and his scooter are in, with physical obstacles such as metal chains.
- A set of actions, A, of the agent;
  - DiDi, the agent, can take different actions such as walk, pull, drag, and so on.
- A probability transition function from state s to state s’ under action a;
  - DiDi’s actions can lead to different states, e.g.:
    - DiDi is on the left side of the metal chain and the scooter is on the right side;
    - DiDi is on the right side of the metal chain and the scooter is on the right side;
    - DiDi is on the left side of the metal chain and the scooter is on the left side;
    - And so on.
- An immediate reward after transition from s to s’ with action a;
  - If DiDi and the scooter are both on the left side of the metal chain, the reward is that DiDi can happily continue his scooter journey.
  - If the scooter is on the right side, blocked by the metal chain, the reward is “none”, or DiDi feels helpless.
- Rules that describe what the agent observes.
  - What DiDi can see and feel, e.g. being aware of where his scooter is, that the scooter is blocked by the metal chain, and so on.
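
The five elements above can be sketched as a tiny Markov decision process in Python. This is only an illustrative sketch, not part of the original exercise: the two-sided state encoding, the function names (transition, reward, observe, step) and every probability and reward value are assumptions made up for the example.

```python
import random

SIDES = ("left", "right")
# State S: (DiDi's side of the metal chain, scooter's side of the metal chain).
STATES = [(didi, scooter) for didi in SIDES for scooter in SIDES]
# Actions A, taken from the solution text.
ACTIONS = ("walk", "pull", "drag")
# Assumed goal: both DiDi and the scooter are on the left side.
GOAL = ("left", "left")


def other(side):
    """Return the opposite side of the metal chain."""
    return "left" if side == "right" else "right"


def transition(state, action):
    """Return {next_state: probability}. All probabilities are assumed values."""
    didi, scooter = state
    if action == "walk":
        # DiDi steps over the chain alone; the scooter stays where it is.
        return {(other(didi), scooter): 1.0}
    if action == "pull" and didi != scooter:
        # Pulling from the opposite side may bring the scooter over to DiDi.
        return {(didi, didi): 0.7, state: 0.3}
    if action == "drag" and didi == scooter:
        # Dragging the scooter across together may fail at the chain.
        return {(other(didi), other(scooter)): 0.4, state: 0.6}
    # Any other combination has no effect.
    return {state: 1.0}


def reward(state, action, next_state):
    """Immediate reward: 1.0 when the transition ends in the goal, else 0.0 ("none")."""
    return 1.0 if next_state == GOAL else 0.0


def observe(state):
    """What DiDi can see: where he and the scooter are, and whether it is blocked."""
    didi, scooter = state
    return {
        "didi_side": didi,
        "scooter_side": scooter,
        "scooter_blocked": scooter == "right",
    }


def step(state, action, rng):
    """Sample one transition and return (next_state, reward, observation)."""
    outcomes = transition(state, action)
    next_state = rng.choices(list(outcomes), weights=list(outcomes.values()))[0]
    return next_state, reward(state, action, next_state), observe(next_state)
```

For example, step(("left", "right"), "pull", random.Random(0)) samples one outcome of DiDi pulling the scooter from the left side while it is still on the right side.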
[ The End ]