DiDi: A Reinforcement Learning Agent

Reinforcement Learning in Daily Life


 

[ Author: DiDi & GU Zhan (Sam) ]
[ Tags: MTech ISAI, Reinforcement learning, Agent, Markov decision process ]


[ Question ]
Could you identify the core elements of the typical reinforcement learning model below?

Basic reinforcement learning is modelled as a Markov decision process:

  • A set of environment and agent states, S;
  • A set of actions, A, of the agent;
  • A transition probability function giving the probability of moving from state s to state s’ under action a;
  • An immediate reward after transition from s to s’ with action a;
  • Rules that describe what the agent observes.

https://en.wikipedia.org/wiki/Reinforcement_learning
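
For readers who prefer symbols, the third and fourth elements above can be written compactly. The names P and R follow the notation of the Wikipedia article cited above; they are not used elsewhere in this post.

```latex
\begin{align*}
P_a(s, s') &= \Pr(S_{t+1} = s' \mid S_t = s,\ A_t = a) \\
R_a(s, s') &: \text{immediate reward after the transition from } s \text{ to } s' \text{ under action } a
\end{align*}
```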


[ Solution ]

  • A set of environment and agent states, S;
    • Agent: the boy, named DiDi. The states are the whereabouts of DiDi and his scooter.
    • Environment: the campus that DiDi and his scooter are in, with physical obstacles such as metal chains.

 

  • A set of actions, A, of the agent;
    • DiDi, the agent, can take different actions, such as walk, pull, drag, and so on.

 

  • A transition probability function from state s to state s’ under action a. DiDi’s actions can lead to different states, e.g.:
    • DiDi is on the left side of the metal chain and the scooter is on the right side;
    • DiDi is on the right side of the metal chain and the scooter is on the right side;
    • DiDi is on the left side of the metal chain and the scooter is on the left side;
    • And so on.

 

  • An immediate reward after transition from s to s’ with action a;
    • If DiDi and his scooter are both on the left side of the metal chain, the reward is that DiDi can happily continue his scooter journey.
    • If the scooter is on the right side, blocked by the metal chain, the reward is “none”, or DiDi feels helpless.

 

  • Rules that describe what the agent observes.
    • What DiDi can see and feel, e.g. he is aware of where his scooter is, that the scooter is blocked by the metal chain, and so on.
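
The five elements above can be put together in a small Python sketch. This is only an illustration under stated assumptions, not a model taken from the original story: the "left"/"right" state encoding, the success probabilities for pull and drag, the numeric rewards (1.0 for the happy outcome, 0.0 for "none"), and all function names are hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Tuple

# A state is (DiDi's side, scooter's side) relative to the metal chain.
State = Tuple[str, str]

SIDES = ("left", "right")
STATES: List[State] = [(d, s) for d in SIDES for s in SIDES]
ACTIONS: List[str] = ["walk", "pull", "drag"]

# Both DiDi and the scooter on the left side: DiDi can continue his journey.
GOAL: State = ("left", "left")

# ASSUMPTION: made-up chances that the scooter gets across the chain.
MOVE_SCOOTER_PROB = {"pull": 0.3, "drag": 0.6}


def transition_probs(state: State, action: str) -> Dict[State, float]:
    """Return P(s' | s, a) as a dict mapping next state -> probability."""
    didi, scooter = state
    if state == GOAL:
        return {state: 1.0}  # nothing left to do once the goal is reached
    if action == "walk":
        # DiDi steps over the chain alone; the scooter stays where it is.
        other_side = "left" if didi == "right" else "right"
        return {(other_side, scooter): 1.0}
    if scooter == "left":
        return {state: 1.0}  # the scooter is already across
    # pull / drag: the scooter may or may not make it over the chain.
    p = MOVE_SCOOTER_PROB[action]
    return {(didi, "left"): p, state: 1.0 - p}


def reward(state: State, action: str, next_state: State) -> float:
    """Immediate reward after the transition from state to next_state."""
    return 1.0 if next_state == GOAL else 0.0  # ASSUMPTION: "none" is 0.0


def observe(state: State) -> str:
    """What DiDi can see: here he observes the full state."""
    didi, scooter = state
    return f"DiDi is on the {didi} side, the scooter is on the {scooter} side"


def step(state: State, action: str, rng: random.Random) -> Tuple[State, float]:
    """Sample a next state from the transition function and return its reward."""
    probs = transition_probs(state, action)
    candidates = list(probs)
    weights = [probs[s] for s in candidates]
    next_state = rng.choices(candidates, weights=weights)[0]
    return next_state, reward(state, action, next_state)


def run_episode(start: State, rng: random.Random, max_steps: int = 50) -> Tuple[State, float]:
    """Let DiDi act at random until he reaches the goal or runs out of steps."""
    state, total = start, 0.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = rng.choice(ACTIONS)
        state, r = step(state, action, rng)
        total += r
    return state, total


if __name__ == "__main__":
    rng = random.Random(0)
    final_state, total_reward = run_episode(("left", "right"), rng)
    print(observe(final_state), "| total reward:", total_reward)
```

Here DiDi picks actions at random; a learning agent would instead use the observed rewards to prefer the actions that get both him and the scooter to the left side more often.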

[ The End ]