Modern Reinforcement-learning using Deep Learning



  • Interest in Deep reinforcement-learning


Hello, I am Nitsan Soffair, a deep RL researcher at BGU.

In my deep reinforcement-learning course, you will learn the latest state-of-the-art deep reinforcement-learning methods.

You will do the following:

  1. Gain state-of-the-art knowledge of:
    1. Model types
    2. Algorithms and approaches
    3. Function approximation
    4. Deep reinforcement-learning
    5. Deep Multi-agent Reinforcement-learning
  2. Validate your knowledge by answering short quizzes after each lecture.
  3. Complete the course in about 2 hours.


  1. Model types
    1. Markov decision process (MDP)

      A discrete-time stochastic control process.

    2. Partially observable Markov decision process (POMDP)

      A generalization of an MDP in which the agent cannot directly observe the underlying state and instead receives observations that depend on it.

    3. Decentralized Partially observable Markov decision process (Dec-POMDP)

      A generalization of a POMDP to multiple decentralized agents that share a common reward, each acting on its own local observations.

  2. Algorithms and approaches
    1. Bellman equations

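      A necessary condition for optimality in dynamic programming: the value of a state equals the expected immediate reward plus the discounted expected value of the next state.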

    2. Model-free

      A model-free algorithm does not use the transition probability distribution (or the reward function) of the MDP; it learns directly from sampled experience.

    3. Off-policy

      An off-policy algorithm learns about one policy (the target policy) while acting in the environment with another policy (the behavior policy).

    4. Exploration-exploitation

      The trade-off in reinforcement learning between exploring actions whose outcomes are still uncertain and exploiting the actions currently believed to be best.

    5. Value-iteration

      An iterative algorithm that repeatedly applies the Bellman optimality backup until the value function converges.

    6. SARSA

      SARSA (State-Action-Reward-State-Action) is an on-policy temporal-difference algorithm for learning a Markov decision process policy.

    7. Q-learning

      A model-free, off-policy reinforcement-learning algorithm that learns the value of an action in a particular state.

  3. Function approximation
    1. Function approximators

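      Function approximation selects a function from a well-defined class that closely matches (“approximates”) a target function in a task-specific way. In reinforcement learning it is used to represent value functions or policies when the state space is too large for a table.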

    2. Policy-gradient

      Value-based, policy-based, and actor-critic methods, policy gradients, and softmax policies.


      A policy-gradient algorithm.

  4. Deep reinforcement-learning
    1. Deep Q-Network (DQN)

      A deep reinforcement-learning algorithm that uses experience replay and fixed Q-targets.

    2. Deep Recurrent Q-Learning (DRQN)

      A deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM recurrent layer.

    3. Optimistic Exploration with Pessimistic Initialization (OPIQ)

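      A deep reinforcement-learning algorithm for MDPs based on DQN. It initializes Q-values pessimistically but adds optimistic count-based bonuses during action selection and bootstrapping.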

    4. Value Decomposition Networks (VDN)

      A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that represents the joint action-value as the sum of per-agent action-values.

    5. QMIX

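      A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that combines per-agent action-values through a monotonic mixing network whose weights are conditioned on the global state.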

    6. QTRAN

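      A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that learns a transformed, easily factorizable joint action-value, avoiding the additivity and monotonicity constraints of VDN and QMIX.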

    7. Weighted QMIX

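      A deep multi-agent reinforcement-learning algorithm for Dec-POMDPs that extends QMIX by weighting its projection onto QMIX’s monotonic function class to emphasize better joint actions.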
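To make the POMDP entry concrete: because the agent cannot see the state, it can keep a belief (a probability distribution over states) and update it with Bayes' rule after each action and observation. This is a minimal sketch, assuming small discrete dictionaries `T[s][a][s2]` (transition probabilities) and `O[s2][a][o]` (observation probabilities); the two-state "listen" example and its 0.85 accuracy are hypothetical.

```python
from typing import Dict

Belief = Dict[str, float]
Table = Dict[str, Dict[str, Dict[str, float]]]


def belief_update(belief: Belief, action: str, obs: str, T: Table, O: Table) -> Belief:
    """Bayes filter: b'(s2) is proportional to O[s2][action][obs] * sum_s T[s][action][s2] * b(s)."""
    unnormalized = {}
    for s2 in belief:
        # prediction step: probability of landing in s2 after taking `action`
        predicted = sum(T[s][action][s2] * p for s, p in belief.items())
        # correction step: weight by how likely `obs` is when the state is s2
        unnormalized[s2] = O[s2][action][obs] * predicted
    norm = sum(unnormalized.values())  # assumes `obs` has nonzero probability
    return {s: p / norm for s, p in unnormalized.items()}


# Hypothetical example: a hidden object is "left" or "right"; listening does not move it
# and reports the correct side with probability 0.85.
STATES = ("left", "right")
T = {s: {"listen": {s2: 1.0 if s2 == s else 0.0 for s2 in STATES}} for s in STATES}
O = {"left": {"listen": {"hear_left": 0.85, "hear_right": 0.15}},
     "right": {"listen": {"hear_left": 0.15, "hear_right": 0.85}}}

b = belief_update({"left": 0.5, "right": 0.5}, "listen", "hear_left", T, O)
print(b)
```

Starting from a uniform belief, one "hear_left" observation moves the belief in "left" to about 0.85; a second identical observation raises it further.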
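For reference, the Bellman optimality equations behind the Bellman-equations and Value-iteration entries, written in standard notation with discount factor γ and joint transition-and-reward probabilities p(s', r | s, a):

```latex
\begin{aligned}
V^*(s)   &= \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V^*(s')\bigr] \\
Q^*(s,a) &= \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma \max_{a'} Q^*(s', a')\bigr]
\end{aligned}
```

Value iteration turns the first equation into an update rule, replacing V* on the right-hand side with the current estimate.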
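The Value-iteration entry can be sketched in a few lines. This is a minimal tabular version, assuming a hypothetical encoding `P[s][a]` = list of `(probability, next_state, reward)` triples and an arbitrary three-state example MDP; it updates values in place, which also converges.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[state][action] -> list of (probability, next_state, reward)
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8) -> Dict[int, float]:
    """Apply the Bellman optimality backup to every state until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            # best expected one-step return: max over a of sum p * (r + gamma * V[s'])
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            return V


def greedy_policy(P: MDP, V: Dict[int, float], gamma: float = 0.9) -> Dict[int, int]:
    """In each state, pick the action that maximizes the one-step lookahead."""
    return {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in actions[a]))
        for s, actions in P.items()
    }


# Example: in state 0, action 0 leads to state 1 (which pays 1.0 one step later),
# action 1 pays 0.5 immediately; state 2 is absorbing with zero reward.
P: MDP = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 2, 0.5)]},
    1: {0: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)]},
}
V = value_iteration(P)
print(V, greedy_policy(P, V))
```

With γ = 0.9, waiting for the delayed reward is worth 0.9, which beats the immediate 0.5, so the greedy policy chooses action 0 in state 0.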
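The SARSA, Q-learning, and Off-policy entries differ mainly in the temporal-difference target. This is a minimal tabular sketch with a `defaultdict` Q-table keyed by `(state, action)`; terminal-state handling is omitted, and the state and action names and step sizes are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def sarsa_update(Q: QTable, s, a, r: float, s_next, a_next,
                 alpha: float = 0.1, gamma: float = 0.99) -> None:
    """On-policy: bootstrap from the action a_next that the agent actually takes next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q: QTable, s, a, r: float, s_next, actions: Iterable,
                      alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Off-policy: bootstrap from the greedy action in s_next, whatever the agent does next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Same transition (s0, go, reward 0, s1), two different targets.
Q_sarsa: QTable = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 2.0})
Q_qlearn: QTable = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 2.0})
sarsa_update(Q_sarsa, "s0", "go", 0.0, "s1", "left", alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, "s0", "go", 0.0, "s1", ["left", "right"], alpha=0.5, gamma=1.0)
print(Q_sarsa[("s0", "go")], Q_qlearn[("s0", "go")])  # 0.5 1.0
```

If the behavior policy happens to pick the worse action "left", SARSA learns from it, while Q-learning still bootstraps from the better action "right".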
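For the Exploration-exploitation entry, ε-greedy action selection is one common, simple strategy; this sketch does not imply it is the one used in the course. The Q-values and ε below are arbitrary.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon (uniform random action), otherwise exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


rng = random.Random(0)
actions = [epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng) for _ in range(1000)]
print(actions.count(1) / len(actions))  # mostly the greedy action 1
```

With ε = 0.1 and three actions, the greedy action is chosen roughly 0.9 + 0.1/3 of the time; in practice ε is often decayed over training.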
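For the policy-gradient and softmax-policy topics, here is a minimal REINFORCE sketch on a hypothetical two-armed bandit (a single state, no baseline). The action preferences `theta`, the learning rate, and the payoffs are assumptions made for illustration.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta: List[float], action: int, ret: float, lr: float = 0.1) -> List[float]:
    """One REINFORCE update for the softmax policy pi(b) = softmax(theta)[b].
    For this policy, d log pi(action) / d theta[b] = 1[b == action] - pi(b)."""
    pi = softmax(theta)
    return [t + lr * ret * ((1.0 if b == action else 0.0) - pi[b])
            for b, t in enumerate(theta)]


def train_bandit(steps: int = 500, seed: int = 0) -> List[float]:
    """Hypothetical two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.0."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        pi = softmax(theta)
        action = 0 if rng.random() < pi[0] else 1  # sample an action from the policy
        reward = 1.0 if action == 1 else 0.0
        theta = reinforce_step(theta, action, reward)
    return theta


print(softmax(train_bandit()))  # probability mass shifts toward arm 1
```

Without a baseline the gradient estimate has high variance; actor-critic methods reduce it by replacing the return with a learned critic's estimate.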
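The two ingredients named in the DQN entry, experience replay and fixed Q-targets, can be sketched without a neural-network library. In this sketch the target network is any callable that returns one Q-value per action; `frozen_q` is a hypothetical stand-in, and the periodic copy of online weights into it is left out.

```python
import random
from collections import deque, namedtuple
from typing import Callable, Hashable, List, Sequence

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Experience replay: a bounded FIFO store of transitions, sampled uniformly."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer = deque(maxlen=capacity)  # the oldest transition is dropped when full
        self.rng = random.Random(seed)

    def push(self, state, action, reward: float, next_state, done: bool) -> None:
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # uniform sampling breaks the correlation between consecutive transitions
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def td_targets(batch: Sequence[Transition],
               target_q: Callable[[Hashable], Sequence[float]],
               gamma: float = 0.99) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrapping at terminal states.
    target_q is the frozen copy of the Q-network, refreshed only every C steps."""
    return [t.reward + (0.0 if t.done else gamma * max(target_q(t.next_state)))
            for t in batch]


def frozen_q(state: Hashable) -> List[float]:
    return [1.0, 2.0]  # hypothetical stand-in for the target network


buffer = ReplayBuffer(capacity=3)
for i in range(4):
    buffer.push(i, 0, 0.5, i + 1, done=(i == 3))
print(len(buffer), td_targets(buffer.sample(2), frozen_q, gamma=0.9))
```

Keeping the target network fixed between syncs means the regression targets do not shift with every gradient step on the online network.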
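The difference between the VDN and QMIX entries lies in how per-agent action-values are combined into a joint value. This is a simplified sketch: in QMIX the mixing weights are generated from the global state by hypernetworks, whereas here they are passed in directly, and the layer sizes and weight values are hypothetical.

```python
import math
from typing import List, Sequence


def vdn_mix(agent_qs: Sequence[float]) -> float:
    """VDN: the joint action-value is the sum of the per-agent action-values."""
    return sum(agent_qs)


def qmix_mix(agent_qs: Sequence[float], w1: List[List[float]], b1: List[float],
             w2: List[float], b2: float) -> float:
    """QMIX-style monotonic mixing with one hidden layer.

    abs() keeps every mixing weight non-negative and ELU is increasing, so Q_tot is
    non-decreasing in each agent's Q-value (per-agent greedy actions stay jointly greedy)."""
    hidden = []
    for j, bias in enumerate(b1):
        z = sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + bias
        hidden.append(z if z > 0 else math.expm1(z))  # ELU activation
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2


qs = [1.0, 1.0]
w1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]  # hypothetical: 2 agents x 2 hidden units
w2, b2 = [1.0, -1.0], 0.0
print(vdn_mix(qs), qmix_mix(qs, w1, b1, w2, b2))  # 2.0 4.5
```

VDN is the special case of a fixed, equal-weight linear mixer; QMIX keeps the same monotonicity guarantee while allowing a richer, state-dependent combination.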


  • Wikipedia
  • David Silver’s Reinforcement-learning course

Who this course is for:

  • Anyone interested in deep reinforcement-learning

