Deep Gravity
384 subscribers
60 photos
35 videos
17 files
495 links
DeepMind Deep Reinforcement Learning course 2018

05 - Function Approximation and Deep Reinforcement Learning

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

06 - Policy Gradients and Actor Critics

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

07 - Planning and Models

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

08 - Advanced Topics in Deep RL

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

09 - A Brief Tour of Deep RL Agents

YouTube

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

10 - Classic Games Case Study

YouTube

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part1.rar
1000 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part1

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part2.rar
1000 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part2

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part3.rar
245.4 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part3

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using #DeepReinforcementLearning


Customer service is critical to all companies, as it directly affects brand reputation. Because they serve large numbers of customers, e-commerce companies often employ multiple communication channels to answer customers' questions, for example, chatbots and hotlines. On one hand, each channel has limited capacity to respond to customers' requests; on the other hand, customers have different preferences over these channels. Current production systems are mainly built on business rules, which only crudely consider the tradeoff between resources and customer satisfaction. To achieve the optimal tradeoff between resources and customer satisfaction, we propose a new framework based on deep reinforcement learning, which directly takes both resources and a user model into account. In addition to the framework, we also propose a new deep-reinforcement-learning-based routing method: double dueling deep Q-learning with prioritized experience replay (PER-DoDDQN). We evaluate our proposed framework and method using both synthetic data and real customer-service logs from a large financial technology company. We show that our proposed deep-reinforcement-learning-based framework is superior to the existing production system. Moreover, we show that our proposed PER-DoDDQN is better in practice than all other deep Q-learning variants, providing a better routing plan. These observations suggest that our proposed method can seek the trade-off where both channel resources and customers'
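As a rough illustration of the three ingredients named by PER-DoDDQN, here is a minimal sketch of the dueling aggregation, the double-Q target, and prioritized-replay sampling probabilities. Function names, shapes, and hyperparameters (alpha, eps) are our own assumptions for illustration, not the paper's code.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks the next action,
    the target net evaluates it, which reduces overestimation."""
    best_a = np.argmax(q_online_next, axis=-1)
    bootstrap = q_target_next[np.arange(len(best_a)), best_a]
    return reward + gamma * (1.0 - done) * bootstrap

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized replay: P(i) proportional to (|delta_i| + eps)^alpha,
    so transitions with larger TD error are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()
```

In a full agent these pieces plug into the usual DQN loop: the dueling head shapes the network output, the double-Q target supplies the regression label, and the PER probabilities drive minibatch sampling (with importance-sampling weights to correct the induced bias).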

Link

πŸ”­ @DeepGravity
Merging Deterministic #PolicyGradient Estimations with Varied #Bias-#Variance Tradeoff for Effective #DeepReinforcementLearning

Deep reinforcement learning (#DRL) on #MarkovDecisionProcess (#MDPs) problems with continuous action spaces is often approached by directly updating parametric policies along the direction of estimated policy gradients (PGs). Previous research revealed that the performance of these PG algorithms depends heavily on the bias-variance tradeoff involved in estimating and using PGs. A notable approach to balancing this tradeoff is to merge on-policy and off-policy gradient estimations when training stochastic policies. However, this method cannot be used directly by sample-efficient off-policy PG algorithms such as #DeepDeterministicPolicyGradient (#DDPG) and #twindelayedDDPG (#TD3), which were designed to train deterministic policies. It is hence important to develop new techniques for merging multiple off-policy estimations of the deterministic PG (DPG). Driven by this question, this paper introduces elite #DPG, which is estimated differently from conventional DPG so as to emphasize variance reduction at the expense of increased learning bias. To mitigate the extra bias, policy consolidation techniques distill policy behavioral knowledge from elite trajectories and use the distilled generative model to further regularize policy training. Moreover, we study, both theoretically and experimentally, two DPG merging methods, interpolation merging and two-step merging, which induce varied bias-variance tradeoffs through the combined use of conventional DPG and elite DPG. Experiments on six benchmark control tasks confirm that both merging methods noticeably improve the learning performance of TD3, significantly outperforming several state-of-the-art #DRL algorithms.
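The two merging schemes described above can be sketched in a few lines. This is a hedged illustration only: `g_conv` stands for the conventional DPG estimate (lower bias, higher variance), `g_elite` for the elite-trajectory estimate (lower variance, extra bias), and `beta` and `lr` are illustrative knobs, not values from the paper.

```python
import numpy as np

def interpolation_merge(g_conv, g_elite, beta=0.5):
    """Interpolation merging: a convex combination of the two
    gradient estimates; beta trades bias against variance."""
    return beta * g_conv + (1.0 - beta) * g_elite

def two_step_merge(params, g_conv, g_elite, lr=1e-3):
    """Two-step merging: apply the two estimates in successive
    gradient-ascent steps instead of blending them."""
    params = params + lr * g_conv   # step 1: conventional DPG
    params = params + lr * g_elite  # step 2: elite DPG
    return params
```

With `beta = 1` interpolation merging reduces to plain DPG; with `beta = 0` it uses only the elite estimate, making the bias-variance trade-off explicit in a single scalar.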

Link

πŸ”­ @DeepGravity
Analysing #DeepReinforcementLearning Agents Trained with Domain Randomisation

Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation and then transfer them to the real world. One of the most popular methods for achieving this is domain randomisation, which randomly perturbs various aspects of a simulated environment to make trained agents robust to the reality gap between the simulator and the real world. However, less work has gone into understanding such agents, which are deployed in the real world, beyond task performance. In this work we examine such agents through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation, in order to better understand how they function. Concretely, we train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different unit tests. We tie this to interpretability techniques, providing both quantitative and qualitative data. Finally, we investigate the internals of the trained agents by examining their weights and activations. Our results show that the primary outcome of domain randomisation is more redundant, entangled representations, accompanied by significant statistical and structural changes in the weights; moreover, the types of changes are heavily influenced by the task setup and the presence of additional proprioceptive inputs. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating a wide suite of inspection tools to provide sufficient insight into the behaviour of trained agents.
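The core mechanic of visual domain randomisation is simply resampling nuisance visual parameters at every episode reset, so the policy cannot overfit to any single appearance. A minimal sketch follows; the parameter names and ranges are hypothetical, not taken from the paper's setup.

```python
import random

def randomise_visuals(rng):
    """Sample a fresh set of nuisance visual parameters.
    Names and ranges are illustrative assumptions."""
    return {
        "light_intensity": rng.uniform(0.5, 1.5),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
        "texture_id": rng.randrange(100),
        "object_rgb": tuple(rng.random() for _ in range(3)),
    }

def reset_env(rng):
    """Episode reset: draw a new visual configuration.
    A real simulator would apply it before rendering the first frame."""
    return randomise_visuals(rng)
```

Because each episode looks different, features that depend on lighting, texture, or camera pose carry no stable reward signal, which is what pushes the agent toward the more redundant, entangled representations the abstract reports.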

Paper

πŸ”­ @DeepGravity