Deep Gravity
384 subscribers
60 photos
35 videos
17 files
495 links
DeepMind Deep Reinforcement Learning course 2018

05 - Function Approximation and Deep Reinforcement Learning

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

06 - Policy Gradients and Actor Critics

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

07 - Planning and Models

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

08 - Advanced Topics in Deep RL

YouTube

Slides

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

09 - A Brief Tour of Deep RL Agents

YouTube

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DeepMind Deep Reinforcement Learning course 2018

10 - Classic Games Case Study

YouTube

⚠️ Download all lectures and slides in zipfiles here: (part1), (part2), (part3)

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part1.rar
1000 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part1

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part2.rar
1000 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part2

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
DRL.part3.rar
245.4 MB
DeepMind Deep Reinforcement Learning course 2018 (all lectures and slides) - part3

#DeepReinforcementLearning
#DeepMind

πŸ”­ @DeepGravity
Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using #DeepReinforcementLearning


Customer service is critical to all companies, as it directly affects brand reputation. Because they serve large numbers of customers, e-commerce companies often employ multiple communication channels to answer customers' questions, for example, chatbots and hotlines. On one hand, each channel has limited capacity to respond to customers' requests; on the other hand, customers have different preferences over these channels. Current production systems are mainly built on business rules, which only crudely consider the tradeoff between resources and customer satisfaction. To achieve the optimal tradeoff between resources and customer satisfaction, we propose a new framework based on deep reinforcement learning, which directly takes both resources and a user model into account. In addition to the framework, we also propose a new deep-reinforcement-learning-based routing method: double dueling deep Q-learning with prioritized experience replay (PER-DoDDQN). We evaluate our proposed framework and method using both synthetic data and real customer-service logs from a large financial technology company. We show that our proposed deep-reinforcement-learning-based framework is superior to the existing production system. Moreover, we show that our proposed PER-DoDDQN is better in practice than all other deep Q-learning variants, providing a better routing plan. These observations suggest that our proposed method can seek the trade-off where both channel resources and customers'
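As a rough illustration of the three ingredients named by PER-DoDDQN, here is a minimal sketch of the dueling aggregation, the double-Q target, and prioritized-replay sampling probabilities. Function names, shapes, and hyperparameters (alpha, eps) are our own assumptions for illustration, not the paper's code.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks the next action,
    the target net evaluates it, which reduces overestimation."""
    best_a = np.argmax(q_online_next, axis=-1)
    bootstrap = q_target_next[np.arange(len(best_a)), best_a]
    return reward + gamma * (1.0 - done) * bootstrap

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized replay: P(i) proportional to (|delta_i| + eps)^alpha,
    so transitions with larger TD error are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()
```

In a full agent these pieces plug into the usual DQN loop: the dueling head shapes the network output, the double-Q target supplies the regression label, and the PER probabilities drive minibatch sampling (with importance-sampling weights to correct the induced bias).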

Link

πŸ”­ @DeepGravity
Merging Deterministic #PolicyGradient Estimations with Varied #Bias-#Variance Tradeoff for Effective #DeepReinforcementLearning

Deep reinforcement learning (#DRL) on #MarkovDecisionProcess (#MDPs) problems with continuous action spaces is often approached by directly updating parametric policies along the direction of estimated policy gradients (PGs). Previous research revealed that the performance of these PG algorithms depends heavily on the bias-variance tradeoff involved in estimating and using PGs. A notable approach to balancing this tradeoff is to merge on-policy and off-policy gradient estimations when training stochastic policies. However, this method cannot be used directly by sample-efficient off-policy PG algorithms such as #DeepDeterministicPolicyGradient (#DDPG) and #twindelayedDDPG (#TD3), which were designed to train deterministic policies. It is hence important to develop new techniques for merging multiple off-policy estimations of the deterministic PG (DPG). Driven by this question, this paper introduces elite #DPG, which is estimated differently from conventional DPG so as to emphasize variance reduction at the expense of increased learning bias. To mitigate the extra bias, policy consolidation techniques distill policy behavioral knowledge from elite trajectories and use the distilled generative model to further regularize policy training. Moreover, we study, both theoretically and experimentally, two DPG merging methods, interpolation merging and two-step merging, which induce varied bias-variance tradeoffs through the combined use of conventional DPG and elite DPG. Experiments on six benchmark control tasks confirm that both merging methods noticeably improve the learning performance of TD3, significantly outperforming several state-of-the-art #DRL algorithms.
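The two merging schemes described above can be sketched in a few lines. This is a hedged illustration only: `g_conv` stands for the conventional DPG estimate (lower bias, higher variance), `g_elite` for the elite-trajectory estimate (lower variance, extra bias), and `beta` and `lr` are illustrative knobs, not values from the paper.

```python
import numpy as np

def interpolation_merge(g_conv, g_elite, beta=0.5):
    """Interpolation merging: a convex combination of the two
    gradient estimates; beta trades bias against variance."""
    return beta * g_conv + (1.0 - beta) * g_elite

def two_step_merge(params, g_conv, g_elite, lr=1e-3):
    """Two-step merging: apply the two estimates in successive
    gradient-ascent steps instead of blending them."""
    params = params + lr * g_conv   # step 1: conventional DPG
    params = params + lr * g_elite  # step 2: elite DPG
    return params
```

With `beta = 1` interpolation merging reduces to plain DPG; with `beta = 0` it uses only the elite estimate, making the bias-variance trade-off explicit in a single scalar.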

Link

πŸ”­ @DeepGravity
Analysing #DeepReinforcementLearning Agents Trained with Domain Randomisation

Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation and then transfer them to the real world. One of the most popular methods for achieving this is domain randomisation, which randomly perturbs various aspects of a simulated environment to make trained agents robust to the reality gap between the simulator and the real world. However, less work has gone into understanding such agents, which are deployed in the real world, beyond task performance. In this work we examine such agents through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation, in order to better understand how they function. Concretely, we train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different unit tests. We tie this to interpretability techniques, providing both quantitative and qualitative data. Finally, we investigate the internals of the trained agents by examining their weights and activations. Our results show that the primary outcome of domain randomisation is more redundant, entangled representations, accompanied by significant statistical and structural changes in the weights; moreover, the types of changes are heavily influenced by the task setup and the presence of additional proprioceptive inputs. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating a wide suite of inspection tools to provide sufficient insight into the behaviour of trained agents.
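The core mechanic of visual domain randomisation is simply resampling nuisance visual parameters at every episode reset, so the policy cannot overfit to any single appearance. A minimal sketch follows; the parameter names and ranges are hypothetical, not taken from the paper's setup.

```python
import random

def randomise_visuals(rng):
    """Sample a fresh set of nuisance visual parameters.
    Names and ranges are illustrative assumptions."""
    return {
        "light_intensity": rng.uniform(0.5, 1.5),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
        "texture_id": rng.randrange(100),
        "object_rgb": tuple(rng.random() for _ in range(3)),
    }

def reset_env(rng):
    """Episode reset: draw a new visual configuration.
    A real simulator would apply it before rendering the first frame."""
    return randomise_visuals(rng)
```

Because each episode looks different, features that depend on lighting, texture, or camera pose carry no stable reward signal, which is what pushes the agent toward the more redundant, entangled representations the abstract reports.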

Paper

πŸ”­ @DeepGravity