Description
Here is the code from reinforce.py:

```python
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
```
And here is the code from actor_critic.py:

```python
for (action, value), r in zip(saved_actions, rewards):
    reward = r - value.data[0, 0]
    action.reinforce(reward)
    value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
```
Since the action is reinforced with r - V(s), an advantage estimate, rather than the raw return, I consider this to be Asynchronous Advantage Actor-Critic (A3C), not plain Actor-Critic.
Activity
jeasinema commented on Oct 31, 2017
Yes, I partly agree with you, but with a small correction: the algorithm implemented should be an offline version of A2C (Advantage Actor-Critic).
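
For reference, here is a minimal sketch of the advantage-weighted update being discussed, written with explicit losses instead of the old `action.reinforce()` / `Variable` API (which has since been removed from PyTorch). The names `saved_log_probs`, `saved_values`, and `returns` are assumptions for illustration, not the repository's code:

```python
# Sketch of an advantage actor-critic (A2C-style) update, not the repo's code.
# Instead of action.reinforce(r - V(s)), the policy loss is built explicitly
# from saved log-probabilities, then backpropagated once per episode.
import torch
import torch.nn.functional as F

def a2c_losses(saved_log_probs, saved_values, returns):
    """saved_log_probs: list of log pi(a_t | s_t) tensors (scalars)
       saved_values:    list of V(s_t) tensors (scalars)
       returns:         list of discounted returns R_t (Python floats)"""
    policy_loss, value_loss = 0.0, 0.0
    for log_prob, value, r in zip(saved_log_probs, saved_values, returns):
        advantage = r - value.item()                      # same role as r - value.data[0, 0]
        policy_loss = policy_loss - log_prob * advantage  # REINFORCE with a value baseline
        value_loss = value_loss + F.smooth_l1_loss(value, torch.tensor(r))
    return policy_loss, value_loss

# Typical usage after an episode (optimizer setup omitted):
#   loss = policy_loss + value_loss
#   loss.backward()
#   optimizer.step()
```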