
A3C instead of actor-critic in reinforcement_learning/reinforce.py  #151

Open
@susht3

Description


Here is the code from reinforce.py:

```python
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
```

And here is the code from actor-critic.py:

```python
for (action, value), r in zip(saved_actions, rewards):
    reward = r - value.data[0, 0]
    action.reinforce(reward)
    value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
```

Since it subtracts the value estimate from the reward, I consider this to be Asynchronous Advantage Actor-Critic (A3C), not plain actor-critic.

Activity

changed the title from "RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time." to "experience replay of reinforcement_learning/reinforce.py" on Apr 25, 2017
changed the title from "experience replay of reinforcement_learning/reinforce.py" to "A3C instead of actor-critic in reinforcement_learning/reinforce.py" on Apr 26, 2017

jeasinema commented on Oct 31, 2017


Yes, I partly agree with you, but with a small correction: since there are no asynchronous parallel workers here, the algorithm implemented should be an offline version of A2C (Advantage Actor-Critic).
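For clarity, the update both snippets are circling around can be sketched as below. This is a hypothetical minimal sketch in plain Python (no autograd), just to show the math: the policy term is weighted by the advantage (return minus the critic's value estimate, the same role as `r - value.data[0,0]` above), and the critic is regressed toward the return. Function names like `discounted_returns` and `a2c_losses` are illustrative, not from the repo, and squared error stands in for `smooth_l1_loss` for simplicity.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns R_t = r_t + gamma * R_{t+1}."""
    returns, R = [], 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Advantage actor-critic losses for one episode.

    log_probs: log pi(a_t | s_t) for the actions taken
    values:    critic estimates V(s_t)
    rewards:   per-step rewards r_t
    """
    returns = discounted_returns(rewards, gamma)
    policy_loss, value_loss = 0.0, 0.0
    for log_p, v, R in zip(log_probs, values, returns):
        advantage = R - v              # plays the role of `r - value.data[0,0]`
        policy_loss += -log_p * advantage  # REINFORCE term, advantage-weighted
        value_loss += (v - R) ** 2         # critic regression (squared error here)
    return policy_loss, value_loss
```

With a shared critic baseline but no asynchronous workers, this matches A2C rather than A3C, which is exactly the distinction raised in the comment above.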



Participants

@subramen, @jeasinema, @susht3


          A3C instead of actor-critic in reinforcement_learning/reinforce.py · Issue #151 · pytorch/examples