Actor critic example not using discount rate properly #744

Open
@rodrigodesalvobraz

Description

The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in #573) does not use the discount rate properly.

The loss should include \gamma^t, as shown in the box on page 330 of Sutton & Barto:

image
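
For readers who cannot see the image, here is a sketch of the updates at issue, written out as I understand them from the book's episodic REINFORCE-with-baseline box. It is not a verbatim copy of the box, and the symbols (\delta for the baseline-corrected return, \alpha^{\theta} for the policy step size, \hat{v} for the baseline) are assumed to follow the book's usual notation:

```latex
G \leftarrow \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k,
\qquad
\delta \leftarrow G - \hat{v}(S_t, \mathbf{w}),
\qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta}
  + \alpha^{\boldsymbol{\theta}} \, \gamma^{t} \, \delta \,
    \nabla \ln \pi(A_t \mid S_t, \boldsymbol{\theta})
```

The factor \gamma^{t} in the last update is separate from the discounting inside G.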

Activity

dknathalage commented on Jul 30, 2020

Actually, the code does implement γᵗ. See [105] in actor_critic.py:

for r in model.rewards[::-1]:

The loop recursively multiplies gamma by the discounted return of the following timestep and inserts the result at the beginning of the list.
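
A minimal, self-contained sketch of the recursion described above may help. The function name `discounted_returns` and the loop body are assumptions written to match the description (they are not copied from actor_critic.py), and plain Python lists stand in for `model.rewards`:

```python
def discounted_returns(rewards, gamma):
    """Compute the discounted return G_t for every timestep of one episode.

    Hypothetical helper: mirrors the backward loop described in the comment,
    not the exact code of the example.
    """
    R = 0.0
    returns = []
    for r in rewards[::-1]:
        # G_t = r_t + gamma * G_{t+1}, walking backwards from the last step
        R = r + gamma * R
        # Prepend so that returns[t] lines up with rewards[t]
        returns.insert(0, R)
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Note that each entry discounts rewards relative to its own timestep t, so the first reward counted in returns[t] always carries a factor of gamma ** 0.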

rodrigodesalvobraz commented on Jul 30, 2020

Author

Thanks for the reply. However, the section of code you indicate seems to correspond to the calculation of G in the book's pseudo-code (see the more complete pseudo-code box below). This portion of the pseudo-code (and the code you indicate) applies the discount from timestep t to the end of the episode.

In addition, the book applies the discount rate from the beginning of the episode up to t, in the last line of the pseudo-code. It seems to me that it is this application of the discount rate that is missing from the code.

image
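
To make the distinction concrete, here is a rough sketch of the two discounting steps side by side. The helper names `returns_and_weights` and `policy_loss_terms` are hypothetical, plain floats stand in for tensors, and the baseline values are made up for illustration. This is one possible way to add the extra factor, not the example's actual code:

```python
def returns_and_weights(rewards, gamma):
    """Return (G_t for each t, gamma**t for each t) for one episode."""
    R = 0.0
    returns = []
    for r in rewards[::-1]:
        # Discounting from timestep t to the end of the episode (the G step)
        R = r + gamma * R
        returns.insert(0, R)
    # Discounting from the start of the episode up to t (the extra factor)
    weights = [gamma ** t for t in range(len(rewards))]
    return returns, weights


def policy_loss_terms(log_probs, returns, baselines, weights):
    """Per-step policy loss: -gamma^t * (G_t - baseline_t) * log pi(a_t | s_t)."""
    return [
        -w * (G - b) * lp
        for lp, G, b, w in zip(log_probs, returns, baselines, weights)
    ]


returns, weights = returns_and_weights([1.0, 1.0, 1.0], gamma=0.5)
print(weights)  # [1.0, 0.5, 0.25]
print(policy_loss_terms([-1.0, -1.0, -1.0], returns, [0.0, 0.0, 0.0], weights))
# [1.75, 0.75, 0.25]
```

Without the `weights` factor, the three loss terms in this toy episode would be 1.75, 1.5 and 1.0, so later timesteps would contribute more to the gradient than the pseudo-code prescribes.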

msaroufim commented on Mar 9, 2022

Member

@rodrigodesalvobraz I'd suggest trying out your improved version to see whether it converges faster or to a better result, and making a PR if it does.

Actor critic example not using discount rate properly · Issue #744 · pytorch/examples