Open
Description
The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in #573) does not use the discount rate properly.
The loss should include a \gamma^t factor, as shown in the pseudo-code box on page 330 of Sutton & Barto:
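For concreteness, the policy-parameter update in that box carries the γᵗ factor. Reconstructed from memory of the second edition (so not a verbatim quote), it looks roughly like:

```latex
\theta \leftarrow \theta + \alpha^{\theta} \, \gamma^{t} \, \delta \, \nabla_{\theta} \ln \pi(A_t \mid S_t, \theta)
```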
Activity
dknathalage commented on Jul 30, 2020
Actually, the code implementation does include γᵗ; see line 105 of actor_critic.py:
examples/reinforcement_learning/actor_critic.py, line 105 (commit 8df8e74)
The loop iterates over the rewards in reverse, repeatedly multiplying gamma by the discounted return of the following timestep and inserting the result at the beginning of the list.
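For context, a minimal sketch of the kind of reverse-discounting loop being described (the exact snippet at line 105 is not reproduced here, and the names below are illustrative rather than taken from the script):

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} by iterating over the rewards in reverse."""
    returns: List[float] = []
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R       # discount the return of the following timestep
        returns.insert(0, R)    # prepend so returns[t] corresponds to timestep t
    return returns
```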
rodrigodesalvobraz commented on Jul 30, 2020
Thanks for the reply. However, the section of code you indicate seems to correspond to the calculation of G in the book's pseudo-code (see the more complete pseudo-code box below). This portion of the pseudo-code (and the code you indicate) applies the discount from timestep t until the end of the episode.
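Roughly, the per-timestep computations in that box are the following (reconstructed from memory, so not a verbatim reproduction of the book's figure):

```latex
G \leftarrow \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k
\delta \leftarrow G - \hat{v}(S_t, \mathbf{w})
\mathbf{w} \leftarrow \mathbf{w} + \alpha^{\mathbf{w}} \, \delta \, \nabla \hat{v}(S_t, \mathbf{w})
\theta \leftarrow \theta + \alpha^{\theta} \, \gamma^{t} \, \delta \, \nabla_{\theta} \ln \pi(A_t \mid S_t, \theta)
```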
However, the book additionally applies the discount rate from the beginning of the episode up to t, via the γᵗ factor in the last line of the pseudo-code. It seems to me that it is this application of the discount rate that is missing in the code.
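A minimal sketch of how that factor could be folded into the loss, assuming per-timestep lists of log-probabilities, value estimates, and returns (none of these names come from the example script itself):

```python
import torch

def policy_loss_with_time_discount(log_probs, values, returns, gamma):
    """Hypothetical sketch: scale each timestep's policy-gradient term by gamma**t,
    matching the last line of the REINFORCE-with-baseline pseudo-code.
    log_probs and values are 0-dim tensors; returns are scalars."""
    losses = []
    for t, (log_prob, value, R) in enumerate(zip(log_probs, values, returns)):
        advantage = R - value.item()                  # delta = G - v(S_t, w)
        losses.append(-(gamma ** t) * log_prob * advantage)
    return torch.stack(losses).sum()
```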
msaroufim commented on Mar 9, 2022
@rodrigodesalvobraz I'd suggest you try out your improved version and see whether it converges faster or to a better result, and make a PR if it does.