Open
Description
I think there's a small bug in many of your scripts: on the last step of an episode, you still update the returns using the value of a post-terminal step. As a result, the value (or policy) function estimates keep growing (possibly without bound?) near the terminal state. For example, rl2/mountaincar has a "train" boolean, but it is never set to false for the last step.
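To illustrate what I mean, here is a minimal sketch of a one-step TD target that skips the bootstrap term on the terminal step. It is not taken from the repo: `td_target`, `done`, `next_value`, and `gamma` are hypothetical names, and I'm assuming the environment reports a `done` flag on the final transition.

```python
def td_target(reward, next_value, done, gamma=0.99):
    """One-step TD target for a single transition.

    When `done` is True the next state is terminal, so its value is
    treated as zero and the target is just the observed reward.
    """
    if done:
        # No bootstrapping past the end of the episode.
        return reward
    # Ordinary bootstrapped target for a non-terminal transition.
    return reward + gamma * next_value


# Hypothetical usage with made-up numbers:
last_step = td_target(reward=-1.0, next_value=5.0, done=True)
mid_step = td_target(reward=-1.0, next_value=5.0, done=False, gamma=0.5)
```

Without the `done` check, the last transition would pull in whatever the model predicts for the post-terminal state, which I suspect is what inflates the estimates near the goal.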