Update after terminal state

I think there's a small bug in many of your scripts: on the last step of an episode, the returns are still updated using a post-terminal step.  As a result, your value (policy) functions wind up growing (possibly without bound?) near the terminal state.  For example, in rl2/mountaincar you have a "train" boolean, but it is never set to false for the last step.
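
To illustrate what I mean, here is a minimal sketch of a one-step target that does not bootstrap past the end of an episode. This is not the code from the repo: `td_target`, `done`, and `gamma` are names I made up for the example, and I'm assuming a plain one-step TD-style update.

```python
def td_target(reward, next_value, done, gamma=0.99):
    """Return a one-step TD target (hypothetical helper, not from the repo).

    When `done` is True the transition ended the episode, so there is no
    next state to bootstrap from and the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * next_value


# Non-terminal step: bootstrap from the estimated value of the next state.
print(td_target(reward=-1.0, next_value=5.0, done=False, gamma=0.5))

# Terminal step: the next-state estimate is ignored entirely.
print(td_target(reward=-1.0, next_value=5.0, done=True, gamma=0.5))
```

Without the `done` check, the terminal step would also add `gamma * next_value`, which I suspect is what lets the estimates near the goal keep climbing.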