Commit History

lunar lander model #5, using PPO trained with learning rate 0.0005, gamma 0.995, for 1M timesteps
57e96c5

bguan committed on

lunar lander model #5, using PPO trained with learning rate 0.0005, gamma 0.995, for 1M timesteps
1e0b940

bguan committed on