Commit History

lunar lander model #4, using PPO trained with learning rate 0.0005 for 500K timesteps
0e6fc9b

bguan committed on