kalmufti/PPO-LunarLander-v2

Reinforcement Learning

stable-baselines3

deep-reinforcement-learning


kalmufti committed on Jun 1, 2022

Commit 1cad660 • 1 Parent(s): 7f7658d

Update README.md

Add install, and usage instructions.

Files changed (1)

README.md +45 -4


@@ -20,9 +20,50 @@ model-index:
       type: LunarLander-v2
 ---
-  # **PPO** Agent playing **LunarLander-v2**
   This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
-  ## Usage (with Stable-baselines3)
-  TODO: Add your code

       type: LunarLander-v2
 ---
+  # **PPO** Agent Playing **LunarLander-v2**
   This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+  ## Usage (with Stable-baselines3 and huggingface_sb3)
+  To use this model, make sure you are running Python 3.7.13. You can use [pyenv](https://github.com/pyenv/pyenv) to manage multiple versions of Python on your system.
+  ### Install required packages:
+  ```bash
+pip install stable-baselines3
+pip install huggingface_sb3
+pip install pickle5
+pip install Box2D
+pip install pyglet
+```
+You can use this simple script as a base to evaluate and run the model:
+```python
+import gym
+from stable_baselines3 import PPO
+from huggingface_sb3 import load_from_hub
+from stable_baselines3.common.evaluation import evaluate_policy
+# Download the model from the huggingface hub
+checkpoint = load_from_hub(
+    repo_id="kalmufti/PPO-LunarLander-v2",
+    filename="ppo-LunarLander-v2.zip",
+)
+# Load the policy
+model = PPO.load(checkpoint)
+# Create an environment
+env = gym.make("LunarLander-v2")
+# Optional: evaluate the agent's mean reward over 5 episodes
+mean_reward, std_reward = evaluate_policy(
+    model, env, render=False, n_eval_episodes=5, deterministic=True, warn=False
+)
+print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
+# Watch the agent playing the environment
+obs = env.reset()
+for i in range(1000):
+    action, _state = model.predict(obs)
+    obs, reward, done, info = env.step(action)
+    env.render()
+    if done:
+        obs = env.reset()
+env.close()
+```
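
The evaluation step in the script above reports a mean and a standard deviation of the total reward per episode. The following is a minimal, dependency-free sketch of that idea, not the actual `evaluate_policy` implementation. `StubLander`, `evaluate`, and the constant policy are hypothetical names introduced only for illustration; the stub mimics the classic Gym API used in the README (`reset()` returns an observation, `step()` returns a 4-tuple), and the use of the population standard deviation is an assumption.

```python
import statistics


class StubLander:
    """Hypothetical stand-in for a Gym env with the classic 4-tuple step API."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        # LunarLander-v2 observations have 8 components; zeros are placeholders.
        return [0.0] * 8

    def step(self, action):
        self.t += 1
        # Toy reward rule (not the real LunarLander reward): +1 for action 0, else -1.
        reward = 1.0 if action == 0 else -1.0
        done = self.t >= self.episode_length
        return [0.0] * 8, reward, done, {}


def evaluate(policy, env, n_eval_episodes=5):
    """Run full episodes and return (mean, std) of the per-episode total reward."""
    returns = []
    for _ in range(n_eval_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(obs)
            obs, reward, done, _info = env.step(action)
            total += reward
        returns.append(total)
    # Population standard deviation is assumed here.
    return statistics.mean(returns), statistics.pstdev(returns)


if __name__ == "__main__":
    mean_reward, std_reward = evaluate(lambda obs: 0, StubLander())
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

With a real model, the `policy` callable would play the role of `model.predict(obs)`, and the stub would be replaced by the environment created with `gym.make`.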