Pareto Conditioned Networks

class morl_baselines.multi_policy.pcn.pcn.PCN(env: Env | None, scaling_factor: ndarray, learning_rate: float = 0.001, gamma: float = 1.0, batch_size: int = 256, hidden_dim: int = 64, noise: float = 0.1, project_name: str = 'MORL-Baselines', experiment_name: str = 'PCN', wandb_entity: str | None = None, log: bool = True, seed: int | None = None, device: device | str = 'auto', model_class: Type[BasePCNModel] | None = None)

Pareto Conditioned Networks (PCN).

Reymond, M., Bargiacchi, E., & Nowé, A. (2022, May). Pareto Conditioned Networks. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (pp. 1110-1118). https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p1110.pdf

## Credits

This code is a refactor of the code from the authors of the paper, available at: https://github.com/mathieu-reymond/pareto-conditioned-networks

Initialize PCN agent.

Parameters:
  • env (Optional[gym.Env]) – Gym environment.

  • scaling_factor (np.ndarray) – Scaling factor for the desired return and horizon used in the model.

  • learning_rate (float, optional) – Learning rate. Defaults to 1e-3.

  • gamma (float, optional) – Discount factor. Defaults to 1.0.

  • batch_size (int, optional) – Batch size. Defaults to 256.

  • hidden_dim (int, optional) – Hidden dimension. Defaults to 64.

  • noise (float, optional) – Standard deviation of the noise to add to the action in the continuous action case. Defaults to 0.1.

  • project_name (str, optional) – Name of the project for wandb. Defaults to “MORL-Baselines”.

  • experiment_name (str, optional) – Name of the experiment for wandb. Defaults to “PCN”.

  • wandb_entity (Optional[str], optional) – Entity for wandb. Defaults to None.

  • log (bool, optional) – Whether to log to wandb. Defaults to True.

  • seed (Optional[int], optional) – Seed for reproducibility. Defaults to None.

  • device (Union[th.device, str], optional) – Device to use. Defaults to “auto”.

  • model_class (Optional[Type[BasePCNModel]], optional) – Model class to use. Defaults to None.
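
A minimal instantiation sketch, assuming a two-objective mo-gymnasium environment; the environment id and scaling values below are illustrative placeholders, not values prescribed by PCN:

```python
import numpy as np
import mo_gymnasium as mo_gym

from morl_baselines.multi_policy.pcn.pcn import PCN

# Illustrative environment: deep-sea-treasure-v0 has two objectives
# (treasure value and time penalty).
env = mo_gym.make("deep-sea-treasure-v0")

agent = PCN(
    env=env,
    # One entry per objective plus one for the horizon; these values are
    # placeholders chosen to bring returns and horizon to a similar scale.
    scaling_factor=np.array([0.1, 0.1, 0.01]),
    learning_rate=1e-3,
    gamma=1.0,
    batch_size=256,
    hidden_dim=64,
    log=False,  # disable wandb logging for this sketch
)
```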

eval(obs, w=None)

Evaluate policy action for a given observation.

evaluate(env, max_return, n=10)

Evaluate policy in the given environment.
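
A usage sketch for evaluate, continuing the example above after training; max_return is an illustrative per-objective cap, and the return value is assumed here to be the episodic returns achieved over the n evaluation episodes:

```python
# Run n evaluation episodes with a (placeholder) per-objective return cap;
# assumes `agent` and `env` from the constructor example, after training.
achieved_returns = agent.evaluate(env, max_return=np.array([124.0, 0.0]), n=10)
```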

get_config() dict

Get configuration of PCN model.

save(filename: str = 'PCN_model', savedir: str = 'weights')

Save PCN.

set_desired_return_and_horizon(desired_return: ndarray, desired_horizon: int)

Set desired return and horizon for evaluation.
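
A sketch combining set_desired_return_and_horizon with eval: the agent is conditioned on a target return vector and episode length, then queried for actions. The target values are placeholders for a two-objective task:

```python
# Condition the policy on a (placeholder) target return and horizon, then
# query actions with eval(); assumes `agent` and `env` from the examples above.
agent.set_desired_return_and_horizon(
    desired_return=np.array([124.0, -19.0], dtype=np.float32),
    desired_horizon=19,
)

obs, _ = env.reset()
action = agent.eval(obs)
```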

train(total_timesteps: int, eval_env: Env, ref_point: ndarray, known_pareto_front: List[ndarray] | None = None, num_eval_weights_for_eval: int = 50, num_er_episodes: int = 20, num_step_episodes: int = 10, num_model_updates: int = 50, max_return: ndarray | None = None, max_buffer_size: int = 100, num_points_pf: int = 100)

Train PCN.

Parameters:
  • total_timesteps – total number of time steps to train for

  • eval_env – environment for evaluation

  • ref_point – reference point for hypervolume calculation

  • known_pareto_front – Optimal Pareto front for metrics calculation, if known.

  • num_eval_weights_for_eval (int) – Number of weights used when evaluating the Pareto front, e.g., for computing expected utility.

  • num_er_episodes – number of episodes to fill experience replay buffer

  • num_step_episodes – number of episodes to collect per training iteration

  • num_model_updates – number of model updates per training iteration

  • max_return – maximum return for clipping desired return. When None, this will be set to 100 for all objectives.

  • max_buffer_size – maximum buffer size

  • num_points_pf – number of points to sample from the Pareto front for metrics calculation
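
A training-call sketch continuing the constructor example above; ref_point and max_return are illustrative values for a two-objective task, not values mandated by the algorithm:

```python
eval_env = mo_gym.make("deep-sea-treasure-v0")  # separate env for evaluation

agent.train(
    total_timesteps=100_000,
    eval_env=eval_env,
    ref_point=np.array([0.0, -50.0]),    # placeholder hypervolume reference
    max_return=np.array([124.0, 0.0]),   # placeholder per-objective return cap
    num_er_episodes=20,
    num_step_episodes=10,
    num_model_updates=50,
    max_buffer_size=100,
    num_points_pf=100,
)
```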

update()

Update PCN model.