Pareto Conditioned Networks

class morl_baselines.multi_policy.pcn.pcn.PCN(env: Env | None, scaling_factor: ndarray, learning_rate: float = 0.001, gamma: float = 1.0, batch_size: int = 256, hidden_dim: int = 64, noise: float = 0.1, project_name: str = 'MORL-Baselines', experiment_name: str = 'PCN', wandb_entity: str | None = None, log: bool = True, seed: int | None = None, device: device | str = 'auto', model_class: Type[BasePCNModel] | None = None)

Pareto Conditioned Networks (PCN).

Reymond, M., Bargiacchi, E., & Nowé, A. (2022, May). Pareto Conditioned Networks. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (pp. 1110-1118). https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p1110.pdf

## Credits

This code is a refactor of the code from the authors of the paper, available at: https://github.com/mathieu-reymond/pareto-conditioned-networks

Initialize PCN agent.

Parameters:
  • env (Optional[gym.Env]) – Gym environment.

  • scaling_factor (np.ndarray) – Scaling factor for the desired return and horizon used in the model.

  • learning_rate (float, optional) – Learning rate. Defaults to 1e-3.

  • gamma (float, optional) – Discount factor. Defaults to 1.0.

  • batch_size (int, optional) – Batch size. Defaults to 256.

  • hidden_dim (int, optional) – Hidden dimension. Defaults to 64.

  • noise (float, optional) – Standard deviation of the noise to add to the action in the continuous action case. Defaults to 0.1.

  • project_name (str, optional) – Name of the project for wandb. Defaults to “MORL-Baselines”.

  • experiment_name (str, optional) – Name of the experiment for wandb. Defaults to “PCN”.

  • wandb_entity (Optional[str], optional) – Entity for wandb. Defaults to None.

  • log (bool, optional) – Whether to log to wandb. Defaults to True.

  • seed (Optional[int], optional) – Seed for reproducibility. Defaults to None.

  • device (Union[th.device, str], optional) – Device to use. Defaults to “auto”.

  • model_class (Optional[Type[BasePCNModel]], optional) – Model class to use. Defaults to None.
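
A minimal instantiation sketch, assuming a two-objective mo-gymnasium environment; the environment id and scaling values below are illustrative placeholders, not values prescribed by PCN:

```python
import numpy as np
import mo_gymnasium as mo_gym

from morl_baselines.multi_policy.pcn.pcn import PCN

# Illustrative environment: deep-sea-treasure-v0 has two objectives
# (treasure value and time penalty).
env = mo_gym.make("deep-sea-treasure-v0")

agent = PCN(
    env=env,
    # One entry per objective plus one for the horizon; these values are
    # placeholders chosen to bring returns and horizon to a similar scale.
    scaling_factor=np.array([0.1, 0.1, 0.01]),
    learning_rate=1e-3,
    gamma=1.0,
    batch_size=256,
    hidden_dim=64,
    log=False,  # disable wandb logging for this sketch
)
```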

eval(obs, w=None)

Evaluate policy action for a given observation.

evaluate(env, max_return, n=10)

Evaluate policy in the given environment.
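
A usage sketch for evaluate, continuing the example above after training; max_return is an illustrative per-objective cap, and the return value is assumed here to be the episodic returns achieved over the n evaluation episodes:

```python
# Run n evaluation episodes with a (placeholder) per-objective return cap;
# assumes `agent` and `env` from the constructor example, after training.
achieved_returns = agent.evaluate(env, max_return=np.array([124.0, 0.0]), n=10)
```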

get_config() dict

Get configuration of PCN model.

save(filename: str = 'PCN_model', savedir: str = 'weights')

Save PCN.

set_desired_return_and_horizon(desired_return: ndarray, desired_horizon: int)

Set desired return and horizon for evaluation.
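
A sketch combining set_desired_return_and_horizon with eval: the agent is conditioned on a target return vector and episode length, then queried for actions. The target values are placeholders for a two-objective task:

```python
# Condition the policy on a (placeholder) target return and horizon, then
# query actions with eval(); assumes `agent` and `env` from the examples above.
agent.set_desired_return_and_horizon(
    desired_return=np.array([124.0, -19.0], dtype=np.float32),
    desired_horizon=19,
)

obs, _ = env.reset()
action = agent.eval(obs)
```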

train(total_timesteps: int, eval_env: Env, ref_point: ndarray, known_pareto_front: List[ndarray] | None = None, num_eval_weights_for_eval: int = 50, num_er_episodes: int = 20, num_step_episodes: int = 10, num_model_updates: int = 50, max_return: ndarray | None = None, max_buffer_size: int = 100, num_points_pf: int = 100)

Train PCN.

Parameters:
  • total_timesteps – total number of time steps to train for

  • eval_env – environment for evaluation

  • ref_point – reference point for hypervolume calculation

  • known_pareto_front – Optimal Pareto front for metrics calculation, if known.

  • num_eval_weights_for_eval (int) – Number of weights used when evaluating the Pareto front, e.g., for computing expected utility.

  • num_er_episodes – number of episodes to fill experience replay buffer

  • num_step_episodes – number of episodes to collect per training iteration

  • num_model_updates – number of model updates per training iteration

  • max_return – maximum return for clipping desired return. When None, this will be set to 100 for all objectives.

  • max_buffer_size – maximum buffer size

  • num_points_pf – number of points to sample from the Pareto front for metrics calculation
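
A training-call sketch continuing the constructor example above; ref_point and max_return are illustrative values for a two-objective task, not values mandated by the algorithm:

```python
eval_env = mo_gym.make("deep-sea-treasure-v0")  # separate env for evaluation

agent.train(
    total_timesteps=100_000,
    eval_env=eval_env,
    ref_point=np.array([0.0, -50.0]),    # placeholder hypervolume reference
    max_return=np.array([124.0, 0.0]),   # placeholder per-objective return cap
    num_er_episodes=20,
    num_step_episodes=10,
    num_model_updates=50,
    max_buffer_size=100,
    num_points_pf=100,
)
```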

update()

Update PCN model.