Evaluations

Utilities related to evaluation.

morl_baselines.common.evaluation.eval_mo(agent, env, w: np.ndarray | None = None, scalarization=np.dot, render: bool = False) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates one episode of the agent in the environment.

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment with LinearReward wrapper

  • w (np.ndarray, optional) – Weight vector. Defaults to None.

  • scalarization – scalarization function, taking weights and reward as parameters

  • render (bool, optional) – Whether to render the environment. Defaults to False.

Returns:

(float, float, np.ndarray, np.ndarray) – Scalarized return, scalarized discounted return, vectorized return, vectorized discounted return
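
A minimal usage sketch is shown below. The DummyAgent, the eval(obs, w)/gamma interface it implements, and the chosen environment and weight vector are illustrative assumptions, not part of this API; in practice you would pass a trained MORL-Baselines agent and, depending on the library version, wrap the environment as noted above.

```python
import mo_gymnasium as mo_gym
import numpy as np

from morl_baselines.common.evaluation import eval_mo


class DummyAgent:
    """Hypothetical stand-in for a trained agent: samples random actions."""

    def __init__(self, action_space, gamma: float = 0.99):
        self.action_space = action_space
        self.gamma = gamma  # discount factor, assumed to be read during evaluation

    def eval(self, obs, w):
        # Assumed interface: observation and weight vector in, action out.
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = DummyAgent(env.action_space)
w = np.array([0.7, 0.3])

scal_ret, scal_disc_ret, vec_ret, vec_disc_ret = eval_mo(agent, env, w=w)
print(f"scalarized return: {scal_ret:.2f}, vector return: {vec_ret}")
```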

morl_baselines.common.evaluation.eval_mo_reward_conditioned(agent, env, scalarization=np.dot, w: np.ndarray | None = None, render: bool = False) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates one episode of the agent in the environment, assuming the agent is conditioned on the accrued reward (i.e., an ESR agent).

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment

  • scalarization – scalarization function, taking weights and reward as parameters

  • w – weight vector

  • render (bool, optional) – Whether to render the environment. Defaults to False.

Returns:

(float, float, np.ndarray, np.ndarray) – Scalarized return, scalarized discounted return, vectorized return, vectorized discounted return
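
A sketch analogous to the eval_mo example above, assuming an ESR-style agent whose eval method takes the observation and the accrued reward vector rather than a weight vector; that interface, the dummy agent, and the environment/weights are illustrative assumptions.

```python
import mo_gymnasium as mo_gym
import numpy as np

from morl_baselines.common.evaluation import eval_mo_reward_conditioned


class DummyESRAgent:
    """Hypothetical ESR-style agent: conditioned on the accrued reward (ignored here)."""

    def __init__(self, action_space, gamma: float = 0.99):
        self.action_space = action_space
        self.gamma = gamma

    def eval(self, obs, accrued_reward):
        # Assumed interface: the accrued reward vector conditions the policy.
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = DummyESRAgent(env.action_space)
w = np.array([0.5, 0.5])  # used only by the (default, linear) scalarization

scal_ret, scal_disc_ret, vec_ret, vec_disc_ret = eval_mo_reward_conditioned(agent, env, w=w)
```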

morl_baselines.common.evaluation.log_all_multi_policy_metrics(current_front: List[np.ndarray], hv_ref_point: np.ndarray, reward_dim: int, global_step: int, n_sample_weights: int, ref_front: List[np.ndarray] | None = None)

Logs all metrics for multi-policy training.

Logged metrics:
  - hypervolume
  - expected utility metric (EUM)

If a reference front is provided, also logs:
  - inverted generational distance (IGD)
  - maximum utility loss (MUL)

Parameters:
  • current_front (List) – current Pareto front approximation, computed in an evaluation step

  • hv_ref_point – reference point for hypervolume computation

  • reward_dim – number of objectives

  • global_step – global step for logging

  • n_sample_weights – number of weights to sample for EUM and MUL computation

  • ref_front – reference front, if known
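
A sketch with a toy two-objective front follows. The front, the reference front, the reference point, and the wandb project name are made-up illustration values; an active wandb run is assumed here because MORL-Baselines logs these metrics through wandb.

```python
import numpy as np
import wandb

from morl_baselines.common.evaluation import log_all_multi_policy_metrics

wandb.init(project="morl-eval-example", mode="offline")  # assumption: wandb backs the logging

# Toy 2-objective Pareto front approximation (illustrative values only).
current_front = [np.array([1.0, 0.0]), np.array([0.6, 0.6]), np.array([0.0, 1.0])]
# Known reference front, e.g. the true Pareto front of the benchmark (also illustrative).
ref_front = [np.array([1.0, 0.2]), np.array([0.7, 0.7]), np.array([0.2, 1.0])]

log_all_multi_policy_metrics(
    current_front=current_front,
    hv_ref_point=np.array([-1.0, -1.0]),  # should be dominated by every point of the front
    reward_dim=2,
    global_step=10_000,
    n_sample_weights=50,
    ref_front=ref_front,
)
```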

morl_baselines.common.evaluation.log_episode_info(info: dict, scalarization, weights: np.ndarray | None, global_timestep: int, id: int | None = None, verbose: bool = True)

Logs information about the last episode, read from the info dict (automatically filled by the RecordStatisticsWrapper).

Parameters:
  • info – info dictionary containing the episode statistics

  • scalarization – scalarization function

  • weights – weights to be used in the scalarization

  • global_timestep – global timestep

  • id – agent’s id

  • verbose – whether to print the episode info
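
A sketch of logging the statistics of one random episode. The wrapper class and import path (MORecordEpisodeStatistics), the use of info["episode"] as the argument, and the active wandb run are assumptions that may differ across MO-Gymnasium / MORL-Baselines versions.

```python
import mo_gymnasium as mo_gym
import numpy as np
import wandb
from mo_gymnasium.wrappers import MORecordEpisodeStatistics  # import path may vary by version

from morl_baselines.common.evaluation import log_episode_info

wandb.init(project="morl-eval-example", mode="offline")  # assumption: wandb backs the logging

env = MORecordEpisodeStatistics(mo_gym.make("deep-sea-treasure-v0"), gamma=0.99)
w = np.array([0.5, 0.5])

obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated

# The wrapper fills info["episode"] when the episode ends; passing that sub-dict
# here is an assumption (some versions may expect the full info dict instead).
log_episode_info(info["episode"], scalarization=np.dot, weights=w, global_timestep=1_000)
```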

morl_baselines.common.evaluation.policy_evaluation_mo(agent, env, w: np.ndarray, scalarization=np.dot, rep: int = 5) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates the value of a policy by running it for multiple episodes and averaging the returns.

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment

  • w (np.ndarray) – Weight vector

  • scalarization – scalarization function, taking reward and weight as parameters

  • rep (int, optional) – Number of episodes for averaging. Defaults to 5.

Returns:

(float, float, np.ndarray, np.ndarray) – Avg scalarized return, Avg scalarized discounted return, Avg vectorized return, Avg vectorized discounted return
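
A sketch mirroring the eval_mo example above, averaging over 10 episodes; the dummy agent and its eval/gamma interface are again illustrative assumptions rather than part of this API.

```python
import mo_gymnasium as mo_gym
import numpy as np

from morl_baselines.common.evaluation import policy_evaluation_mo


class DummyAgent:
    """Hypothetical stand-in for a trained agent (same assumed interface as above)."""

    def __init__(self, action_space, gamma: float = 0.99):
        self.action_space = action_space
        self.gamma = gamma

    def eval(self, obs, w):
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = DummyAgent(env.action_space)
w = np.array([0.7, 0.3])

avg_scal, avg_scal_disc, avg_vec, avg_vec_disc = policy_evaluation_mo(agent, env, w, rep=10)
print(avg_vec)  # average vectorized return over the 10 episodes
```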

morl_baselines.common.evaluation.seed_everything(seed: int)

Set random seeds for reproducibility.

This function should be called only once per Python process, preferably at the beginning of the main script. It has global effects on the random state of the process, so it should be used with care.

Parameters:

seed – random seed
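
Minimal usage:

```python
from morl_baselines.common.evaluation import seed_everything

# Call once, as early as possible in the main script: the function mutates the
# process-wide random state, so calling it late (or repeatedly) can silently
# undo or mask earlier seeding.
seed_everything(42)
```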