Evaluations

Utilities related to evaluation.

morl_baselines.common.evaluation.eval_mo(agent, env, w: np.ndarray | None = None, scalarization=<function dot>, render: bool = False) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates one episode of the agent in the environment.

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment with LinearReward wrapper

  • scalarization – scalarization function, taking weights and reward as parameters

  • w (np.ndarray) – Weight vector

  • render (bool, optional) – Whether to render the environment. Defaults to False.

Returns:

(float, float, np.ndarray, np.ndarray) – Scalarized return, scalarized discounted return, vectorized return, vectorized discounted return
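A minimal usage sketch follows. It assumes an agent exposing the eval(obs, w) interface (and a gamma attribute for discounting); the RandomAgent placeholder and the chosen environment/weights are illustrative only, not part of the library.

```python
import numpy as np
import mo_gymnasium as mo_gym

from morl_baselines.common.evaluation import eval_mo


class RandomAgent:
    """Hypothetical placeholder exposing the `eval(obs, w)` interface assumed
    by the helper; it ignores the observation and samples random actions."""

    def __init__(self, env):
        self.action_space = env.action_space
        self.gamma = 0.99  # assumed to be used for the discounted return

    def eval(self, obs, w):
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = RandomAgent(env)
w = np.array([0.7, 0.3])  # weights over the two objectives

scal_ret, scal_disc_ret, vec_ret, vec_disc_ret = eval_mo(agent, env, w=w)
print(vec_ret)  # vectorized (undiscounted) episode return
```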

morl_baselines.common.evaluation.eval_mo_reward_conditioned(agent, env, scalarization=<function dot>, w: np.ndarray | None = None, render: bool = False) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates one episode of the agent in the environment. This assumes that the agent is conditioned on the accrued reward, i.e., an ESR (Expected Scalarized Return) agent.

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment

  • scalarization – scalarization function, taking weights and reward as parameters

  • w – weight vector

  • render (bool, optional) – Whether to render the environment. Defaults to False.

Returns:

(float, float, np.ndarray, np.ndarray) – Scalarized return, scalarized discounted return, vectorized return, vectorized discounted return
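A similar sketch for the reward-conditioned variant. The RandomESRAgent placeholder is hypothetical: a real ESR agent would condition its action choice on the accrued reward passed to eval(), whereas this stand-in ignores it.

```python
import numpy as np
import mo_gymnasium as mo_gym

from morl_baselines.common.evaluation import eval_mo_reward_conditioned


class RandomESRAgent:
    """Hypothetical stand-in for a reward-conditioned (ESR) agent; it ignores
    the accrued reward and acts randomly."""

    def __init__(self, env):
        self.action_space = env.action_space
        self.gamma = 0.99  # assumed to be used for the discounted return

    def eval(self, obs, *accrued_reward):
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = RandomESRAgent(env)

scal_ret, scal_disc_ret, vec_ret, vec_disc_ret = eval_mo_reward_conditioned(
    agent, env, w=np.array([0.5, 0.5])
)
```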

morl_baselines.common.evaluation.log_all_multi_policy_metrics(current_front: List[np.ndarray], hv_ref_point: np.ndarray, reward_dim: int, global_step: int, n_sample_weights: int, ref_front: List[np.ndarray] | None = None)

Logs all metrics for multi-policy training.

Logged metrics:
  • hypervolume
  • sparsity
  • expected utility metric (EUM)

If a reference front is provided, also logs:
  • inverted generational distance (IGD)
  • maximum utility loss (MUL)

Parameters:
  • current_front (List) – current Pareto front approximation, computed in an evaluation step

  • hv_ref_point – reference point for hypervolume computation

  • reward_dim – number of objectives

  • global_step – global step for logging

  • n_sample_weights – number of weights to sample for EUM and MUL computation

  • ref_front – reference front, if known
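A sketch of how this could be called after an evaluation step. The front points, reference point, and project name are made up, and an active Weights & Biases run is assumed since MORL-Baselines logs metrics through wandb.

```python
import numpy as np
import wandb

from morl_baselines.common.evaluation import log_all_multi_policy_metrics

wandb.init(project="morl-example", mode="offline")  # assumption: an active wandb run is required

# Made-up 2-objective Pareto front approximation from an evaluation step.
current_front = [np.array([1.0, 0.0]), np.array([0.7, 0.5]), np.array([0.2, 0.9])]

log_all_multi_policy_metrics(
    current_front=current_front,
    hv_ref_point=np.array([0.0, 0.0]),  # point dominated by every solution on the front
    reward_dim=2,
    global_step=10_000,
    n_sample_weights=50,
    ref_front=None,  # pass the known Pareto front here to also get IGD and MUL
)
```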

morl_baselines.common.evaluation.log_episode_info(info: dict, scalarization, weights: np.ndarray | None, global_timestep: int, id: int | None = None, verbose: bool = True)

Logs information of the last episode from the info dict (automatically filled by the RecordStatisticsWrapper).

Parameters:
  • info – info dictionary containing the episode statistics

  • scalarization – scalarization function

  • weights – weights to be used in the scalarization

  • global_timestep – global timestep

  • id – agent’s id

  • verbose – whether to print the episode info
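A sketch of logging the statistics of one random episode. It assumes MO-Gymnasium's MORecordEpisodeStatistics wrapper (the import path may differ between versions), an active wandb run, and that the "episode" sub-dict of the info is the dictionary expected by the helper; all of these are assumptions for illustration.

```python
import numpy as np
import mo_gymnasium as mo_gym
import wandb
from mo_gymnasium.wrappers import MORecordEpisodeStatistics

from morl_baselines.common.evaluation import log_episode_info

wandb.init(project="morl-example", mode="offline")  # assumption: stats are also logged to wandb

env = MORecordEpisodeStatistics(mo_gym.make("deep-sea-treasure-v0"), gamma=0.99)
obs, info = env.reset()
done, step = False, 0
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
    step += 1

# At episode end the wrapper fills info["episode"]; pass that sub-dict (assumption).
log_episode_info(
    info["episode"],
    scalarization=np.dot,
    weights=np.array([0.5, 0.5]),
    global_timestep=step,
    verbose=True,
)
```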

morl_baselines.common.evaluation.policy_evaluation_mo(agent, env, w: np.ndarray, scalarization=<function dot>, rep: int = 5) → Tuple[float, float, np.ndarray, np.ndarray]

Evaluates the value of a policy by running it for multiple episodes and averaging the returns.

Parameters:
  • agent – Agent

  • env – MO-Gymnasium environment

  • w (np.ndarray) – Weight vector

  • scalarization – scalarization function, taking reward and weight as parameters

  • rep (int, optional) – Number of episodes for averaging. Defaults to 5.

Returns:

(float, float, np.ndarray, np.ndarray) – Avg scalarized return, Avg scalarized discounted return, Avg vectorized return, Avg vectorized discounted return
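A sketch averaging the returns of a placeholder random policy over several episodes; the RandomAgent class, environment, weights, and number of repetitions are illustrative only.

```python
import numpy as np
import mo_gymnasium as mo_gym

from morl_baselines.common.evaluation import policy_evaluation_mo


class RandomAgent:
    """Hypothetical placeholder with the `eval(obs, w)` interface assumed by the helper."""

    def __init__(self, env):
        self.action_space = env.action_space
        self.gamma = 0.99  # assumed to be used for the discounted return

    def eval(self, obs, w):
        return self.action_space.sample()


env = mo_gym.make("deep-sea-treasure-v0")
agent = RandomAgent(env)

# Average returns over 10 episodes under a fixed weight vector.
avg_scal, avg_scal_disc, avg_vec, avg_vec_disc = policy_evaluation_mo(
    agent, env, w=np.array([0.9, 0.1]), rep=10
)
print(avg_vec)  # average vectorized return
```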

morl_baselines.common.evaluation.seed_everything(seed: int)

Set random seeds for reproducibility.

This function should be called only once per Python process, preferably at the beginning of the main script. It has global effects on the random state of the process, so it should be used with care.

Parameters:

  • seed – random seed
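A minimal usage sketch; exactly which RNGs are covered (e.g. Python's random, NumPy, torch) is an assumption about the typical implementation of such a helper.

```python
from morl_baselines.common.evaluation import seed_everything

# Call once at the start of the main script, before creating envs or agents,
# so that the process-wide random state is fixed for reproducibility.
seed_everything(42)
```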