Miscellaneous

General utils for the MORL baselines.

morl_baselines.common.utils.linearly_decaying_value(initial_value, decay_period, step, warmup_steps, final_value)

Returns the current value for a linearly decaying parameter.

This follows the Nature DQN schedule for a linearly decaying epsilon (Mnih et al., 2015). The schedule is: hold the value at initial_value until warmup_steps steps have been taken; then decay it linearly from initial_value to final_value over decay_period steps; then keep it at final_value from there on.

Parameters:

initial_value – float, the value used before the decay starts.
decay_period – float, the number of steps over which the value is decayed.
step – int, the number of training steps completed so far.
warmup_steps – int, the number of steps taken before the decay starts.
final_value – float, the value reached at the end of the decay.

Returns:

A float, the current value computed according to the schedule.
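The schedule described above can be sketched in a few lines of plain Python. This is an illustrative restatement under stated assumptions, not the library's source: the name linear_decay_sketch is hypothetical, and the sketch assumes decay_period is greater than zero.

```python
def linear_decay_sketch(initial_value, decay_period, step, warmup_steps, final_value):
    """Hypothetical restatement of the schedule; not the library source."""
    # Steps remaining before the decay is finished. This exceeds decay_period
    # during warmup and becomes negative once the decay is over.
    steps_left = decay_period + warmup_steps - step
    # Fraction of the decay still to go, clamped to [0, 1].
    fraction_left = min(max(steps_left / decay_period, 0.0), 1.0)
    return final_value + (initial_value - final_value) * fraction_left


# Example: epsilon decays from 1.0 to 0.1 over 100 steps after 10 warmup steps.
for step in (0, 10, 60, 110, 500):
    print(step, linear_decay_sketch(1.0, 100, step, 10, 0.1))
```

Clamping the remaining fraction, rather than the result, keeps the sketch valid whether the value decays downward or grows upward.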

morl_baselines.common.utils.make_gif(env, agent, weight: ndarray, fullpath: str, fps: int = 50, length: int = 300)

Render an episode and save it as a gif.

morl_baselines.common.utils.nearest_neighbors(n: int, current_weight: ndarray, all_weights: List[ndarray], dist_metric: Callable[[ndarray, ndarray], float]) → List[int]

Returns the n closest neighbors of current_weight in all_weights, according to the distance metric dist_metric.

Parameters:

n – number of neighbors
current_weight – weight vector whose nearest neighbors are sought
all_weights – all the possible weights; may also contain current_weight
dist_metric – distance metric

Returns:

the ids of the nearest neighbors in all_weights
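A minimal sketch of such a lookup, written with plain sequences so that it has no dependencies beyond the standard library. The name nearest_neighbors_sketch is hypothetical, and skipping entries identical to current_weight is an assumption made for this illustration; the documentation above only says that all_weights may contain current_weight.

```python
import math
from typing import Callable, List, Sequence


def nearest_neighbors_sketch(
    n: int,
    current_weight: Sequence[float],
    all_weights: List[Sequence[float]],
    dist_metric: Callable[[Sequence[float], Sequence[float]], float],
) -> List[int]:
    """Hypothetical sketch: indices of the n entries closest to current_weight."""
    current = tuple(current_weight)
    # Assumption: entries identical to current_weight are not their own neighbors.
    candidates = [i for i, w in enumerate(all_weights) if tuple(w) != current]
    # Sort candidate indices by distance to current_weight, closest first.
    candidates.sort(key=lambda i: dist_metric(current_weight, all_weights[i]))
    return candidates[:n]


weights = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
neighbor_ids = nearest_neighbors_sketch(2, [1.0, 0.0], weights, math.dist)
print(neighbor_ids)
```

Here math.dist (Euclidean distance) stands in for dist_metric; any callable that takes two weight vectors and returns a float would work the same way.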

morl_baselines.common.utils.reset_wandb_env()

Reset the wandb environment variables.

This is useful when running multiple sweeps in parallel, as wandb will otherwise try to use the same directory for all the runs.

morl_baselines.common.utils.unique_tol(a: List[ndarray], tol=0.0001) → List[ndarray]

Returns unique elements of a list of np.arrays, within a tolerance.
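One way to deduplicate vectors within a tolerance is sketched below. The name unique_tol_sketch is hypothetical, and the matching rule (every component within an absolute tolerance tol, first occurrence kept) is an assumption for illustration; the library's exact comparison may differ.

```python
import math
from typing import List, Sequence


def unique_tol_sketch(a: List[Sequence[float]], tol: float = 1e-4) -> List[Sequence[float]]:
    """Hypothetical sketch: drop vectors that duplicate an earlier one within tol."""
    kept: List[Sequence[float]] = []
    for vec in a:
        # Assumption: two vectors match when they have the same length and
        # every pair of components differs by at most tol (absolute).
        is_duplicate = any(
            len(vec) == len(other)
            and all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(vec, other))
            for other in kept
        )
        if not is_duplicate:
            kept.append(vec)
    return kept


vectors = [[1.0, 2.0], [1.00001, 2.0], [3.0, 4.0]]
print(unique_tol_sketch(vectors))
```

The pairwise comparison is quadratic in the number of vectors, which is usually acceptable for the small weight sets or Pareto fronts this kind of helper is applied to.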