Miscellaneous
General utils for the MORL baselines.
- morl_baselines.common.utils.linearly_decaying_value(initial_value, decay_period, step, warmup_steps, final_value)
Returns the current value for a linearly decaying parameter.
This follows the Nature DQN schedule for a linearly decaying epsilon (Mnih et al., 2015), expressed in terms of this function's parameters. The schedule is as follows: hold the value at initial_value until warmup_steps steps have been taken; then decay it linearly from initial_value to final_value over decay_period steps; and then keep it at final_value from there on.
- Parameters:
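initial_value – float, the value at the start of the schedule, held during the warm-up steps.
decay_period – float, the period (in steps) over which the value is decayed.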
step – int, the number of training steps completed so far.
warmup_steps – int, the number of steps taken before the value is decayed.
final_value – float, the final value to which the parameter is decayed.
- Returns:
A float, the current value computed according to the schedule.
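The schedule described above can be sketched in a few lines. This is a minimal standalone illustration, not the library's source: the name linear_decay_sketch is hypothetical, and the clamping to the range between the two endpoint values is an assumption made so that the value stays constant during warm-up and after the decay ends.

```python
def linear_decay_sketch(initial_value, decay_period, step, warmup_steps, final_value):
    """Hypothetical re-implementation of the schedule described in the text."""
    # Steps remaining until the decay is finished. This exceeds decay_period
    # during warm-up and becomes negative once the decay is over.
    steps_left = decay_period + warmup_steps - step
    # Portion of the (initial - final) gap that has not been decayed yet.
    bonus = (initial_value - final_value) * steps_left / decay_period
    value = final_value + bonus
    # Assumption: clamp to the interval spanned by the two endpoint values.
    low = min(initial_value, final_value)
    high = max(initial_value, final_value)
    return max(low, min(high, value))


# Example: decay from 1.0 to 0.1 over 100 steps after 10 warm-up steps.
schedule = [linear_decay_sketch(1.0, 100, s, 10, 0.1) for s in (0, 10, 60, 110, 500)]
```

With these example settings the value stays at 1.0 through the warm-up, is roughly halfway between the endpoints at step 60, and remains at 0.1 from step 110 onward.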
- morl_baselines.common.utils.make_gif(env, agent, weight: ndarray, fullpath: str, fps: int = 50, length: int = 300)
Render an episode and save it as a gif.
- morl_baselines.common.utils.nearest_neighbors(n: int, current_weight: ndarray, all_weights: List[ndarray], dist_metric: Callable[[ndarray, ndarray], float]) → List[int]
Returns the n closest neighbors of current_weight in all_weights, according to the distance metric dist_metric.
- Parameters:
n – number of neighbors
current_weight – weight vector whose nearest neighbors are sought
all_weights – all the candidate weight vectors; may contain current_weight as well
dist_metric – distance metric
- Returns:
the ids of the nearest neighbors in all_weights
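As an illustration of the lookup described above, here is a minimal sketch, assuming that entries equal to current_weight are skipped (the text only says all_weights may contain it) and that ties are broken by list order. The function name nearest_neighbors_sketch, the example weights, and the Euclidean metric are all hypothetical choices, not the library's implementation.

```python
from typing import Callable, List

import numpy as np


def nearest_neighbors_sketch(
    n: int,
    current_weight: np.ndarray,
    all_weights: List[np.ndarray],
    dist_metric: Callable[[np.ndarray, np.ndarray], float],
) -> List[int]:
    """Hypothetical sketch: indices of the n weights closest to current_weight."""
    # Assumption: exact copies of current_weight are not their own neighbors.
    candidates = [
        i for i, w in enumerate(all_weights) if not np.array_equal(w, current_weight)
    ]
    # Sort candidate indices by distance; list.sort is stable, so ties keep list order.
    candidates.sort(key=lambda i: dist_metric(current_weight, all_weights[i]))
    return candidates[:n]


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))


weights = [
    np.array([0.1, 0.9]),
    np.array([0.5, 0.5]),
    np.array([1.0, 0.0]),
    np.array([0.4, 0.6]),
]
neighbor_ids = nearest_neighbors_sketch(2, weights[1], weights, euclidean)
```

Any callable taking two arrays and returning a float can stand in for the metric, which is why the distance function is passed as a parameter rather than fixed.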
- morl_baselines.common.utils.reset_wandb_env()
Reset the wandb environment variables.
This is useful when running multiple sweeps in parallel, as wandb will otherwise try to use the same directory for all the runs.
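The text does not say which variables are reset, so the following is only a sketch under stated assumptions: it removes every environment variable prefixed with WANDB_ except a small set of settings that would typically be shared across runs. The name reset_wandb_env_sketch, the keep list, and the example values are hypothetical.

```python
import os


def reset_wandb_env_sketch(keep=("WANDB_PROJECT", "WANDB_ENTITY", "WANDB_API_KEY")):
    """Hypothetical sketch: drop run-specific WANDB_* variables from the environment."""
    # Iterate over a copy of the keys, since entries are deleted during the loop.
    for key in list(os.environ):
        if key.startswith("WANDB_") and key not in keep:
            del os.environ[key]


# Example: a run-specific variable is removed, a shared setting is kept.
os.environ["WANDB_RUN_ID"] = "hypothetical-run"
os.environ["WANDB_PROJECT"] = "hypothetical-project"
reset_wandb_env_sketch()
```

Calling such a function before starting each run means no run inherits run-specific settings left behind by a previous one.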
- morl_baselines.common.utils.unique_tol(a: List[ndarray], tol=0.0001) → List[ndarray]
Returns the unique elements of a list of NumPy arrays, where arrays that match within the tolerance tol are treated as duplicates.
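One way to picture tolerance-based deduplication is the sketch below. It assumes that tol is an absolute element-wise tolerance and that the first occurrence of each group of near-duplicates is the one kept; neither detail is stated in the text, and the name unique_tol_sketch is hypothetical.

```python
import numpy as np


def unique_tol_sketch(arrays, tol=1e-4):
    """Hypothetical sketch: keep the first array of each group of near-duplicates."""
    unique = []
    for candidate in arrays:
        # Keep the candidate only if no already-kept array is within tol of it.
        # Assumption: tol is an absolute tolerance (rtol is disabled).
        if not any(np.allclose(candidate, kept, rtol=0.0, atol=tol) for kept in unique):
            unique.append(candidate)
    return unique


points = [np.array([1.0, 2.0]), np.array([1.00001, 2.0]), np.array([3.0, 4.0])]
deduped = unique_tol_sketch(points)
```

Here the second array differs from the first by less than the default tolerance, so only the first and third arrays are kept.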