Sumo Environment

class sumo_rl.environment.env.SumoEnvironment(net_file: str, route_file: str, out_csv_name: str | None = None, use_gui: bool = False, virtual_display: Tuple[int, int] = (3200, 1800), begin_time: int = 0, num_seconds: int = 20000, max_depart_delay: int = -1, waiting_time_memory: int = 1000, time_to_teleport: int = -1, delta_time: int = 5, yellow_time: int = 2, min_green: int = 5, max_green: int = 50, enforce_max_green: bool = False, single_agent: bool = False, reward_fn: str | Callable | dict | List = 'diff-waiting-time', reward_weights: List[float] | None = None, observation_class: type[sumo_rl.environment.observations.ObservationFunction] = sumo_rl.environment.observations.DefaultObservationFunction, add_system_info: bool = True, add_per_agent_info: bool = True, sumo_seed: str | int = 'random', fixed_ts: bool = False, sumo_warnings: bool = True, additional_sumo_cmd: str | None = None, render_mode: str | None = None)

SUMO Environment for Traffic Signal Control.

Implements the gym.Env (Gymnasium) interface for traffic signal control using the SUMO simulator. See https://sumo.dlr.de/docs/ for details on SUMO and https://gymnasium.farama.org/ for details on Gymnasium.

Parameters:

net_file (str) – Path to the SUMO network file (.net.xml)
route_file (str) – Path to the SUMO route file (.rou.xml)
out_csv_name (Optional[str]) – Name of the .csv output file with simulation results. If None, no output is generated. Default: None
use_gui (bool) – Whether to run the SUMO simulation with the SUMO GUI. Default: False
virtual_display (Tuple[int, int]) – Resolution of the virtual display used for rendering. Default: (3200, 1800)
begin_time (int) – Simulation time (in seconds) at which the simulation starts. Default: 0
num_seconds (int) – Duration of the simulation, in simulated seconds. Default: 20000
max_depart_delay (int) – Vehicles are discarded if they could not be inserted within max_depart_delay seconds. Default: -1 (vehicles are never discarded)
waiting_time_memory (int) – Number of seconds to remember the waiting time of a vehicle (see https://sumo.dlr.de/pydoc/traci._vehicle.html#VehicleDomain-getAccumulatedWaitingTime). Default: 1000
time_to_teleport (int) – Time in seconds a vehicle may remain stuck before SUMO teleports it further along its route. Default: -1 (teleporting disabled)
delta_time (int) – Simulation seconds between actions. Default: 5 seconds
yellow_time (int) – Duration of the yellow phase. Default: 2 seconds
min_green (int) – Minimum green time in a phase. Default: 5 seconds
max_green (int) – Maximum green time in a phase. Default: 50 seconds. Warning: This parameter is currently ignored!
enforce_max_green (bool) – If true, it enforces the max green time and selects the next green phase when the max green time is reached. Default: False
single_agent (bool) – If true, it behaves like a regular gym.Env. Otherwise, it behaves like a MultiagentEnv (returns dicts of observations, rewards, dones and infos). Default: False
reward_fn (str/function/dict/List) – Reward function used by the agents: the name of a built-in reward function, a reward function, a dictionary assigning reward functions to individual traffic lights (keyed by traffic light id), or a list of reward functions. Default: 'diff-waiting-time'
reward_weights (List[float]/np.ndarray) – Weights for linearly combining the reward functions when reward_fn is a list. If None, the reward returned is an np.ndarray. Default: None
observation_class (ObservationFunction) – Subclass of ObservationFunction that defines both the observation function and the observation space. Default: DefaultObservationFunction
add_system_info (bool) – If true, system metrics (total queue, total waiting time, average speed) are computed and added to the info dictionary. Default: True
add_per_agent_info (bool) – If true, per-agent (per-traffic-signal) metrics (average accumulated waiting time, average queue) are computed and added to the info dictionary. Default: True
sumo_seed (int/string) – Random seed for SUMO. If ‘random’, a randomly chosen seed is used. Default: ‘random’
fixed_ts (bool) – If true, the traffic signals follow the signal programs loaded by SUMO (normally those defined in net_file) and the actions passed to step() are ignored. Default: False
sumo_warnings (bool) – If true, SUMO warnings are printed. Default: True
additional_sumo_cmd (str) – Additional SUMO command-line arguments. Default: None
render_mode (str) – Mode of rendering. Can be ‘human’ or ‘rgb_array’. Default: None
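The timing parameters delta_time, yellow_time and min_green interact whenever an agent asks for a different green phase. The sketch below is a simplified, hypothetical model, not the library's code: it assumes a request for a different green phase is honored only once the current phase has lasted at least yellow_time + min_green seconds (a yellow phase of yellow_time seconds is then inserted), that otherwise the current phase is kept, and that the next decision comes delta_time seconds later either way. The class SignalTimer and its fields are illustrative names; max_green and enforce_max_green are left out.

```python
from dataclasses import dataclass


@dataclass
class SignalTimer:
    """Hypothetical, simplified timing model for one traffic signal.

    Each call to request_phase() stands for one agent decision followed by
    delta_time simulated seconds.
    """

    delta_time: int = 5
    yellow_time: int = 2
    min_green: int = 5
    green_phase: int = 0
    time_since_last_phase_change: int = 0

    def __post_init__(self) -> None:
        # Assumption: the yellow phase must end before the next decision.
        if self.delta_time <= self.yellow_time:
            raise ValueError("delta_time must be greater than yellow_time")

    def request_phase(self, new_phase: int) -> bool:
        """Return True if the request starts a switch (yellow, then new green)."""
        too_early = self.time_since_last_phase_change < self.yellow_time + self.min_green
        if new_phase == self.green_phase or too_early:
            # Keep the current green phase for another delta_time seconds.
            self.time_since_last_phase_change += self.delta_time
            return False
        # Switch: yellow_time seconds of yellow, then new_phase for the rest of delta_time.
        self.green_phase = new_phase
        self.time_since_last_phase_change = self.delta_time
        return True


timer = SignalTimer()  # same values as the constructor defaults above
print([timer.request_phase(1) for _ in range(3)])  # [False, False, True]
```

With the default values, a request to leave the initial phase is ignored until yellow_time + min_green = 7 seconds have elapsed, so only the third decision (after 10 simulated seconds) triggers the switch.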

property action_space

Return the action space of a traffic signal.

Only used in case of single-agent environment.

action_spaces(ts_id: str) → Discrete: Return the action space of the traffic signal with id ts_id.

close(): Close the environment and stop the SUMO simulation.

encode(state, ts_id): Encode the state of the traffic signal into a hashable object.
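A hashable state is what tabular learners such as Q-learning need as a dictionary key. A minimal sketch of the idea, assuming (hypothetically) an observation laid out as a one-hot green-phase vector, a min-green flag, then per-lane values in [0, 1]: the active phase becomes an index and each lane value is bucketed into ten bins. The function encode_state and the bin count are illustrative choices, not the library's encode() implementation.

```python
from typing import Sequence, Tuple


def encode_state(state: Sequence[float], num_green_phases: int, bins: int = 10) -> Tuple[int, ...]:
    """Hypothetical encoder: turn a continuous observation into a hashable tuple."""
    one_hot = list(state[:num_green_phases])
    phase = one_hot.index(1)  # index of the active green phase (ValueError if none)
    lane_values = state[num_green_phases + 1:]  # skip the min-green flag
    # Map each value in [0, 1] to an integer bin 0..bins-1 (1.0 goes to the last bin).
    buckets = [min(int(v * bins), bins - 1) for v in lane_values]
    return (phase, *buckets)


# Tuples are hashable, so they can key a tabular Q-table directly.
q_table = {}
key = encode_state([0, 1, 1, 0.05, 0.42, 1.0], num_green_phases=2)
q_table[key] = [0.0] * 2  # one value per green phase
print(key)  # (1, 0, 4, 9)
```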

property observation_space

Return the observation space of a traffic signal.

Only used in case of single-agent environment.

observation_spaces(ts_id: str): Return the observation space of the traffic signal with id ts_id.
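The size of a signal's observation space depends on the observation_class. As an illustration, assume the layout used by DefaultObservationFunction is [phase_one_hot, min_green, lane_1_density, ..., lane_n_density, lane_1_queue, ..., lane_n_queue], as described in the sumo-rl README. The dependency-free sketch below rebuilds that vector from hypothetical inputs and computes its length; the function names are illustrative only.

```python
from typing import List, Sequence


def default_observation(green_phase: int, num_green_phases: int, min_green_passed: bool,
                        densities: Sequence[float], queues: Sequence[float]) -> List[float]:
    """Hypothetical re-creation of the assumed default observation layout."""
    if len(densities) != len(queues):
        raise ValueError("need one density and one queue value per incoming lane")
    phase_one_hot = [1.0 if i == green_phase else 0.0 for i in range(num_green_phases)]
    min_green_flag = [1.0 if min_green_passed else 0.0]
    return phase_one_hot + min_green_flag + list(densities) + list(queues)


def observation_size(num_green_phases: int, num_lanes: int) -> int:
    """One-hot phase, min-green flag, then a density and a queue value per lane."""
    return num_green_phases + 1 + 2 * num_lanes


obs = default_observation(1, num_green_phases=4, min_green_passed=True,
                          densities=[0.2, 0.5], queues=[0.0, 0.25])
print(len(obs), observation_size(4, 2))  # 9 9
```

Under the same assumption, the matching observation space would be a box of that length with values in [0, 1], and the Discrete action space would have one action per green phase.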

render()

Render the environment.

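If render_mode is “human”, the simulation is shown in the SUMO GUI window. If render_mode is “rgb_array”, a frame is captured from a virtual display created with pyvirtualdisplay (resolution set by virtual_display) and returned as an RGB array.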

reset(seed: int | None = None, **kwargs): Reset the environment.

property reward_dim

Return the reward dimension of a traffic signal.

Only used in case of single-agent environment.

property reward_space

Return the reward space of a traffic signal.

Only used in case of single-agent environment.
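When reward_fn is a list, the constructor documentation says the rewards are combined linearly if reward_weights is given, and returned as an array otherwise (so the reward dimension presumably equals the number of reward functions). The sketch below illustrates that rule with hypothetical reward functions evaluated on a stand-in signal object; the real environment evaluates reward functions on its own traffic signal objects and returns an np.ndarray, whereas this dependency-free version returns a tuple.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Sequence, Tuple, Union

RewardFn = Callable[[object], float]


def combine_rewards(signal: object, reward_fns: Sequence[RewardFn],
                    weights: Optional[Sequence[float]] = None) -> Union[float, Tuple[float, ...]]:
    """Evaluate each reward function on one signal; weight them if weights are given."""
    values = tuple(fn(signal) for fn in reward_fns)
    if weights is None:
        return values  # vector reward, one entry per reward function
    if len(weights) != len(values):
        raise ValueError("need one weight per reward function")
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical reward functions reading attributes of a stand-in signal.
def neg_queue(ts) -> float:
    return -float(ts.total_queued)


def avg_speed(ts) -> float:
    return ts.average_speed


signal = SimpleNamespace(total_queued=4, average_speed=0.5)
print(combine_rewards(signal, [neg_queue, avg_speed]))              # (-4.0, 0.5)
print(combine_rewards(signal, [neg_queue, avg_speed], [0.5, 2.0]))  # -1.0
```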

save_csv(out_csv_name, episode)

Save metrics of the simulation to a .csv file.

Parameters:

out_csv_name (str) – Path to the output .csv file. E.g.: “results/my_results”
episode (int) – Episode number to be appended to the output file name.
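A hedged, standard-library sketch of the same idea: one row per recorded step, with the episode number appended to the file name. The helper name save_metrics_csv, the exact naming scheme f"{out_csv_name}_ep{episode}.csv" and the metric column names are assumptions for illustration, not the library's implementation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def save_metrics_csv(metrics: List[Dict[str, float]], out_csv_name: str, episode: int) -> Path:
    """Hypothetical helper: one CSV row per recorded step, episode number in the file name."""
    if not metrics:
        raise ValueError("no metrics to save")
    path = Path(f"{out_csv_name}_ep{episode}.csv")  # assumed naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create "results/"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(metrics[0]))
        writer.writeheader()
        writer.writerows(metrics)
    return path


# Example rows; the metric names here are illustrative placeholders.
rows = [
    {"step": 5, "system_total_waiting_time": 12.0},
    {"step": 10, "system_total_waiting_time": 20.5},
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_metrics_csv(rows, f"{tmp}/results/my_results", episode=1)
    print(saved.name)  # my_results_ep1.csv
```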

property sim_step: float: Return the current simulation time in SUMO, in seconds.

step(action: dict | int)

Apply the action(s) and then step the simulation for delta_time seconds.

Parameters:

action (Union[dict, int]) – action(s) to be applied to the environment. If single_agent is True, action is an int; otherwise, it expects a dict with keys corresponding to traffic signal ids.
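The two action formats accepted by step() can be produced by one helper. A minimal sketch: the helper sample_actions, the traffic signal ids "A0" and "B1", and the per-signal action counts are hypothetical placeholders (in practice the valid range would come from action_space or action_spaces(ts_id)), and the random choice merely stands in for a learned policy.

```python
import random
from typing import Dict, Optional, Union


def sample_actions(num_actions: Dict[str, int], single_agent: bool,
                   rng: Optional[random.Random] = None) -> Union[int, Dict[str, int]]:
    """Build a step() argument: an int in single-agent mode, else {ts_id: action}."""
    if rng is None:
        rng = random.Random()
    # One random action index per signal, drawn from range(n).
    actions = {ts_id: rng.randrange(n) for ts_id, n in num_actions.items()}
    if single_agent:
        if len(actions) != 1:
            raise ValueError("single-agent mode expects exactly one traffic signal")
        return next(iter(actions.values()))
    return actions


rng = random.Random(0)
multi = sample_actions({"A0": 4, "B1": 2}, single_agent=False, rng=rng)
single = sample_actions({"A0": 4}, single_agent=True, rng=rng)
# Pass `multi` or `single` to env.step(), depending on how the environment was built.
```

In multi-agent mode, the observations, rewards, dones and infos returned by step() are dicts, as described for single_agent in the constructor parameters.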