Reward
The default reward function is the change in cumulative vehicle delay.
That is, the reward is how much the total delay (the sum of the waiting times of all approaching vehicles) changed relative to the previous time-step.
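The description above can be written as a difference between consecutive time-steps. The notation here is introduced for illustration only: $D_t$ stands for the total delay at time-step $t$. The sign convention, in which a drop in delay yields a positive reward, is an assumption based on treating delay as a cost to be minimized.

```latex
\[
  r_t = D_{t-1} - D_t,
  \qquad
  D_t = \sum_{v \in V_t} w_{v,t}
\]
% V_t     : set of vehicles approaching the intersection at step t (assumed notation)
% w_{v,t} : waiting time of vehicle v at step t (assumed notation)
```

Under this convention, a step in which the total delay falls from 40 s to 25 s would give a reward of 15, and a step in which it rises by the same amount would give -15.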
You can choose a different reward function (see the ones implemented in TrafficSignal) with the reward_fn parameter
of the SumoEnvironment constructor.
It is also possible to implement your own reward function:
from sumo_rl import SumoEnvironment

def my_reward_fn(traffic_signal):
    return traffic_signal.get_average_speed()

env = SumoEnvironment(..., reward_fn=my_reward_fn)
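A custom reward can also combine several measurements. The following is a minimal sketch that can be run without SUMO installed: `StubTrafficSignal` is a hypothetical stand-in for TrafficSignal, and `get_total_queued()` is assumed here to be an available accessor for the number of halted vehicles (only `get_average_speed()` appears in the example above). The weight of 0.1 is an arbitrary illustrative value.

```python
class StubTrafficSignal:
    """Hypothetical stand-in for TrafficSignal so the sketch runs without SUMO."""

    def __init__(self, average_speed, queued):
        self._average_speed = average_speed
        self._queued = queued

    def get_average_speed(self):
        # Same accessor name as used in the example above.
        return self._average_speed

    def get_total_queued(self):
        # Assumed accessor: number of halted vehicles at the intersection.
        return self._queued


def speed_minus_queue_reward(traffic_signal, queue_weight=0.1):
    """Reward higher average speed and penalize queued vehicles.

    queue_weight is an illustrative constant, not a library default.
    """
    speed = traffic_signal.get_average_speed()
    queued = traffic_signal.get_total_queued()
    return speed - queue_weight * queued


# Compare a free-flowing intersection with a congested one.
free_flow = StubTrafficSignal(average_speed=0.9, queued=0)
congested = StubTrafficSignal(average_speed=0.2, queued=12)

reward_free = speed_minus_queue_reward(free_flow)
reward_congested = speed_minus_queue_reward(congested)
```

Because the function still takes the traffic signal as its only required argument, it could be passed as reward_fn=speed_minus_queue_reward in the same way as my_reward_fn, provided the real TrafficSignal exposes both accessors.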