Overview

MORL-Baselines implements several multi-objective reinforcement learning (MORL) algorithms. The following table lists the algorithms that are currently available.

| Name | Single/Multi-policy | ESR/SER | Observation space | Action space | Paper |
|---|---|---|---|---|---|
| GPI-LS + GPI-PD | Multi | SER | Continuous | Discrete / Continuous | Paper and Supplementary Materials |
| MORL/D | Multi | / | / | / | Paper |
| Envelope Q-Learning | Multi | SER | Continuous | Discrete | Paper |
| CAPQL | Multi | SER | Continuous | Continuous | Paper |
| PGMORL [1] | Multi | SER | Continuous | Continuous | Paper / Supplementary Materials |
| Pareto Conditioned Networks (PCN) | Multi | SER/ESR [2] | Continuous | Discrete / Continuous | Paper |
| Pareto Q-Learning | Multi | SER | Discrete | Discrete | Paper |
| MO Q learning | Single | SER | Discrete | Discrete | Paper |
| MPMOQLearning (outer loop MOQL) | Multi | SER | Discrete | Discrete | Paper |
| Optimistic Linear Support (OLS) | Multi | SER | / | / | Section 3.3 of the thesis |
| Expected Utility Policy Gradient (EUPG) | Single | ESR | Discrete | Discrete | Paper |

Warning: some of the algorithms have limited features.

1: Currently, PGMORL is limited to environments with 2 objectives.

2: PCN assumes environments with deterministic transitions.
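
The ESR/SER column names the optimality criterion an algorithm targets. SER (Scalarised Expected Returns) applies the utility function to the expected vector return, while ESR (Expected Scalarised Returns) takes the expectation of the utility of each episode's vector return. The two agree for linear utilities but can differ for nonlinear ones. The following is a minimal sketch in plain Python that does not use MORL-Baselines at all; the episode returns, the `utility` and `linear_utility` functions, and the helper names `ser` and `esr` are hypothetical and chosen only to illustrate the distinction.

```python
# Hypothetical two-objective returns from two equally likely episodes
# of one fixed policy (illustrative numbers, not from any benchmark).
episode_returns = [(4.0, 0.0), (0.0, 4.0)]


def utility(returns):
    """Hypothetical nonlinear utility: product of the two objectives."""
    r1, r2 = returns
    return r1 * r2


def linear_utility(returns):
    """Hypothetical linear utility with equal weights on both objectives."""
    r1, r2 = returns
    return 0.5 * r1 + 0.5 * r2


def ser(samples, u):
    """Scalarised Expected Returns: average the vectors, then apply u."""
    n = len(samples)
    dims = len(samples[0])
    mean = tuple(sum(s[i] for s in samples) / n for i in range(dims))
    return u(mean)


def esr(samples, u):
    """Expected Scalarised Returns: apply u per episode, then average."""
    return sum(u(s) for s in samples) / len(samples)


# Nonlinear utility: the two criteria disagree.
ser_value = ser(episode_returns, utility)  # u((2.0, 2.0))
esr_value = esr(episode_returns, utility)  # mean of u((4, 0)) and u((0, 4))

# Linear utility: the two criteria coincide.
ser_linear = ser(episode_returns, linear_utility)
esr_linear = esr(episode_returns, linear_utility)

print(ser_value, esr_value, ser_linear, esr_linear)
```

With the product utility, the averaged return looks balanced under SER even though no single episode achieves both objectives, which is why the choice of criterion matters when selecting an algorithm from the table.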