A plot of the average reward achieved by an agent using the duality-based approach (blue) compared to an agent using standard actor-critic (orange). In addition to being more mathematically principled, our approach also yields better practical results. |