On Learning Intrinsic Rewards for Policy Gradient Methods
