Agnostic Architecture for Heterogeneous Multi-Environment Reinforcement Learning
Changhee Joo
NeurIPS 2023 Workshop on Foundation Models for Decision Making
Training a Reinforcement Learning (RL) agent from scratch in every new environment can be inefficient. The computational and temporal costs can be reduced substantially if a single agent can learn across diverse environments and then transfer its knowledge effectively. However, learning across multiple environments is challenging because different RL problems have different state and action spaces. Padding the spaces to a common size, or naive parameter sharing with environment-specific layers, can enable multi-environment training, but these approaches scale poorly when new environments are added. In this work, we present a flexible, environment-agnostic architecture that learns across multiple environments simultaneously without padding or environment-specific layers, while also enabling transfer learning to new environments. We also propose training algorithms for this architecture that support both online and offline RL. Our experiments demonstrate that a single agent can be trained across heterogeneous environments, and that parameter sharing with environment-specific layers is not effective for transfer learning.
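To make the idea of an architecture that handles varying state and action spaces without padding or environment-specific layers concrete, the following is a minimal illustrative sketch, not the authors' actual architecture: each observation dimension is treated as a token (its scalar value plus a fixed sinusoidal encoding of its index), a shared Transformer encoder processes the variable-length token sequence, and discrete-action logits are obtained by scoring sinusoidal encodings of action indices against the pooled state embedding. All module and parameter names here are hypothetical.

```python
import math
import torch
import torch.nn as nn


def sinusoidal_encoding(indices: torch.Tensor, dim: int) -> torch.Tensor:
    """Fixed sinusoidal encoding of integer indices -> (len(indices), dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = indices.float().unsqueeze(-1) * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class AgnosticPolicy(nn.Module):
    """Hypothetical environment-agnostic policy: no padding, no per-env layers."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.value_proj = nn.Linear(1, d_model)  # embeds each scalar obs dimension
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.state_head = nn.Linear(d_model, d_model)
        self.d_model = d_model

    def forward(self, obs: torch.Tensor, num_actions: int) -> torch.Tensor:
        # obs: (batch, obs_dim), where obs_dim may differ across environments.
        b, obs_dim = obs.shape
        pos = sinusoidal_encoding(torch.arange(obs_dim), self.d_model)
        tokens = self.value_proj(obs.unsqueeze(-1)) + pos        # (b, obs_dim, d)
        state = self.encoder(tokens).mean(dim=1)                 # (b, d)
        state = self.state_head(state)
        # Score each candidate discrete action via its index encoding.
        act_queries = sinusoidal_encoding(torch.arange(num_actions), self.d_model)
        return state @ act_queries.T                             # (b, num_actions)


# Usage: the same parameters handle environments with different spaces.
policy = AgnosticPolicy()
logits_a = policy(torch.randn(8, 4), num_actions=2)   # e.g. a CartPole-like env
logits_b = policy(torch.randn(8, 8), num_actions=4)   # e.g. a LunarLander-like env
```

Because the observation and action spaces are encoded through their indices rather than through fixed-size input and output layers, adding a new environment in this sketch requires no new parameters, which is the property the proposed architecture is designed to provide.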