Abstract: Deep Reinforcement Learning (DRL) methods are inefficient in the initial strategy exploration process due to the huge state space and action space in large-scale complex scenarios. This is ...