Transfer Learning of Control Policies

What is the objective?  
To develop a transfer learning approach that can transfer decision-making information from a surrogate simulation environment to a target environment. This also includes the development of similarity metrics by which a user can determine whether transfer between two environments is feasible or useful. 

What problem are we trying to solve?  
Current simulation environments are too slow for the adoption of state-of-the-art machine learning approaches to decision-making, such as those proposed in reinforcement learning and planning. Recent noteworthy examples include Monte Carlo Tree Search and actor-critic architectures, which have yielded superhuman performance in games of precision and perception. These approaches require tens of millions of simulation runs, or tens of thousands of years’ worth of simulation data. As a point of reference, Air Force simulation environments such as Advanced Framework for Simulation, Integration and Modeling (AFSIM) and Air Warfare Simulation, Integration and Modeling (AWSIM), or other complex operational environments, execute an individual simulation on the order of minutes and hours, respectively. These runtimes preclude the adoption of such data-hungry methods. However, the tremendous success of transfer learning in image classification and, more recently, natural language processing gives us hope that learned information can be transferred from a surrogate simulation environment to our existing slow simulators (e.g. AFSIM, AWSIM). Such transfer remains largely an open problem in reinforcement learning and planning. Although the motivation given here is military, prospective performers may choose non-military surrogate and target environments in developing their transfer learning approach and transfer similarity metrics.
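To make the runtime gap concrete, the following back-of-the-envelope sketch converts a run count and a per-run time into serial wall-clock years. The inputs (10 million runs, at 1 minute and at 1 hour per run) are illustrative assumptions chosen from the low end of the ranges quoted above, not measured AFSIM or AWSIM figures, and the calculation assumes the runs execute one after another with no parallelism.

```python
# Back-of-the-envelope estimate of serial simulation time.
# All numeric inputs below are illustrative assumptions, not measurements.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def serial_wall_clock_years(num_runs: int, seconds_per_run: float) -> float:
    """Years needed to execute num_runs simulations one after another."""
    if num_runs < 0 or seconds_per_run < 0:
        raise ValueError("num_runs and seconds_per_run must be non-negative")
    return num_runs * seconds_per_run / SECONDS_PER_YEAR


if __name__ == "__main__":
    runs = 10_000_000  # assumed: low end of "tens of millions"
    for label, seconds in (("1 minute per run", 60), ("1 hour per run", 3600)):
        years = serial_wall_clock_years(runs, seconds)
        print(f"{label}: about {years:.0f} years of serial execution")
```

Under these assumptions, a one-minute simulator needs roughly 19 years of serial execution and a one-hour simulator roughly 1,140 years, which is why a much faster surrogate is attractive even before parallelism is considered.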

What outcome do we hope to achieve?  
To adopt and adapt state-of-the-art machine learning capabilities in a surrogate simulation environment such that the outcomes of these capabilities (e.g. decision-making information) can be transferred to a target simulation environment.

What resources could the lab provide?  
The lab can provide relevant target environments such as AFSIM and AWSIM, help identify surrogate environments, and supply subject-matter expertise and evaluation criteria for assessing the effectiveness of the proposed transfer approach.

What would success look like?  
After learned decision-making information is transferred from the surrogate to the target, performance in the two simulation environments should reflect the similarity metrics developed. That is, low similarity should lead to low or unpredictable performance when transferring from the surrogate to the target. Conversely, high similarity should yield high performance on the target. The performance metric depends on the environments chosen and can include maximizing a reward signal, yielding explainable actions, or establishing robust control policies, among others.
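One way to check whether transfer performance "reflects" a similarity metric is to compute a rank correlation between the two across several surrogate/target pairs. The sketch below is only an illustration of that idea, not a prescribed evaluation method: it implements Spearman's rank correlation with the standard library, and the function names and the numbers in the usage section are hypothetical placeholders rather than results.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(similarity: Sequence[float], performance: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(similarity), average_ranks(performance))


if __name__ == "__main__":
    # Placeholder values for four hypothetical surrogate/target pairs.
    similarity_scores = [0.2, 0.5, 0.7, 0.9]
    target_performance = [10.0, 30.0, 25.0, 60.0]
    print(f"Spearman correlation: {spearman(similarity_scores, target_performance):.2f}")
```

A value near 1 would suggest the metric orders environment pairs in the same way that transfer performance does. Note that this single number does not capture the "unpredictable performance" case described above, which would also call for looking at the spread of performance at low similarity.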

What types of solutions would we expect?  
From the simulation environment perspective, we expect one of two possible solution types. First, the industry/academic partners develop their own fast surrogate environment and develop or choose some other target environment. Second, the industry/academic partners leverage an existing simulation environment (e.g. StarCraft 2, RAND’s AFGYM [1], etc.) as the surrogate. From the perspective of transfer learning solutions, potential candidates include, but are not limited to: the development of state/trajectory/policy similarity metrics; environmental (e.g. state, state-action) embeddings to facilitate transfer learning; and alternative environment or objective representations (e.g. representing the objective as an automaton, or applying state abstraction techniques). Many other solutions are plausible.
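As one hedged illustration of what a policy similarity metric might look like, the sketch below scores two tabular stochastic policies by one minus their mean total variation distance over a set of states. This is just one of many possible choices and is not taken from the call itself. It assumes both policies are defined over the same discrete states and actions, which real surrogate/target pairs may not satisfy without an additional state or action mapping; all names are hypothetical.

```python
from typing import Hashable, Iterable, Mapping

# A policy maps each state to a probability distribution over actions.
ActionDist = Mapping[Hashable, float]
Policy = Mapping[Hashable, ActionDist]


def total_variation(p: ActionDist, q: ActionDist) -> float:
    """Total variation distance between two discrete distributions (0 to 1)."""
    actions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in actions)


def policy_similarity(policy_a: Policy, policy_b: Policy,
                      states: Iterable[Hashable]) -> float:
    """1 minus the mean total variation distance over the given states.

    Returns 1.0 when the policies agree on every state and 0.0 when their
    action distributions never overlap. Assumes both policies define a
    distribution for every state in `states`.
    """
    state_list = list(states)
    if not state_list:
        raise ValueError("at least one state is required")
    distances = [total_variation(policy_a[s], policy_b[s]) for s in state_list]
    return 1.0 - sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy two-state example with hypothetical action names.
    surrogate = {"s0": {"left": 1.0}, "s1": {"left": 0.5, "right": 0.5}}
    target = {"s0": {"left": 1.0}, "s1": {"left": 1.0}}
    print(policy_similarity(surrogate, target, ["s0", "s1"]))  # prints 0.75
```

Averaging uniformly over states is a design simplification; weighting states by how often they are visited under one of the policies would be a natural alternative.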

What's in it for industry?  
There will be a cooperative research agreement under which both the industry/academic team and the government will benefit from the solution. The technologies developed are broadly applicable to both civilian and military uses. This effort can be leveraged as part of a larger future submission to an Air Force program or DoD effort.

The Request for Partnership Submission Period Has Now Ended.

[1] Zhang, Li A., et al. Air Dominance Through Machine Learning: A Preliminary Exploration of Artificial Intelligence-Assisted Mission Planning. Santa Monica, CA: RAND Corporation, 2020.