ONR Project: L2RAVE

Feedback-Driven Learn to Reason in Adversarial Environments for Autonomic Cyber Systems

About L2RAVE Project

The growing complexity of cyber systems has made them difficult for human operators to defend, particularly in the presence of intelligent and resourceful adversaries who target multiple system components simultaneously, employ previously unobserved attack vectors, and use stealth and deception to evade detection. There is a need to develop autonomic cyber systems that integrate statistical learning and rules-based formal reasoning to provide adaptive, robust situational awareness and a resilient system response. In this collaborative research effort, we propose to develop a feedback-driven Learn to Reason (L2R) framework that integrates statistical learning with formal reasoning in adversarial environments. Our insight is that realizing the potential benefits of L2R requires continuous interaction between the statistical and formal components, both at intermediate time steps and at multiple layers of abstraction.



L2RAVE Project Sponsor: Office of Naval Research (ONR)

Meet the Team

Principal Investigator

Prof. Radha Poovendran

NSL Founding Director
Professor, Electrical and Computer Engineering
Adjunct Professor, Aeronautics & Astronautics

Co-Principal Investigator

Prof. Linda Bushnell

Research Professor
Department of Electrical and Computer Engineering, University of Washington

Co-Principal Investigator

Prof. Hannaneh Hajishirzi

Assistant Professor
Allen School of Computer Science and Engineering, University of Washington

Professor

Prof. Andrew Clark

Assistant Professor
Electrical and Computer Engineering, Worcester Polytechnic Institute

Post-Doc

Bhaskar Ramasubramanian

Postdoctoral Scholar
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Post-Doc

Arezoo Rajabi

Postdoctoral Scholar
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Baicen Xiao

Ph.D. student (post-quals)
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Shruti Misra

Ph.D. student (pre-quals)
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Luyao Niu

Ph.D. student
Electrical and Computer Engineering,
Worcester Polytechnic Institute

Student

Zhouchi Li

Ph.D. student
Electrical and Computer Engineering, Worcester Polytechnic Institute

Software and Code

Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

Date: Tue January 18, 2022

Summary

This paper considers multi-agent reinforcement learning (MARL) tasks in which agents receive a shared global reward only at the end of an episode. The delayed nature of this reward affects the agents' ability to assess the quality of their actions at intermediate time-steps. This paper focuses on learning a temporal redistribution of the episodic reward to obtain a dense reward signal. Solving such MARL problems requires addressing two challenges: identifying (1) the relative importance of states along the length of an episode (along time), and (2) the relative importance of individual agents' states at any single time-step (among agents). In this paper, we introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address these two challenges. AREL uses attention mechanisms to characterize the influence of actions on state transitions along trajectories (temporal attention), and how each agent is affected by other agents at each time-step (agent attention). The redistributed rewards predicted by AREL are dense and can be integrated with any given MARL algorithm.
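
As a rough illustration of the approach, the minimal PyTorch sketch below applies attention first across agents at each time-step and then across time, and uses a linear head to predict a dense per-time-step reward. The module structure, tensor shapes, and the regression objective that fits the predicted rewards to the observed episodic return are illustrative assumptions, not the authors' implementation; see the GitHub link below for the reference code.

import torch
import torch.nn as nn


class AgentTemporalRedistributor(nn.Module):
    """Predicts dense per-time-step rewards from per-agent trajectory features."""

    def __init__(self, feat_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)
        # Agent attention: how each agent is affected by the other agents
        # at a single time-step (attends across the agent dimension).
        self.agent_attn = nn.MultiheadAttention(embed_dim, n_heads)
        # Temporal attention: influence along the trajectory (attends across time).
        self.temporal_attn = nn.MultiheadAttention(embed_dim, n_heads)
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, traj):
        # traj: (batch, T, n_agents, feat_dim) per-agent observation/action features
        B, T, N, _ = traj.shape
        x = self.embed(traj)                                   # (B, T, N, E)

        # Attention across agents at every time-step.
        xa = x.reshape(B * T, N, -1).transpose(0, 1)           # (N, B*T, E)
        xa, _ = self.agent_attn(xa, xa, xa)
        x = xa.transpose(0, 1).reshape(B, T, N, -1).mean(2)    # pool agents -> (B, T, E)

        # Attention across time along the episode.
        xt = x.transpose(0, 1)                                 # (T, B, E)
        xt, _ = self.temporal_attn(xt, xt, xt)
        dense = self.reward_head(xt.transpose(0, 1))           # (B, T, 1)
        return dense.squeeze(-1)                               # dense rewards (B, T)


def redistribution_loss(model, traj, episodic_return):
    # Assumed training signal: the dense predictions should sum to the
    # observed episodic return, so the redistributor is fit by regression.
    dense = model(traj)
    return ((dense.sum(dim=1) - episodic_return) ** 2).mean()

Because the predicted rewards are dense, they can stand in for the sparse episodic reward when training with any MARL algorithm, as noted in the summary above.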


Github Link: https://github.com/baicenxiao/arel

Code description

This code provides a Python implementation of the AREL algorithm from the paper:
Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

• Python version - 3.7
• Python libraries required - OpenAI gym (0.10.9), PyTorch (1.6.0)

For additional information, contact: Baicen Xiao, email: bcxiao@uw.edu



Shaping Advice in Deep Multi-Agent Reinforcement Learning

Date: Wed March 28, 2021

Summary

Multi-agent reinforcement learning involves multiple agents interacting with each other and with a shared environment to complete tasks. When rewards provided by the environment are sparse, agents may not receive immediate feedback on the quality of the actions they take, which hinders the learning of policies. In this paper, we propose a method called Shaping Advice in deep Multi-agent reinforcement learning (SAM) that augments the reward signal from the environment with an additional reward termed shaping advice. The shaping advice is given by the difference of potential functions at consecutive time-steps, where each potential function is a function of the observations and actions of the agents. The shaping advice needs to be specified only once, at the start of training, and can easily be provided by non-experts. We show through theoretical analysis and experimental validation that the shaping advice provided by SAM does not distract agents from completing the tasks specified by the environment reward. Theoretically, we prove that convergence of policy gradients and value functions when using SAM implies convergence of these quantities in the absence of SAM. Experimentally, we evaluate SAM on three sparse-reward tasks in the multi-agent Particle World environment. We observe that agents using SAM learn policies that complete tasks faster and obtain higher rewards than (i) using sparse rewards alone and (ii) a state-of-the-art reward redistribution method, Iterative Relative Credit Refinement (IRCR).
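
As a concrete illustration of the shaping-advice construction described above, the sketch below augments a sparse environment reward with the difference of a potential function evaluated at consecutive time-steps. The specific potential (negative distance to a landmark) and the discount factor are hypothetical placeholders for a Particle-World-style task, not the potentials used in the paper; see the GitHub link below for the reference implementation.

import numpy as np


def potential(obs, act):
    # Hypothetical potential: negative distance between an agent and a
    # target landmark, assuming obs[:2] holds the agent-to-landmark offset.
    # (Potentials may depend on observations and actions; this toy example
    # ignores the action.)
    return -np.linalg.norm(obs[:2])


def shaping_advice(obs, act, next_obs, next_act, gamma=0.95):
    # Shaping advice = difference of potentials at consecutive time-steps.
    return gamma * potential(next_obs, next_act) - potential(obs, act)


# Assumed usage inside a training loop: augment the sparse environment
# reward before storing the transition or computing policy gradients.
# augmented_reward = env_reward + shaping_advice(obs, act, next_obs, next_act)

The advice only needs to be specified once, before training begins, and the paper's analysis shows that adding it does not distract agents from the tasks specified by the environment reward.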


Github Link: https://github.com/baicenxiao/SAM

Code description

This code provides a Python implementation of the SAM algorithm from the paper:
Shaping Advice in Deep Multi-Agent Reinforcement Learning
It is configured to run with multi-agent reinforcement learning environments from the Multi-Agent Particle Environments (MPE) suite. Unlike the original MPE environments, where rewards are dense, our work uses a sparse reward structure. Note: this code base has been restructured relative to the original paper, and some results may differ.

• Python version - 3.5.4
• Python libraries required - OpenAI gym (0.10.5), TensorFlow (1.9.0), NumPy (1.15.2)