ONR Project: L2RAVE

Feedback-Driven Learn to Reason in Adversarial Environments for Autonomic Cyber Systems

About L2RAVE Project

The growing complexity of cyber systems has made them difficult for human operators to defend, particularly in the presence of intelligent and resourceful adversaries who target multiple system components simultaneously, employ previously unobserved attack vectors, and use stealth and deception to evade detection. There is a need to develop autonomic cyber systems that integrate statistical learning and rules-based formal reasoning to provide adaptive, robust situational awareness and a resilient system response. In this collaborative research effort, we propose to develop a feedback-driven Learn to Reason (L2R) framework that integrates statistical learning with formal reasoning in adversarial environments. Our insight is that realizing the potential benefits of L2R requires continuous interaction between the statistical and formal components, both at intermediate time steps and at multiple layers of abstraction.



L2RAVE Project Sponsor: Office of Naval Research (ONR)

Meet the Team

Principal Investigator

Prof. Radha Poovendran

NSL Founding Director
Professor, Electrical and Computer Engineering
Adjunct Professor, Aeronautics & Astronautics

Co-Principal Investigator

Prof. Linda Bushnell

Research Professor
Department of Electrical and Computer Engineering, University of Washington

Co-Principal Investigator

Prof. Hannaneh Hajishirzi

Assistant Professor
Allen School of Computer Science and Engineering, University of Washington

Professor

Prof. Andrew Clark

Assistant Professor
Electrical and Computer Engineering, Worcester Polytechnic Institute

Post-Doc

Bhaskar Ramasubramanian

Postdoctoral Scholar
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Post-Doc

Arezoo Rajabi

Postdoctoral Scholar
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Baicen Xiao

Ph.D. student (post-quals)
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Shruti Misra

Ph.D. student (pre-quals)
Network Security Lab,
Electrical and Computer Engineering,
University of Washington

Student

Luyao Niu

Ph.D. student
Electrical and Computer Engineering,
Worcester Polytechnic Institute

Student

Zhouchi Li

Ph.D. student
Electrical and Computer Engineering, Worcester Polytechnic Institute

Software and Code

Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

Date: Tue January 18, 2022

Summary

This paper considers multi-agent reinforcement learning (MARL) tasks in which agents receive a shared global reward only at the end of an episode. The delayed nature of this reward affects the agents' ability to assess the quality of their actions at intermediate time-steps. This paper focuses on learning a temporal redistribution of the episodic reward to obtain a dense reward signal. Solving such MARL problems requires addressing two challenges: identifying (1) the relative importance of states along the length of an episode (along time), and (2) the relative importance of individual agents' states at any single time-step (among agents). In this paper, we introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address these two challenges. AREL uses attention mechanisms to characterize the influence of actions on state transitions along trajectories (temporal attention), and how each agent is affected by other agents at each time-step (agent attention). The redistributed rewards predicted by AREL are dense and can be integrated with any given MARL algorithm.
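
As a rough illustration of the approach, the minimal PyTorch sketch below applies attention first across agents at each time-step and then across time, and uses a linear head to predict a dense per-time-step reward. The module structure, tensor shapes, and the regression objective that fits the predicted rewards to the observed episodic return are illustrative assumptions, not the authors' implementation; see the GitHub link below for the reference code.

import torch
import torch.nn as nn


class AgentTemporalRedistributor(nn.Module):
    """Predicts dense per-time-step rewards from per-agent trajectory features."""

    def __init__(self, feat_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)
        # Agent attention: how each agent is affected by the other agents
        # at a single time-step (attends across the agent dimension).
        self.agent_attn = nn.MultiheadAttention(embed_dim, n_heads)
        # Temporal attention: influence along the trajectory (attends across time).
        self.temporal_attn = nn.MultiheadAttention(embed_dim, n_heads)
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, traj):
        # traj: (batch, T, n_agents, feat_dim) per-agent observation/action features
        B, T, N, _ = traj.shape
        x = self.embed(traj)                                   # (B, T, N, E)

        # Attention across agents at every time-step.
        xa = x.reshape(B * T, N, -1).transpose(0, 1)           # (N, B*T, E)
        xa, _ = self.agent_attn(xa, xa, xa)
        x = xa.transpose(0, 1).reshape(B, T, N, -1).mean(2)    # pool agents -> (B, T, E)

        # Attention across time along the episode.
        xt = x.transpose(0, 1)                                 # (T, B, E)
        xt, _ = self.temporal_attn(xt, xt, xt)
        dense = self.reward_head(xt.transpose(0, 1))           # (B, T, 1)
        return dense.squeeze(-1)                               # dense rewards (B, T)


def redistribution_loss(model, traj, episodic_return):
    # Assumed training signal: the dense predictions should sum to the
    # observed episodic return, so the redistributor is fit by regression.
    dense = model(traj)
    return ((dense.sum(dim=1) - episodic_return) ** 2).mean()

Because the predicted rewards are dense, they can stand in for the sparse episodic reward when training with any MARL algorithm, as noted in the summary above.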


Github Link: https://github.com/baicenxiao/arel

Code description

This code provides a Python implementation of the AREL algorithm from the paper:
Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

• Python version - 3.7
• Python libraries required - OpenAI gym (0.10.9), PyTorch (1.6.0)

For additional information, contact: Baicen Xiao, email: bcxiao@uw.edu



Shaping Advice in Deep Multi-Agent Reinforcement Learning

Date: Wed March 28, 2021

Summary

Multi-agent reinforcement learning involves multiple agents interacting with each other and with a shared environment to complete tasks. When rewards provided by the environment are sparse, agents may not receive immediate feedback on the quality of the actions they take, which hinders the learning of policies. In this paper, we propose a method called Shaping Advice in deep Multi-agent reinforcement learning (SAM) that augments the reward signal from the environment with an additional reward termed shaping advice. The shaping advice is given by the difference of potential functions at consecutive time-steps, where each potential function is a function of the observations and actions of the agents. The shaping advice needs to be specified only once, at the start of training, and can easily be provided by non-experts. We show through theoretical analysis and experimental validation that the shaping advice provided by SAM does not distract agents from completing the tasks specified by the environment reward. Theoretically, we prove that convergence of policy gradients and value functions when using SAM implies convergence of these quantities in the absence of SAM. Experimentally, we evaluate SAM on three sparse-reward tasks in the multi-agent Particle World environment. We observe that agents using SAM learn policies that complete tasks faster and obtain higher rewards than (i) using sparse rewards alone and (ii) a state-of-the-art reward redistribution method, Iterative Relative Credit Refinement (IRCR).
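
As a concrete illustration of the shaping-advice construction described above, the sketch below augments a sparse environment reward with the difference of a potential function evaluated at consecutive time-steps. The specific potential (negative distance to a landmark) and the discount factor are hypothetical placeholders for a Particle-World-style task, not the potentials used in the paper; see the GitHub link below for the reference implementation.

import numpy as np


def potential(obs, act):
    # Hypothetical potential: negative distance between an agent and a
    # target landmark, assuming obs[:2] holds the agent-to-landmark offset.
    # (Potentials may depend on observations and actions; this toy example
    # ignores the action.)
    return -np.linalg.norm(obs[:2])


def shaping_advice(obs, act, next_obs, next_act, gamma=0.95):
    # Shaping advice = difference of potentials at consecutive time-steps.
    return gamma * potential(next_obs, next_act) - potential(obs, act)


# Assumed usage inside a training loop: augment the sparse environment
# reward before storing the transition or computing policy gradients.
# augmented_reward = env_reward + shaping_advice(obs, act, next_obs, next_act)

The advice only needs to be specified once, before training begins, and the paper's analysis shows that adding it does not distract agents from the tasks specified by the environment reward.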


Github Link: https://github.com/baicenxiao/SAM

Code description

This code provides a Python implementation of the SAM algorithm from the paper:
Shaping Advice in Deep Multi-Agent Reinforcement Learning
It is configured to run with multi-agent reinforcement learning environments from the Multi-Agent Particle Environments (MPE) suite. Unlike the original MPE environments, where rewards are dense, our work uses a sparse reward structure. Note: this code base has been restructured relative to the original paper, and some results may differ.

• Python version - 3.5.4
• Python libraries required - OpenAI gym (0.10.5), TensorFlow (1.9.0), NumPy (1.15.2)