Research & Publications Office to host seminar on ‘Decentralised Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms’ on 6th December

The talk will be delivered by Prof. N. Hemachandra, IIT Bombay
30 November, 2022, Bengaluru: The Office of Research and Publications (R&P) at IIM Bangalore will host a seminar titled ‘Decentralised Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms’ at 4:00 pm on 6th December 2022 (Tuesday), in Classroom K21. The talk will be delivered by Prof. N. Hemachandra, IIT Bombay, from the Production & Operations Management area.
Abstract: Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning (RL) paradigm. This work proposes three fully decentralised multi-agent natural actor-critic (MAN) algorithms. The objective is to collectively find a joint policy that maximizes the average long-term return of the agents. In the absence of a central controller, and to preserve privacy, agents share limited information with their neighbours over a time-varying communication network. Using linear function approximations, the research proves that all three MAN algorithms converge to a globally asymptotically stable set of the ODE corresponding to the actor update. It also shows that the Kullback–Leibler divergence between the policies of successive iterates is proportional to the objective function’s gradient. It is observed that the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension. Using this, the researchers show theoretically that the optimal value of the deterministic variant of the MAN algorithm dominates that of the standard gradient-based multi-agent actor-critic (MAAC) algorithm at each iterate; this may be the first such result in multi-agent reinforcement learning (MARL). To illustrate the usefulness of the proposed algorithms, the researchers implement them on a bi-lane traffic network to reduce average network congestion. Two of the MAN algorithms achieve almost a 25% reduction in average congestion; the third performs on par with the MAAC algorithm.
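The natural-gradient actor update at the heart of such algorithms preconditions the ordinary policy gradient by the inverse Fisher information matrix. The following is a minimal single-agent sketch of that idea on a hypothetical three-action softmax policy (a toy problem not taken from the paper); the seminar’s MAN algorithms are decentralised, multi-agent, and use sampled actor-critic estimates rather than the exact gradient used here.

```python
import numpy as np

# Toy problem (illustrative assumption): maximise the expected reward
# J(theta) = sum_a pi_theta(a) * r(a) of a softmax policy over three
# actions, via the natural-gradient update
#     theta <- theta + alpha * F^{-1} grad J,
# where F = diag(pi) - pi pi^T is the Fisher information matrix of the
# softmax policy.
r = np.array([1.0, 2.0, 3.0])    # per-action rewards
theta = np.zeros(3)              # policy parameters (softmax logits)
alpha = 0.1                      # step size

def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

for _ in range(200):
    pi = policy(theta)
    J = pi @ r
    grad = pi * (r - J)                    # exact policy gradient
    F = np.diag(pi) - np.outer(pi, pi)     # Fisher information matrix
    # F is singular along the all-ones direction, so regularise slightly
    step = np.linalg.solve(F + 1e-3 * np.eye(3), grad)
    theta += alpha * step

# The policy's probability mass concentrates on the best action (index 2).
```

For the softmax policy, the natural gradient is (up to a constant shift of the logits) just the reward vector itself, which is why the update moves directly toward the best action regardless of how small the ordinary gradient has become.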
About the speaker: Dr. N. Hemachandra is a Professor of Industrial Engineering and Operations Research at IIT Bombay. His academic interests include Operations Research and Machine Learning, including RL and bandit problems and their applications to problems arising in supply chains, communication networks, logistics and financial engineering. He has an M.Tech. in Control Systems Engineering (EE) from IIT Kharagpur and a PhD from the Dept. of Computer Science and Automation, IISc.
Webpage link: https://www.ieor.iitb.ac.in/~nh