Research & Publications Office to host research seminar on ‘Sequential learning in a stochastic multi-armed bandit framework’ on 15th June
The talk will be delivered by Prof. Sandeep Juneja, Tata Institute of Fundamental Research
5 June, 2023, Bengaluru: The Office of Research and Publications (R&P) at IIM Bangalore will host a research seminar on ‘Sequential learning in a stochastic multi-armed bandit framework’, to be led by Prof. Sandeep Juneja, Tata Institute of Fundamental Research (Decision Sciences area), at 2:30 pm on 15th June 2023 (Thursday), at Classroom K21.
Abstract: The classic stochastic multi-armed bandit framework involves finitely many unknown probability distributions that can be sequentially sampled to generate independent rewards. The talk will cover two foundational problems. The first is sampling to minimize the expected regret or, equivalently, to maximize the expected total reward. The second is best arm identification, that is, identifying the arm with the largest mean, or any other performance measure, using as few samples as possible while providing explicit probabilistic guarantees of correct selection. These problems form the bedrock of algorithms used in web design and advertising, recommendation systems, clinical trials and many other exciting applications.
The talk will review some of the popular algorithms used for these problems, emphasizing the intuition underlying their elegant ideas. Technically speaking, these problems have been well studied under the restrictive assumption that arm distributions belong to a single-parameter exponential family, which includes distributions such as Bernoulli and Gaussian with known variance. In these settings, lower bounds on the number of samples needed are developed using ideas from hypothesis testing, and algorithms are proposed that match the lower bounds. Optimal algorithms are then proposed that match the lower bounds, even to the constant, for general probability distributions under minimal restrictions. Further enhancements for settings where offline data needs to be combined with online data will be discussed. A new algorithm will also be proposed in the best arm identification setting that, along with minimizing sample complexity, is computationally efficient. A mean field analysis that explains the algorithm's behavior will also be presented.
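As a rough illustration of the first problem described in the abstract (sampling so as to keep regret low), the sketch below implements UCB1, a standard textbook index algorithm. It is not claimed to be one of the algorithms covered in the talk. The function name `ucb1`, the Bernoulli arm means, the horizon and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative sketch, hypothetical helper).

    Returns the number of pulls per arm and the pseudo-regret, i.e. the
    expected reward lost relative to always pulling the best arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    sums = [0.0] * k      # total reward observed from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pull the arm with the largest upper confidence index:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is sampled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(arm_means)
    pseudo_regret = sum(c * (best - m) for c, m in zip(counts, arm_means))
    return counts, pseudo_regret


if __name__ == "__main__":
    # Assumed example: three arms with success probabilities 0.2, 0.5, 0.8.
    pulls, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", round(regret, 1))
```

In this sketch the exploration bonus forces every arm to be sampled occasionally, while most pulls concentrate on the arm with the highest empirical mean, so the pseudo-regret grows much more slowly than the horizon.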
Speaker Profile: Dr. Sandeep Juneja is a Senior Professor at the School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR), Mumbai. He received his B.Tech in Mechanical Engineering from IIT Delhi and his MS in Statistics and PhD in Operations Research from Stanford University. Thereafter, he worked for American Credit Indemnity in Baltimore, USA, followed by a one-year stint at Andersen Consulting in India. He was a faculty member in the Operations Research Group in the Department of Mechanical Engineering at IIT Delhi, after which he has been with TIFR. He has held visiting positions at Columbia University, Stanford University and the Indian School of Business, among others. He also headed the quantitative activity in Bank of America’s Indian operations and was a member of Bank of America’s Executive Quantitative Council. He is currently on the editorial board of Stochastic Systems. He received the IBM Faculty Partnership Award for 2001-02, and he co-authored papers that received best paper awards at the 4th and 6th International ICST Conference on Performance Evaluation Methodologies and Tools. He was an adjunct at the Centre for Advanced Financial Research and Learning (CAFRAL), a research wing of the Reserve Bank of India. Currently, he is visiting the Machine Learning and Optimization Group at Google Research in Bangalore.
His research interests lie in applied probability, including sequential learning, mathematical finance, Monte Carlo methods, and game theoretic analysis of queues. Lately, he has been involved in modelling the spread of Covid-19 in Mumbai, and in the mathematics of certain epidemiological models.
Webpage Link: https://www.tcs.tifr.res.in/~sandeepj/