We consider a high-dimensional mean estimation problem over a binary hidden Markov model, which illuminates the interplay between memory in data, sample size, dimension, and signal strength in statistical inference. In this model, an estimator observes n samples of a d-dimensional parameter vector θ ∈ R^d, each multiplied by a random sign S_i (1 ≤ i ≤ n) and corrupted by isotropic standard Gaussian noise. The sequence of signs (S_1, …, S_n) ∈ {−1, +1}^n is drawn from a stationary homogeneous Markov chain with flip probability δ ∈ [0, 1/2]. As δ varies, this model smoothly interpolates between two well-studied models: the Gaussian Location Model, for which δ = 0, and the Gaussian Mixture Model, for which δ = 1/2. Assuming that the estimator knows δ, we establish a nearly minimax optimal (up to logarithmic factors) estimation error rate as a function of ‖θ‖, δ, d, and n. We then provide an upper bound for the case of estimating δ, assuming a (possibly inaccurate) knowledge of θ. The bound is proved to be tight when ‖θ‖ is an accurately known constant. These results are then combined into an algorithm that estimates θ with δ unknown a priori, and theoretical guarantees on its error are stated.
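The observation model described above can be illustrated with a minimal simulation sketch. The function name `sample_bhmm` and the uniform initialization of the sign chain are illustrative assumptions; the sketch only mirrors the model definition (a {−1, +1} Markov sign chain with flip probability δ, scaled by θ and corrupted by standard Gaussian noise), not the estimators from the talk:

```python
import numpy as np

def sample_bhmm(theta, n, delta, rng):
    """Draw n observations y_i = s_i * theta + z_i, where the signs s_i
    follow a stationary {-1, +1} Markov chain with flip probability delta
    and z_i is isotropic standard Gaussian noise (illustrative sketch)."""
    d = theta.shape[0]
    s = np.empty(n)
    s[0] = rng.choice([-1.0, 1.0])        # stationary start: uniform sign
    flips = rng.random(n - 1) < delta     # flip with probability delta
    for i in range(1, n):
        s[i] = -s[i - 1] if flips[i - 1] else s[i - 1]
    z = rng.standard_normal((n, d))       # isotropic standard Gaussian noise
    return s[:, None] * theta + z, s

rng = np.random.default_rng(0)
theta = np.ones(5)
y, s = sample_bhmm(theta, n=1000, delta=0.1, rng=rng)
```

With delta=0 the sign never flips and the samples reduce to the Gaussian Location Model (up to the fixed initial sign); with delta=1/2 the signs are i.i.d. uniform and the samples follow the Gaussian Mixture Model.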
Based on joint work (arXiv:2206.02455) with Nir Weinberger.
Bio: Yihan Zhang received the B.Eng. degree in computer science and technology from Northeastern University, Shenyang, China, in June 2016, and the Ph.D. degree from the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, in August 2020. He was a Post-Doctoral Researcher at the Henry and Marilyn Taub Faculty of Computer Science, Technion–Israel Institute of Technology, from October 2020 to October 2021. He has been a Post-Doctoral Researcher at the Institute of Science and Technology Austria since October 2021. His research interests include coding theory, information theory, and statistics theory (in no particular order).