Communication, Control and Signal Processing Seminar 

Abstracts of talks

Fall 2017

09/07 Karim Banawan (UMD), The Capacity of Private Information Retrieval from MDS-Coded Databases

ABSTRACT: The problem of preserving the privacy of the contents downloaded from open-access databases has been a major area of research within the computer science community. Many practical applications are related to the private retrieval problem, such as: ensuring privacy of investors upon downloading records in a stock market, and ensuring the privacy of activists against authoritarian regimes while browsing restricted contents from the internet. In the classical private information retrieval (PIR) problem, a user wishes to retrieve a certain message (or file) out of $M$ distinct messages from $N$ non-colluding and replicated databases without leaking any information about the identity of the desired message. A straightforward solution for the PIR problem is for the user to download the entire database. This solution, however, is highly inefficient in terms of the PIR rate, which is the ratio between the desired message size and the total downloaded symbols. Recently, the PIR problem is re-formulated by information theorists. This formulation gives rise to the PIR capacity notion, which is the supremum of PIR rates over all achievable retrieval schemes.

In this talk, we briefly introduce the PIR problem from the information-theoretic perspective, then we briefly review some of the results concerned about PIR from replicated databases. The main focus of the talk is the capacity of PIR from MDS-coded databases. In this work, we consider the problem of PIR over a distributed storage system. The storage system consists of $N$ non-colluding databases, each storing an MDS-coded version of $M$ messages. In the PIR problem, the user wishes to retrieve one of the available messages without revealing the message identity to any individual database. We derive the information-theoretic capacity of this problem to be $C=\left(1+\frac{K}{N}+\frac{K^2}{N^2}+\cdots+\frac{K^{M-1}}{N^{M-1}}\right)^{-1}=(1+R_c+R_c^2+\cdots+R_c^{M-1})^{-1}=\frac{1-R_c}{1-R_c^M}$, where $R_c$ is the rate of the $(N,K)$ code used. The capacity is a function of the code rate and the number of messages only regardless of the explicit structure of the storage code. The result implies a fundamental tradeoff between the optimal retrieval cost and the storage cost. The result generalizes the achievability and converse results for the classical PIR with replicating databases to the case of coded databases.

09/12 Itzhak Tamo (Tel Aviv University), Optimal repair of polynomials: Achieving the cut-set bound

ABSTRACT: It is a well known fact that any k evaluation points of a polynomial P(x) of degree less than k suffice to calculate (recover) the evaluation at any other point. Motivated by applications to error correcting codes for storage systems, we consider the following question. Is it possible to recover P(\alpha) for some \alpha, by observing some partial information of d\geq kevaluation points P(\alpha_i) ? More precisely, the observed partial information is f_i(P(\alpha_i)) for some function f_i which is not necessarily injective, and the goal is to perform the recovery, while minimizing the total amount of observed partial information.

In this talk, building on the work of Guruswami and Wootters, we will present a repair scheme that uses the smallest possible partial information for a given d​. The scheme relies on a specially picked set of evaluation points which generates a field extension (over a finite field) with a specific lattice of subfields. If time permits I will discuss other aspects of this question and repair schemes that rely on different methods than polynomial evaluations.

Joint work with Alexander Barg and Min Ye.

09/21 Ahmed Arafa (UMD), Age-Minimal Transmission in Energy Harvesting Two-hop Networks

ABSTRACT: We consider an energy harvesting two-hop network where a source is communicating to a destination through a relay. During a given communication session time, the source collects measurement updates from a physical phenomenon and sends them to the relay, which then forwards them to the destination. The objective is to send these updates to the destination as timely as possible; namely, such that the total age of information is minimized by the end of the communication session, subject to energy causality constraints at the source and the relay, and data causality constraints at the relay. Both the source and the relay use fixed, yet possibly different, transmission rates. Hence, each update packet incurs fixed non-zero transmission delays.

In this talk, I will first solve the single-hop version of this problem, and then show that the two-hop problem is solved by treating the source and relay nodes as one combined node, with some parameter transformations, and solving a single-hop problem between that combined node and the destination.

Joint work with Prof. Sennur Ulukus.

10/05 Yuval Cassuto (Technion), Coding with Histograms for Flash-Memory Errors

ABSTRACT: The most dominant errors in q-ary Flash memories are asymmetric low-magnitude errors. Known coding schemes for these errors are effective when the number of such errors is either relatively small or very large. For error rates between these two extremes, our proposed code enforces a simple constraint: no two consecutive alphabet symbols (levels) are used in the same code block. We call this code NCC (non-consecutive constraint). The rate/correction tradeoff of the NCC is controlled simply by varying the block length.

It turns out that the NCC:
1) Improves over known codes of the same rate for moderate error rates.
2) Has efficient encoding and decoding using vector *histograms*.
3) Opens interesting combinatorial analysis of worst-case error correction (we show two such bounds).
4) Has interesting links to results in finite block length information theory.

Based on joint work with Evyatar Hemo.

10/19 Yury Polyanskiy (MIT), Sample Complexity of Population Recovery

ABSTRACT: In this talk we will first consider a general question of estimating linear functional of the distribution based on the noisy samples from it. We discover that the (two-point) LeCam lower bound is in fact achievable by optimizing bias-variance tradeoff of an empirical-mean type of estimator.

Next, we apply this general framework to the specific problem of population recovery. Namely, consider a random poll of sample size $n$ conducted on a population of individuals, where each pollee is asked to answer $d$ binary questions. We consider one of the two polling impediments:
\item in lossy population recovery, a pollee may skip each question with probability $\epsilon$;
\item in noisy population recovery, a pollee may lie on each question with probability $\epsilon$.
Given $n$ lossy or noisy samples, the goal is to estimate the probabilities of all $2^d$ binary vectors simultaneously within accuracy $\delta$ with high probability. We settle the sample complexity for both models and discover curious phase-transition in the lossy model at $\epsilon=1/2$. Our results hinge on complex-analytic methods, such as Hadamard's three-lines theorem.

Joint work with Y. Wu (Yale) and A.T. Suresh (Google).

11/09 Zitan Chen (UMD), MDS Code Constructions with Small Sub-packetization and Near-optimal Repair Bandwidth

ABSTRACT: The constructions of exact-repairable MDS codes with optimal repair-bandwidth require working with large sub-packetization levels, which restricts their employment in practice. This talk presents constructions for MDS codes that simultaneously provide both small repair bandwidth and small sub-packetization level.

This talk is based on ``MDS Code Constructions with Small Sub-packetization and Near-optimal Repair Bandwidth,'' by Ankit Singh Rawat, Itzhak Tamo, Venkatesan Guruswami, and Klim Efremenko.

12/07 Sina Miran (UMD), Real-Time Tracking of Selective Auditory Attention from M/EEG: A Bayesian Filtering Approach

ABSTRACT: Humans are able to identify and track a target speaker amid a cacophony of acoustic interference, which is often referred to as the cocktail party phenomenon. Results from several decades of studying this phenomenon have culminated in recent years in various promising attempts to decode the attentional state of a listener in a competing-speaker environment from non-invasive neuroimaging recordings such as magnetoencephalography (MEG) and electroencephalography (EEG). To this end, most existing approaches compute correlation-based measures by either regressing the features of each speech stream to the M/EEG channels (the decoding approach) or vice versa (the encoding approach). These procedures operate in an offline fashion, i.e., require the entire duration of the experiment and multiple trials to provide robust results. Therefore, they cannot be used in emerging applications such as smart hearing aid devices, where a single trial must be used in real-time to decode the attentional state.

In this talk, we will develop an algorithmic pipeline for real-time decoding of the attentional state. Our proposed framework consists of three main modules: 1) Real-time and robust estimation of encoding or decoding coefficients, achieved by sparse adaptive filtering, 2) Extracting reliable markers of the attentional state, and thereby generalizing the widely-used correlation-based measures thereof, and 3) Devising a near real-time state-space estimator that translates the noisy and variable attention markers to robust and reliable estimates of the attentional state with minimal delay. Our proposed algorithms integrate various techniques including forgetting factor-based adaptive filtering, L1-regularization, forward-backward splitting algorithms, fixed-lag smoothing, and expectation maximization. We validate the performance of our proposed framework using comprehensive simulations as well as application to experimentally acquired M/EEG data. Our results reveal that the proposed real-time algorithms perform nearly as accurate as the existing state-of-the-art offline techniques, while providing a significant degree of adaptivity, statistical robustness, and computational savings.