Periodic Reporting for period 1 - TRANCIDS (Transmission over Channels with Insertions and Deletions)
Reporting period: 2022-10-01 to 2025-03-31
Insertions and deletions (indels) arise for many reasons. For instance, in a high-rate digital communication system, the frequencies of the transmitter and receiver oscillators cannot be perfectly synchronized in practice, so the receiver may miss a transmitted symbol (a deletion) or read the same symbol twice (an insertion). Magnetic and optical recording channels, in particular, form an essential category of communication systems that suffer from indels. This is true both for today's recording systems and for future storage technologies, including DNA storage.
Insertions and deletions arise both in traditional communication systems and in emerging DNA storage systems, in the latter case because of the nature of DNA and the underlying biological mechanisms. Standard communication systems, as well as both in vitro and in vivo DNA storage technologies, therefore require a thorough understanding of insertions and deletions and effective ways of addressing them.
The aims of this ERC Advanced Grant are two-fold: 1) to calculate new and useful performance limits (by developing novel information-theoretic approaches) for the insertion/deletion channel models encountered in different applications, and 2) to develop practical signaling solutions that approach these limits. The specific objectives of the proposed research fall under four general headings:
1) to determine the fundamental limits of insertion/deletion channels;
2) to formulate and explore wireless channels with insertions and deletions;
3) to develop practical signaling schemes for channels with insertions and deletions;
4) to address transmission over insertion/deletion channels with additional impairments (including permutations) as motivated by DNA storage applications.
1) We developed capacity bounds for the Poisson-repeat channel, a model for synchronization errors introduced previously in the literature. Our related paper (presented at IEEE ISIT 2023) showed that providing side information on the number of deletions in a given block converts the Poisson-repeat channel into independent decoupled channels. It then becomes possible to use numerical approaches to obtain the capacity of the Poisson-repeat channel with side information, and hence an upper bound on the capacity of the original channel.
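As a minimal illustration of the channel model in item 1, the sketch below simulates a Poisson-repeat channel in Python: each input bit is replaced by a Poisson-distributed number of copies of itself, so a count of zero is a deletion. The helper `deletions_per_block`, the block length, and the parameter name `lam` are assumptions made for illustration; the exact form of the side information used in the paper is not specified here.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def poisson_repeat_channel(bits, lam, rng):
    """Replace each input bit by N copies of itself, N ~ Poisson(lam), i.i.d.

    Returns the output sequence and the per-bit repetition counts.
    """
    counts = [poisson_sample(lam, rng) for _ in bits]
    output = [b for b, n in zip(bits, counts) for _ in range(n)]
    return output, counts

def deletions_per_block(counts, block_len):
    """Hypothetical side information: number of deleted bits (N == 0) per block."""
    return [sum(1 for n in counts[i:i + block_len] if n == 0)
            for i in range(0, len(counts), block_len)]

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20)]
y, counts = poisson_repeat_channel(x, lam=1.0, rng=rng)
side_info = deletions_per_block(counts, block_len=5)
```

Here `side_info` holds one deletion count per block of five input bits. The receiver in the real channel sees only `y`; handing it extra information of this kind can only help, which is why the capacity with side information upper-bounds the capacity without it.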
2) We developed a basic concatenated coding principle for DNA storage channels. Noisy shuffling channels are used as models for DNA storage channels. In such systems, the order in which the sequences are stored is lost because of the nature of the DNA pool, which makes decoding the stored information highly challenging. Explicit indexing is the standard technique for alleviating this problem. In a paper presented at IEEE GLOBECOM 2023 (whose extended version has been accepted for publication in IEEE Trans. on Communications), we proposed an effective alternative to explicit indexing. We showed that it is possible to use polar codes (or other block codes) with different cosets, where the choice of coset carries the order information. The proposed solution is highly practical and offers improved error probability performance compared to existing solutions.
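The idea of letting the coset carry the order information can be sketched with a toy example. The code below is not the polar-code construction of the paper: it swaps in a (7,4) Hamming code and a noise-free shuffling channel, both chosen purely for illustration, and all function names are hypothetical. Each stored word is a codeword shifted into one of the eight cosets of the code, and the syndrome of a retrieved word reveals which coset, and hence which position, it came from.

```python
import random

DATA_POS = (3, 5, 6, 7)   # 1-based positions of the data bits in a (7,4) Hamming codeword
PARITY_POS = (1, 2, 4)    # 1-based positions of the parity bits

def syndrome(word):
    """XOR of the 1-based positions holding a 1; zero iff word is a codeword."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s

def encode(message):
    """Map 4 data bits to a 7-bit Hamming codeword (syndrome zero)."""
    word = [0] * 7
    for pos, bit in zip(DATA_POS, message):
        word[pos - 1] = bit
    s = syndrome(word)
    for p in PARITY_POS:
        if s & p:
            word[p - 1] = 1
    return word

def to_coset(codeword, index):
    """Shift a codeword into coset `index` (0..7) by flipping bit `index` (0 = no flip)."""
    word = list(codeword)
    if index:
        word[index - 1] ^= 1
    return word

def from_coset(word):
    """Recover (index, message) from a noise-free stored word."""
    index = syndrome(word)
    codeword = to_coset(word, index)   # flipping the same bit undoes the shift
    return index, [codeword[pos - 1] for pos in DATA_POS]

rng = random.Random(1)
messages = [[rng.randint(0, 1) for _ in range(4)] for _ in range(8)]
pool = [to_coset(encode(m), i) for i, m in enumerate(messages)]
rng.shuffle(pool)                      # the storage medium loses the order

recovered = [None] * len(messages)
for word in pool:
    index, message = from_coset(word)
    recovered[index] = message
```

In this toy, the whole redundancy of the Hamming code is spent on identifying the coset, so nothing is left to correct errors. A practical scheme needs codes whose cosets can be told apart while still correcting channel noise.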
3) We proved the information stability of insertion/deletion channels with memory, a model encountered in current DNA storage technologies. A classical result establishes the existence of Shannon capacity for memoryless synchronization error channels (under some mild conditions on the channel statistics). In our paper (presented at IEEE ISIT 2024, with the journal version in preparation), we proved that information stability also holds when the insertion/deletion processes possess memory. Such errors with memory occur in practical DNA storage applications; hence, addressing the relevant channel models is critical. Our result establishes that Shannon capacity exists for these models as well. The methodology developed could be useful for deriving information stability results for other communication schemes over non-traditional media. We also provided specific examples of deletion channels with Markov memory and numerically evaluated capacity bounds. The results allow us to quantify the capacity difference between memoryless deletion channels and deletion channels with memory that have the same deletion probability.
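To make "deletion channels with Markov memory" concrete, the sketch below simulates one possible such channel: a first-order, two-state (keep/delete) Markov chain decides the fate of each bit. This particular parameterization (`p_kd` for keep-to-delete, `p_dd` for delete-to-delete) is an assumption for illustration and is not necessarily the model analyzed in the paper. The stationary deletion probability follows from the balance equation pi_d = pi_d * p_dd + (1 - pi_d) * p_kd, which is what lets a channel with memory be matched to a memoryless one with the same deletion probability.

```python
import random

def stationary_deletion_prob(p_kd, p_dd):
    """Long-run fraction of deleted bits; requires 1 - p_dd + p_kd > 0."""
    return p_kd / (1.0 - p_dd + p_kd)

def markov_deletion_channel(bits, p_kd, p_dd, rng):
    """Delete bits according to a two-state Markov chain started in steady state.

    p_kd: probability of deleting a bit given the previous bit was kept.
    p_dd: probability of deleting a bit given the previous bit was deleted.
    """
    deleted = rng.random() < stationary_deletion_prob(p_kd, p_dd)
    output = []
    for i, b in enumerate(bits):
        if i > 0:
            deleted = rng.random() < (p_dd if deleted else p_kd)
        if not deleted:
            output.append(b)
    return output

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(1000)]
bursty = markov_deletion_channel(x, p_kd=0.05, p_dd=0.55, rng=rng)      # deletions cluster
memoryless = markov_deletion_channel(x, p_kd=0.10, p_dd=0.10, rng=rng)  # i.i.d. deletions
```

Both parameter choices give a stationary deletion probability of 0.1, but the first produces deletions in bursts while the second reduces to the memoryless deletion channel.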
4) We determined the asymptotic capacity of insertion channels (for different insertion models) for small insertion probabilities. We obtained the dominant terms of the channel capacity; hence, we can characterize the capacity of insertion channels (which are valuable models for various applications, including DNA storage) in the asymptotic regime. We expect the general methodology to be crucial in investigating other channel models with synchronization errors, including channels exhibiting both deletions and insertions and those exhibiting insertions/deletions/substitutions. The extended version of the paper has been completed and submitted for publication in IEEE Transactions on Information Theory.
5) We explored and developed deep-learning-based decoding algorithms for concatenated codes that use marker codes as inner codes over insertion and deletion channels. We considered concatenated coding approaches in which the outer code is a low-density parity-check code or a convolutional code and the inner code is a marker code. Such coding solutions represent the state of the art for communicating over channels with synchronization errors. However, their decoding algorithms are highly complex, which hinders their practical use. Motivated by this, and on the premise that the proposed solutions could also be promising for more complicated models, we developed deep-learning architectures for decoding concatenated coding solutions over insertion/deletion channels.
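For readers unfamiliar with marker codes, the sketch below shows only the inner-code step in its simplest form: a known marker pattern is inserted after every `period` data bits so that a decoder can use the known pattern to track synchronization. The marker `[0, 1]`, the period, and the function names are assumptions for illustration; the outer code and the deep-learning decoder developed in the project are not sketched here.

```python
def insert_markers(data, period, marker):
    """Append the known marker after every block of up to `period` data bits."""
    coded = []
    for i in range(0, len(data), period):
        coded.extend(data[i:i + period])
        coded.extend(marker)
    return coded

def strip_markers(coded, period, marker):
    """Remove the markers from a perfectly synchronized (error-free) sequence."""
    step = period + len(marker)
    data = []
    for i in range(0, len(coded), step):
        chunk = coded[i:i + step]
        data.extend(chunk[:len(chunk) - len(marker)])
    return data

def marker_code_rate(period, marker):
    """Fraction of transmitted bits that carry data when blocks are full."""
    return period / (period + len(marker))

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
coded = insert_markers(data, period=4, marker=[0, 1])
restored = strip_markers(coded, period=4, marker=[0, 1])
```

Over a real insertion/deletion channel the marker positions drift, so `strip_markers` alone does not suffice; estimating that drift from the received sequence is the task a marker-code decoder has to solve.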