## Final Report Summary - MDITWACM (Mismatched Decoding in Information Theory with Applications to Channel Modelling)

Over the past decades, and through the sister concepts of entropy and channel capacity, which respectively characterize the best solutions to the dual problems of source and channel coding, information and communication theory have guided engineers and computer scientists in the design and implementation of ever more efficient communication systems. Notwithstanding this success, the recent trend towards communication over short temporal durations has undermined two underlying assumptions of the prevailing theoretical analysis. First, perfect knowledge of the stochastic nature of the channel may be difficult to acquire, which makes it impossible to use optimum decoding rules and necessary to adopt a mismatched decoding perspective. Secondly, efficient mathematical tools valid for very long transmission durations, such as the asymptotic analysis implicit in large-deviation theory, are difficult to justify, and ought to be replaced by new methods valid for arbitrary transmission lengths. In the past four years, my research has focused on developing new methods and tools that address these challenges in mismatched decoding at arbitrary, i.e. finite, transmission lengths. More specifically, and in contrast to the focus on achievable rates prevalent in most existing work on mismatched decoding, my work has centred on the analysis of the random-coding error probability as a benchmark of the best possible performance of channel codes in this context. The main outcomes of this fundamental research in the mathematical foundations of communication and information theory are the following:

1. We have generalized Gallager’s cost-constrained ensemble to include multiple auxiliary costs. This ensemble serves as an alternative to constant-composition codes to improve on the performance of i.i.d. coding, and is applicable to general channels with infinite or continuous alphabets. Furthermore, we have found an ensemble-tight error exponent for the cost-constrained ensemble. Interestingly, the (best possible) exponent for the constant-composition ensemble can be recovered using at most two auxiliary costs for the single-user channel; for multi-user channels, the number of cost functions scales with the number of users. In the case of bit-interleaved coded modulation (BICM), we have found significant gains in exponent and rates for some configurations, especially at low and medium signal-to-noise ratios (SNR), sometimes exceeding the so-called BICM capacity. Along these lines, we have proved that natural binary labelling is both first- and second-order optimal at low SNR, the first labelling with such a guaranteed optimality.
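As a concrete illustration of the ensemble (not the project's derivations), codewords can be sampled by i.i.d. drawing followed by rejection of those whose empirical costs stray from their means. The sketch below uses a made-up quaternary alphabet, input distribution and two illustrative auxiliary cost functions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: input distribution Q on a quaternary alphabet and
# two auxiliary cost functions a1, a2 (all values are illustrative).
Q = np.array([0.4, 0.3, 0.2, 0.1])
a1 = np.array([0.0, 1.0, 2.0, 3.0])
a2 = np.array([1.0, 0.5, 0.5, 1.0])
costs = [a1, a2]
n, delta = 100, 0.05  # block length and tolerance around the mean cost

def draw_codeword():
    """Draw symbols i.i.d. from Q, accepting the codeword only if every
    empirical cost mean lies within delta of its expectation E_Q[a]."""
    while True:
        x = rng.choice(len(Q), size=n, p=Q)
        if all(abs(a[x].mean() - a @ Q) <= delta for a in costs):
            return x

codebook = np.array([draw_codeword() for _ in range(8)])
```

With two costs, this rejection step mimics the constraint that recovers the constant-composition exponent in the single-user case.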

2. We have proposed a simple new method, fixed-energy renormalization, to reduce the error probability of both coded modulation and BICM over additive white Gaussian noise channels. The novelty consists of renormalizing the codewords of a given generic code to have a fixed energy before sending them over the channel. The decoder remains unmodified and therefore uses a mismatched maximum-metric decoding rule that ignores the energy renormalization at the encoder. We have characterized the performance theoretically by means of its random-coding error exponent, and found that this renormalization technique approaches the random-coding error exponent of the constant-composition ensemble for a wide range of rate and SNR pairs. Moreover, we have run simulations confirming that the improvement can be realized in practice with a sensible choice of channel code, modulation scheme and decoder.
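A minimal sketch of the idea, with a hypothetical random Gaussian codebook and made-up parameters: codewords are rescaled to a fixed energy before transmission, while the decoder applies an unchanged minimum-distance rule that ignores the rescaling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: M codewords of length n, power P, given snr.
M, n, P, snr = 16, 64, 1.0, 4.0
C = rng.standard_normal((M, n))
# Fixed-energy renormalization: every codeword gets energy exactly n*P.
C *= np.sqrt(n * P) / np.linalg.norm(C, axis=1, keepdims=True)

# Transmit codeword 0 over an AWGN channel.
sigma = np.sqrt(P / snr)
y = C[0] + sigma * rng.standard_normal(n)

# Mismatched decoder: plain minimum-distance decoding, oblivious to the
# energy renormalization performed at the encoder.
m_hat = int(np.argmin(np.sum((y - C) ** 2, axis=1)))
```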

3. In parallel, we have found that the generalized version of Gallager’s cost-constrained ensemble is also relevant to joint source-channel coding and to the analysis of expurgated exponents. In joint source-channel coding, we have derived a new upper bound on the average error probability based on a construction in which source messages are assigned to disjoint subsets (classes), and codewords are independently generated according to a distribution that depends on the class of the source message. For discrete memoryless systems, two optimally chosen classes and product distributions are necessary and sufficient to attain the sphere-packing exponent in those cases where it is tight. As for the expurgated exponent, we have obtained a simple non-asymptotic bound that attains an exponent by Csiszár and Körner for discrete memoryless channels, while remaining valid for continuous alphabets. We have also merged the two lines of work and studied expurgated error exponents for almost-lossless joint source-channel coding, adapting Gallager's expurgation techniques to obtain non-asymptotic bounds that recover two exponents originally given by Csiszár.

4. Inspired by the previous research, we have proposed and studied an almost-lossless multi-class source-channel coding scheme in which source messages are assigned to different classes and encoded with a channel code that depends on the class index. We studied the scheme by means of random-coding error exponents and validated its performance by simulating a low-complexity implementation based on existing source and channel codes. Although each class code can be seen as the concatenation of a source code and a channel code, the overall error exponent improves on that of separate source-channel coding and, as the number of classes increases, approaches that of joint source-channel coding; for a fixed number of classes, the construction trades a gap to the best joint performance for reduced complexity. The scheme thus offers a practical route between separate and joint source-channel coding.
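A toy illustration of the class-assignment idea (the source and the threshold below are made up, and the actual encoding is omitted): for a biased memoryless binary source, grouping messages by weight puts most of the probability into a small class, which can then be protected by a better, lower-rate channel code.

```python
from math import comb

# Binary source with bias q, message length k; hypothetical weight
# threshold t splitting messages into a "likely" and an "unlikely" class.
q, k, t = 0.1, 12, 3

# Probability mass and message count of the likely (low-weight) class.
p_low = sum(comb(k, w) * q**w * (1 - q)**(k - w) for w in range(t + 1))
m_low = sum(comb(k, w) for w in range(t + 1))
# The likely class captures ~97% of the probability with only
# 299 of the 2**12 = 4096 messages.
```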

5. We have derived two alternative exact characterizations of the minimal error probability of Bayesian M-ary hypothesis testing, with applications to the derivation of converse results in mismatched decoding. The first expression corresponds to the error probability of an induced binary hypothesis test and implies the tightness of the meta-converse bound by Polyanskiy, Poor and Verdú; the second expression implies the tightness of a generalized Verdú-Han lower bound. The expressions help to characterize the minimal error probability of several problems in information theory and to identify the steps where existing converse bounds are loose.
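For reference, the baseline quantity being characterized is the MAP error probability; on made-up finite distributions it is simply one minus the sum over observations of the largest joint probability:

```python
import numpy as np

# Illustrative Bayesian 3-ary test on a ternary observation alphabet
# (prior and likelihoods are made up for the example).
prior = np.array([0.5, 0.3, 0.2])
W = np.array([[0.6, 0.3, 0.1],   # rows: P(y | m) for each hypothesis m
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

joint = prior[:, None] * W       # P(m, y)
# MAP decides argmax_m P(m, y); its error probability is the minimum:
eps_min = 1.0 - joint.max(axis=0).sum()
```

The two exact characterizations in the text re-express this same quantity through binary tests, which is what makes the meta-converse and Verdú-Han bounds tight.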

6. When applied to the mismatched discrete memoryless multiple-access channel, an extension of the bounds in 1) yields error exponents that are tight with respect to the ensemble average, and positive within the interior of Lapidoth's achievable rate region. In the setting of single-user mismatched decoding, we have applied similar analysis techniques to two types of superposition coding. The standard version is shown to yield an achievable rate at least as high as Lapidoth's expurgated MAC rate after optimization of the parameters. We have also studied a multi-letter successive decoding rule and derived achievable rate regions and error exponents for both the standard MAC (independent codebooks) and the cognitive MAC (one user knows both messages). The rate regions are compared with those of a maximum-metric decoder, and numerical examples are given for which successive decoding yields a strictly higher sum rate for a given pair of input distributions. In related work, we showed that a refined version of superposition coding achieves rates at least as high as the standard version for any set of random-coding parameters, and that the gap between the two can be significant for a fixed input distribution. In addition, we have studied the achievable second-order coding rates for the cost-constrained ensemble and found that the performance of constant-composition coding can be matched with a fixed number of auxiliary costs. These techniques provide a simple method for obtaining previously known second-order achievability results for continuous and input-constrained channels.
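As background for the second-order results, the textbook normal approximation for a matched BSC (a standard formula, not the project's cost-constrained analysis) can be sketched as follows:

```python
from math import log, sqrt
from statistics import NormalDist

# Normal (second-order) approximation for a BSC(p) with uniform inputs:
# R(n, eps) ~ C - sqrt(V/n) * Q^{-1}(eps); all quantities in nats.
p, n, eps = 0.11, 500, 1e-3                          # illustrative values
C = log(2) + p * log(p) + (1 - p) * log(1 - p)       # capacity
V = p * (1 - p) * log((1 - p) / p) ** 2              # channel dispersion
R = C - sqrt(V / n) * NormalDist().inv_cdf(1 - eps)  # approx. best rate
```

The results above show that, with a fixed number of auxiliary costs, the cost-constrained ensemble attains the same second-order term as constant-composition coding.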

7. We have applied this superposition method to the mismatched decoding problem for binary-input discrete memoryless channels and provided an example for which an achievable rate based on superposition coding exceeds the LM rate, thus providing a counter-example to a previously reported converse result by Balakirsky in 1995. We established this claim by combining numerical evaluations with theoretical results. Significantly, Balakirsky’s theorem was the only non-trivial converse result in the theory of mismatched decoding.

8. A promising enhancement to large-deviation theory is the Laplace or saddlepoint approximation, which has shown its effectiveness in numerous applications in physics. One of the original goals of the project was to remedy the limited attention it has received from information theorists by showing how it naturally leads to significant improvements in common estimates of error probabilities at no additional computational cost. Along these lines, we have derived refined asymptotic results for i.i.d. random coding and expurgated random coding. For some of these non-asymptotic random-coding bounds, saddlepoint approximations can be computed efficiently and, as expected, they characterize the asymptotic behaviour of the corresponding bounds (in the limit of very large block lengths) at all positive rates. These approximations have turned out to be remarkably accurate even at small block lengths. From a different direction, we have derived a saddlepoint approximation for the random-coding bound on the error probability of channel coding by using complex-integration techniques. The approximation is given as a sum of two terms: one with Gallager's exponent, and a second with Arimoto's strong converse exponent (above capacity) or the sphere-packing exponent (below the critical rate).
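The exponents entering such approximations are classical and easy to compute; as a point of reference (a standard computation, not the project's saddlepoint derivation), Gallager's E0 function and the random-coding exponent for a BSC with uniform inputs are:

```python
import numpy as np

p = 0.11  # illustrative crossover probability

def E0(rho):
    """Gallager's E0 for a BSC(p) with uniform inputs, in nats."""
    a = 1.0 / (1.0 + rho)
    s = p**a + (1.0 - p)**a
    return rho * np.log(2.0) - (1.0 + rho) * np.log(s)

def Er(R):
    """Random-coding exponent: max over 0 <= rho <= 1 of E0(rho) - rho*R,
    here evaluated by a simple grid search."""
    rhos = np.linspace(0.0, 1.0, 1001)
    return max(E0(r) - r * R for r in rhos)

C = np.log(2.0) + p * np.log(p) + (1 - p) * np.log(1 - p)  # capacity, nats
```

The saddlepoint approximation refines the bare exponential bound exp(-n*Er(R)) with sub-exponential prefactors, which is what makes it accurate at small block lengths.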

A detailed list which relates these results to the relevant publications can be found at the project web page http://www.dtic.upf.edu/~amartinez/MDITwACM-CIG/index.html
