UNIT : I
INFORMATION THEORY, CODING & CRYPTOGRAPHY (MCSE 202)
PREPARED BY ARUN PRATAP SINGH, M.TECH - 2nd SEMESTER, 5/26/14

INTRODUCTION TO INFORMATION THEORY :

Information theory is a branch of science that deals with the analysis of a communications system. We will study digital communications, using a file (or a network protocol) as the channel. Claude Shannon published a landmark paper in 1948 that marked the beginning of this branch of science. We are interested in communicating information from a source to a destination. In our case, the messages will be sequences of binary digits (each binary digit is called a bit).

One detail that makes communicating difficult is noise: noise introduces uncertainty. Suppose I wish to transmit one bit of information; what are all of the possibilities?
- tx 0, rx 0 - good
- tx 0, rx 1 - error
- tx 1, rx 0 - error
- tx 1, rx 1 - good
Two of the cases above have errors; this is where probability fits into the picture. In the case of steganography, the "noise" may be due to attacks on the hiding algorithm.

INFORMATION MEASURES :

Any information source, analog or digital, produces an output that is random in nature. If it were not random, i.e., if the output were known exactly, there would be no need to transmit it. We live in an analog world and most sources are analog sources, for example, speech and temperature fluctuations. Discrete sources are man-made sources, for example, a source (say, a man) that generates a sequence of letters from a finite alphabet (typing his email).

Before we go on to develop a mathematical measure of information, let us develop an intuitive feel for it. Read the following sentences:

(A) Tomorrow, the sun will rise from the East.
(B) The phone will ring in the next one hour.
(C) It will snow in Delhi this winter.

The three sentences carry different amounts of information. In fact, the first sentence hardly carries any information. Everybody knows that the sun rises in the East, and the probability of this happening again is almost unity. Sentence (B) appears to carry more information than sentence (A). The phone may ring, or it may not; there is a finite probability that the phone will ring in the next one hour. The last sentence probably made you read it over twice. This is because it has never snowed in Delhi, and the probability of snowfall is very low.

It is interesting to note that the amount of information carried by the sentences listed above has something to do with the probability of occurrence of the events stated in the sentences, and we observe an inverse relationship. Sentence (A), which talks about an event with a probability of occurrence very close to 1, carries almost no information. Sentence (C), which has a very low probability of occurrence, appears to carry a lot of information. The other interesting thing to note is that the length of the sentence has nothing to do with the amount of information it conveys. In fact, sentence (A) is the longest but carries the minimum information. We will now develop a mathematical measure of information.
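The standard measure that captures this inverse relationship is the self-information of an event, I(x) = log2(1/P(x)) bits: nearly certain events carry almost no information, while rare events carry a lot. The short Python sketch below illustrates this for the three sentences above; the probabilities assigned to the events are illustrative assumptions, not values given in these notes.

```python
import math

def self_information(p):
    """Self-information in bits of an event with probability p: I = log2(1/p)."""
    return math.log2(1.0 / p)

# Illustrative (assumed) probabilities for the three sentences above.
events = {
    "(A) the sun rises in the East tomorrow": 0.999999,
    "(B) the phone rings in the next hour":   0.5,
    "(C) it snows in Delhi this winter":      0.0001,
}

for description, p in events.items():
    print(f"{description}: P = {p}, I = {self_information(p):.4f} bits")

# The output shows the inverse relationship: the near-certain event (A) carries
# almost 0 bits, while the very unlikely event (C) carries about 13.3 bits.
```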
REVIEW OF PROBABILITY THEORY :

We could choose one of several technical definitions for probability, but for our purposes it refers to an assessment of the likelihood of the various possible outcomes in an experiment or some other situation with a "random" outcome.

Why probability theory? Information is exchanged in a computer network in a random way, and the events that modify the behavior of links and nodes in the network are also random. We need a way to reason quantitatively about the likelihood of events in a network, and to predict the behavior of network components.

Example 1 : Measure the time between two packet arrivals on the cable of a local area network. Determine how likely it is that the interarrival time between any two packets is less than T seconds.

A probability model is a mathematical model used to quantify the likelihood of events taking place in an experiment in which outcomes are random. It consists of:
- A sample space: the set of all possible outcomes of a random experiment.
- The set of events: subsets of the sample space.
- The probability measure: defined according to a probability law for all the events of the sample space.

RANDOM VARIABLES :

In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, as a result of incomplete information or imprecise measurements). They may also conceptually represent either the results of an "objectively" random process (such as rolling a die), or the "subjective" randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself, but is instead related to philosophical arguments over the interpretation of probability.

The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates.

Example : The possible outcomes for one coin toss can be described by the sample space Omega = {heads, tails}. We can introduce a real-valued random variable Y that models a $1 payoff for a successful bet on heads as follows:

   Y = 1 if the outcome is heads, and Y = 0 if the outcome is tails.

If the coin is equally likely to land on either side, then Y has the probability mass function

   P(Y = 1) = 1/2,   P(Y = 0) = 1/2.
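A short simulation makes the payoff random variable concrete. The sketch below (a minimal example, assuming a fair coin) samples Y repeatedly and estimates its probability mass function and expected value:

```python
import random

def sample_payoff():
    """One realization of Y: $1 payoff if the (fair) coin lands heads, else $0."""
    return 1 if random.random() < 0.5 else 0

n = 100_000
samples = [sample_payoff() for _ in range(n)]

p_heads = samples.count(1) / n       # estimate of P(Y = 1)
expected_value = sum(samples) / n    # estimate of E[Y]

print(f"Estimated P(Y = 1) = {p_heads:.3f}   (exact value 0.5)")
print(f"Estimated E[Y]     = {expected_value:.3f}   (exact value 0.5)")
```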
RANDOM PROCESS :

In probability theory, a stochastic process, or often random process, is a collection of random variables; it is often used to represent the evolution of some random value, or system, over time. It is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of describing a process which can only evolve in one way (as in the case, for example, of solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve.

In the simple case of discrete time, as opposed to continuous time, a stochastic process involves a sequence of random variables and the time series associated with these random variables (for example, see the Markov chain, also known as the discrete-time Markov chain). Another basic type of stochastic process is a random field, whose domain is a region of space; in other words, a random function whose arguments are drawn from a range of continuously changing values.

One approach to stochastic processes treats them as functions of one or several deterministic arguments (inputs, in most cases regarded as time) whose values (outputs) are random variables: non-deterministic (single) quantities which have certain probability distributions. Random variables corresponding to various times (or points, in the case of random fields) may be completely different. The main requirement is that these different random quantities all have the same type, where type refers to the codomain of the function. Although the random values of a stochastic process at different times may be independent random variables, in most commonly considered situations they exhibit complicated statistical correlations.

MUTUAL INFORMATION :

In probability theory and information theory, the mutual information or (formerly) transinformation of two random variables is a measure of the variables' mutual dependence. The most common unit of measurement of mutual information is the bit.

Formally, the mutual information of two discrete random variables X and Y can be defined as:

   I(X; Y) = sum over x and y of p(x, y) log[ p(x, y) / (p(x) p(y)) ]

where p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y respectively. In the case of continuous random variables, the summation is replaced by a definite double integral:

   I(X; Y) = double integral over x and y of p(x, y) log[ p(x, y) / (p(x) p(y)) ] dx dy

where p(x, y) is now the joint probability density function of X and Y, and p(x) and p(y) are the marginal probability density functions of X and Y respectively. If log base 2 is used, the unit of mutual information is the bit.

Intuitively, mutual information measures the information that X and Y share: it measures how much knowing one of these variables reduces uncertainty about the other. For example, if X and Y are independent, then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X is a deterministic function of Y and Y is a deterministic function of X, then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa. As a result, in this case the mutual information is the same as the uncertainty contained in Y (or X) alone, namely the entropy of Y (or X). Moreover, this mutual information is the same as the entropy of X and as the entropy of Y. (A very special case of this is when X and Y are the same random variable.)

Mutual information is a measure of the inherent dependence expressed in the joint distribution of X and Y, relative to the joint distribution of X and Y under the assumption of independence. Mutual information therefore measures dependence in the following sense: I(X; Y) = 0 if and only if X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x, y) = p(x) p(y), so every logarithm in the sum is log 1 = 0 and therefore I(X; Y) = 0. Moreover, mutual information is nonnegative (i.e. I(X; Y) >= 0) and symmetric (i.e. I(X; Y) = I(Y; X)). Equivalently, the mutual information can be written in terms of entropies as I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y).
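To make the definition concrete, the following sketch computes I(X; Y) in bits directly from a joint probability table. The joint distributions used here are illustrative assumptions, not data from the notes.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint PMF given as {(x, y): p(x, y)}."""
    # Marginal distributions p(x) and p(y).
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Assumed example: two correlated binary variables versus two independent ones.
joint_dependent   = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
joint_independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(f"I(X;Y), dependent case   = {mutual_information(joint_dependent):.4f} bits")
print(f"I(X;Y), independent case = {mutual_information(joint_independent):.4f} bits")  # 0.0
```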
ENTROPY :

In information theory, entropy is a measure of the uncertainty in a random variable.[1] In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message.[2] Entropy is typically measured in bits, nats, or bans.[3] Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables.

A single toss of a fair coin has an entropy of one bit. A series of two fair coin tosses has an entropy of two bits: the number of fair coin tosses is its entropy in bits. A random selection between two outcomes in a sequence over time, whether the outcomes are equally probable or not, is often referred to as a Bernoulli process, and the entropy of such a process is given by the binary entropy function. The entropy rate for a fair coin toss is one bit per toss. However, if the coin is not fair, then the uncertainty, and hence the entropy rate, is lower. This is because, if asked to predict the next outcome, we could choose the most frequent result and be right more often than wrong. The difference between what we know, or predict, and the information that the unfair coin toss reveals to us is less than one heads-or-tails "message", or bit, per toss.
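For a discrete random variable with probabilities p1, ..., pn, the Shannon entropy is H = -sum over i of pi log2(pi) bits. The sketch below computes this, including the binary entropy of a biased coin, which shows the entropy rate dropping below one bit per toss as the coin becomes less fair; the bias values chosen are only for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum p*log2(p), in bits (terms with p = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Fair coin: exactly one bit of uncertainty per toss.
print(f"H(fair coin)             = {entropy([0.5, 0.5]):.4f} bits")

# Biased coins: the entropy (and hence the entropy rate) is lower.
for p_heads in (0.7, 0.9, 0.99):
    h = entropy([p_heads, 1.0 - p_heads])
    print(f"H(coin, P(heads)={p_heads:4}) = {h:.4f} bits")

# Two independent fair tosses: entropies add, giving two bits.
print(f"H(two fair tosses)       = {entropy([0.25] * 4):.4f} bits")
```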
SHANNON'S THEOREM :

Shannon's theorem, proved by Claude Shannon in 1948, describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption. The theory does not describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem has wide-ranging applications in both communications and data storage.

Considering all possible multi-level and multi-phase encoding techniques, the Shannon-Hartley theorem states that the channel capacity C, meaning the theoretical tightest upper bound on the information rate (excluding error-correcting codes) of clean (or arbitrarily low bit error rate) data that can be sent with a given average signal power S through an analog communication channel subject to additive white Gaussian noise of power N, is:

   C = B log2(1 + S/N)

where
- C is the channel capacity in bits per second;
- B is the bandwidth of the channel in hertz (passband bandwidth in case of a modulated signal);
- S is the average received signal power over the bandwidth (in case of a modulated signal, often denoted C, i.e. modulated carrier), measured in watts (or volts squared);
- N is the average noise or interference power over the bandwidth, measured in watts (or volts squared); and
- S/N is the signal-to-noise ratio (SNR) or the carrier-to-noise ratio (CNR) of the communication signal to the Gaussian noise interference, expressed as a linear power ratio (not as logarithmic decibels).

Example : If the SNR is 20 dB and the available bandwidth is 4 kHz, which is appropriate for telephone communications, then C = 4000 log2(1 + 100) = 4000 log2(101) = 26.63 kbit/s. Note that the linear value of 100 corresponds to an SNR of 20 dB. If it is required to transmit at 50 kbit/s and a bandwidth of 1 MHz is used, then the minimum SNR required is given by 50 = 1000 log2(1 + S/N), so S/N = 2^(C/B) - 1 = 2^0.05 - 1 = 0.035, corresponding to an SNR of -14.5 dB. This shows that it is possible to transmit using signals which are actually much weaker than the background noise level.

Shannon's law is any statement defining the theoretical maximum rate at which error-free digits can be transmitted over a bandwidth-limited channel in the presence of noise. The Shannon theorem puts a limit on the transmission data rate, not on the error probability: it is theoretically possible to transmit information at any rate Rb, where Rb <= C, with an arbitrarily small error probability by using a sufficiently complicated coding scheme. For an information rate Rb > C, it is not possible to find a code that can achieve an arbitrarily small error probability.
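The worked example above is easy to reproduce numerically; a minimal sketch of both calculations:

```python
import math

def capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

def db_to_linear(snr_db):
    return 10.0 ** (snr_db / 10.0)

def linear_to_db(snr_linear):
    return 10.0 * math.log10(snr_linear)

# Telephone channel: 4 kHz bandwidth, 20 dB SNR.
c = capacity(4_000, db_to_linear(20.0))
print(f"C = {c / 1000:.2f} kbit/s")            # about 26.63 kbit/s

# Required SNR for 50 kbit/s in 1 MHz of bandwidth: S/N = 2^(C/B) - 1.
snr_needed = 2.0 ** (50_000 / 1_000_000) - 1.0
print(f"S/N = {snr_needed:.3f} ({linear_to_db(snr_needed):.1f} dB)")  # about -14.5 dB
```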
Noisy channel coding theorem and capacity :

Claude Shannon's development of information theory during World War II provided the next big step in understanding how much information could be reliably communicated through noisy channels. Building on Hartley's foundation, Shannon's noisy channel coding theorem (1948) describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption.[5][6] The proof of the theorem shows that a randomly constructed error-correcting code is essentially as good as the best possible code; the theorem is proved through the statistics of such random codes.

Shannon's theorem shows how to compute a channel capacity from a statistical description of a channel, and establishes that, given a noisy channel with capacity C and information transmitted at a line rate R, if R < C then there exists a coding technique which allows the probability of error at the receiver to be made arbitrarily small. This means that, theoretically, it is possible to transmit information nearly without error up to a limit of C bits per second. The converse is also important: if R > C, the probability of error at the receiver increases without bound as the rate is increased, so no useful information can be transmitted beyond the channel capacity. The theorem does not address the rare situation in which rate and capacity are equal.

The Shannon-Hartley theorem establishes what that channel capacity is for a finite-bandwidth continuous-time channel subject to Gaussian noise. It connects Hartley's result with Shannon's channel capacity theorem in a form that is equivalent to specifying the M in Hartley's line rate formula in terms of a signal-to-noise ratio, but achieving reliability through error-correction coding rather than through reliably distinguishable pulse levels.

If there were such a thing as a noise-free analog channel, one could transmit unlimited amounts of error-free data over it per unit of time. (Note: an infinite-bandwidth analog channel cannot transmit unlimited amounts of error-free data without infinite signal power.) Real channels, however, are subject to limitations imposed by both finite bandwidth and nonzero noise. So how do bandwidth and noise affect the rate at which information can be transmitted over an analog channel?

Surprisingly, bandwidth limitations alone do not impose a cap on the maximum information rate. This is because it is still possible for the signal to take on an indefinitely large number of different voltage levels on each symbol pulse, with each slightly different level being assigned a different meaning or bit sequence. If we combine both noise and bandwidth limitations, however, we do find there is a limit to the amount of information that can be transferred by a signal of bounded power, even when clever multi-level encoding techniques are used.

In the channel considered by the Shannon-Hartley theorem, noise and signal are combined by addition. That is, the receiver measures a signal that is equal to the sum of the signal encoding the desired information and a continuous random variable that represents the noise. This addition creates uncertainty as to the original signal's value. If the receiver has some information about the random process that generates the noise, one can in principle recover the information in the original signal by considering all possible states of the noise process. In the case of the Shannon-Hartley theorem, the noise is assumed to be generated by a Gaussian process with a known variance. Since the variance of a Gaussian process is equivalent to its power, it is conventional to call this variance the noise power. Such a channel is called the additive white Gaussian noise (AWGN) channel, because Gaussian noise is added to the signal; "white" means equal amounts of noise at all frequencies within the channel bandwidth. Such noise can arise both from random sources of energy and also from coding and measurement error at the sender and receiver respectively. Since sums of independent Gaussian random variables are themselves Gaussian random variables, this conveniently simplifies analysis, if one assumes that such error sources are also Gaussian and independent.

Implications of the theorem - Comparison of Shannon's capacity to Hartley's law :

Comparing the channel capacity to the information rate from Hartley's law, we can find the effective number of distinguishable levels M:

   2B log2(M) = B log2(1 + S/N),   so   M = sqrt(1 + S/N)

The square root effectively converts the power ratio back to a voltage ratio, so the number of levels is approximately proportional to the ratio of rms signal amplitude to noise standard deviation. This similarity in form between Shannon's capacity and Hartley's law should not be interpreted to mean that M pulse levels can literally be sent without any confusion; more levels are needed, to allow for redundant coding and error correction, but the net data rate that can be approached with coding is equivalent to using that M in Hartley's law.
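A quick numerical check of this equivalence, using an assumed 20 dB SNR and a 4 kHz channel purely for illustration:

```python
import math

snr_linear = 100.0        # 20 dB, assumed for illustration
bandwidth_hz = 4_000.0

# Effective number of distinguishable levels from the comparison above.
m_levels = math.sqrt(1.0 + snr_linear)

shannon_capacity = bandwidth_hz * math.log2(1.0 + snr_linear)   # C = B log2(1 + S/N)
hartley_rate = 2.0 * bandwidth_hz * math.log2(m_levels)         # R = 2B log2(M)

print(f"M (effective levels)     = {m_levels:.2f}")
print(f"Shannon capacity         = {shannon_capacity:.0f} bit/s")
print(f"Hartley line rate with M = {hartley_rate:.0f} bit/s")    # identical by construction
```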
Alternative forms - Frequency-dependent (colored noise) case :

In the simple version above, the signal and noise are fully uncorrelated, in which case S + N is the total power of the received signal and noise together. A generalization of the above equation for the case where the additive noise is not white (or the S/N is not constant with frequency over the bandwidth) is obtained by treating the channel as many narrow, independent Gaussian channels in parallel:

   C = integral from 0 to B of log2(1 + S(f)/N(f)) df

where
- C is the channel capacity in bits per second;
- B is the bandwidth of the channel in Hz;
- S(f) is the signal power spectrum;
- N(f) is the noise power spectrum; and
- f is frequency in Hz.

Note: the theorem only applies to Gaussian stationary process noise. This formula's way of introducing frequency-dependent noise cannot describe all continuous-time noise processes. For example, consider a noise process consisting of adding a random wave whose amplitude is 1 or -1 at any point in time, and a channel that adds such a wave to the source signal. Such a wave's frequency components are highly dependent. Though such noise may have a high power, it is fairly easy to transmit a continuous signal with much less power than one would need if the underlying noise were a sum of independent noises in each frequency band.

Approximations :

For large or small and constant signal-to-noise ratios, the capacity formula can be approximated. If S/N >> 1, then

   C ~ 0.332 * B * SNR(dB),   where SNR(dB) = 10 log10(S/N).

Similarly, if S/N << 1, then

   C ~ 1.44 * B * (S/N).

In this low-SNR approximation, capacity is independent of bandwidth if the noise is white, with spectral density N0 watts per hertz, in which case the total noise power is N = B * N0 and C ~ 1.44 * S/N0.

REDUNDANCY :

Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity.

In describing the redundancy of raw data, the rate of a source of information is the average entropy per symbol, r. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is the limit, as n goes to infinity, of the joint entropy of the first n symbols divided by n:

   r = limit as n goes to infinity of H(M1, M2, ..., Mn) / n

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply H(M), since by definition there is no interdependence of the successive messages of a memoryless source. The absolute rate of a language or source is simply R = log |M|, the logarithm of the cardinality of the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.

The absolute redundancy can then be defined as D = R - r, the difference between the absolute rate and the rate. The quantity D/R is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity R/r gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is the efficiency, defined as r/R, so that r/R + D/R = 1. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed. A measure of redundancy between two variables is the mutual information or a normalized variant. A measure of redundancy among many variables is given by the total correlation.

Redundancy of compressed data refers to the difference between the expected compressed data length of n messages (or the expected data rate) and the entropy (or entropy rate). (Here we assume the data is ergodic and stationary, e.g., a memoryless source.) Although the rate difference can be made arbitrarily small as n is increased, the actual difference cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.
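As a small worked example of the quantities defined above, the sketch below computes the rate, absolute rate, redundancy and efficiency of a memoryless source over a 4-symbol alphabet; the symbol probabilities are an illustrative assumption.

```python
import math

# Illustrative (assumed) memoryless source over a 4-symbol alphabet.
probabilities = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

# Rate of a memoryless source: r = H(M) = -sum p*log2(p), in bits per symbol.
r = -sum(p * math.log2(p) for p in probabilities.values())

# Absolute rate: R = log2 of the alphabet size.
R = math.log2(len(probabilities))

D = R - r                      # absolute redundancy
print(f"rate r              = {r:.4f} bits/symbol")
print(f"absolute rate R     = {R:.4f} bits/symbol")
print(f"absolute redundancy = {D:.4f} bits/symbol")
print(f"relative redundancy = {D / R:.2%}")
print(f"efficiency          = {r / R:.2%}")
```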
HUFFMAN CODING :

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. The running time of Huffman's method is fairly efficient: it takes O(n log n) operations to construct the code for n symbols. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted.

We will now study an algorithm for constructing efficient source codes for a DMS (discrete memoryless source) with source symbols that are not equally probable. A variable-length encoding algorithm was suggested by Huffman in 1952, based on the source symbol probabilities P(xi), i = 1, 2, ..., L. The algorithm is optimal in the sense that the average number of bits it requires to represent the source symbols is a minimum, and it also meets the prefix condition. The steps of the Huffman coding algorithm are given below.
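In outline, the algorithm arranges the symbols in order of decreasing probability, repeatedly combines the two least probable symbols into a new node (assigning the bits 0 and 1 to the two branches), and reads each codeword off the resulting tree from the root to the leaf. The following Python sketch is a minimal illustration of this procedure; the symbol set and probabilities are assumed for the example.

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code table {symbol: bit_string} from {symbol: probability}."""
    # Each heap entry is (probability, tie_breaker, {symbol: partial_codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                        # degenerate single-symbol source
        return {sym: "0" for sym in probabilities}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # the two least probable nodes
        p2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + c for sym, c in codes1.items()}
        merged.update({sym: "1" + c for sym, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Assumed example: a discrete memoryless source with unequal symbol probabilities.
source = {"x1": 0.4, "x2": 0.2, "x3": 0.2, "x4": 0.1, "x5": 0.1}
code = huffman_code(source)

avg_length = sum(source[s] * len(code[s]) for s in source)
for s in sorted(source, key=source.get, reverse=True):
    print(f"{s}: P = {source[s]:.2f}, codeword = {code[s]}")
print(f"Average codeword length = {avg_length:.2f} bits/symbol")
```

Because the two least probable nodes are merged at every step, the resulting average codeword length (2.2 bits/symbol for this assumed source) is the minimum achievable by any prefix code for these probabilities.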
RANDOM VARIABLES :

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables: discrete and continuous.

DISCRETE RANDOM VARIABLES :

A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.

The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function. (Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1.)

Suppose a random variable X may take k different values, with the probability that X = xi defined to be P(X = xi) = pi. The probabilities pi must satisfy the following:
1. 0 < pi < 1 for each i
2. p1 + p2 + ... + pk = 1.

Example : Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities associated with each outcome are described by the following table:

   Outcome      1     2     3     4
   Probability  0.1   0.3   0.4   0.2

The probability that X is equal to 2 or 3 is the sum of the two probabilities: P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.3 + 0.4 = 0.7. Similarly, the probability that X is greater than 1 is equal to 1 - P(X = 1) = 1 - 0.1 = 0.9, by the complement rule. This distribution may also be described by a probability histogram.

CONTINUOUS RANDOM VARIABLES :

A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1.)

A continuous random variable is not defined at specific values. Instead, it is defined over an interval of values, and is represented by the area under a curve (in advanced mathematics, this is known as an integral). The probability of observing any single value is equal to 0, since the number of values which may be assumed by the random variable is infinite.

Suppose a random variable X may take all values over an interval of real numbers. Then the probability that X is in the set of outcomes A, P(A), is defined to be the area above A and under a curve. The curve, which represents a function p(x), must satisfy the following:
1. The curve has no negative values (p(x) >= 0 for all x).
2. The total area under the curve is equal to 1.
A curve meeting these requirements is known as a density curve.

A Gaussian random variable is completely determined by its mean and variance.
- The function that is frequently used for the area under the tail of the Gaussian pdf (probability density function) is denoted by Q(x).
- The Q-function is a standard form for expressing error probabilities; the Gaussian tail area has no closed-form expression.
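Although Q(x) has no closed form, it can be evaluated numerically through the complementary error function, since Q(x) = (1/2) erfc(x / sqrt(2)). A minimal sketch:

```python
import math

def q_function(x):
    """Tail probability of the standard Gaussian: Q(x) = P(Z > x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# A few representative values; Q(0) = 0.5, and the tail shrinks rapidly with x.
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"Q({x}) = {q_function(x):.6f}")
```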
BOUNDS ON TAIL PROBABILITY :

These are general bounds on the tail probability of a random variable, that is, on the probability that a random variable deviates far from its expectation. In probability theory, the Chernoff bound, named after Herman Chernoff, gives exponentially decreasing bounds on the tail distributions of sums of independent random variables. It is a sharper bound than the known first- or second-moment based tail bounds, such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, the Chernoff bound requires that the variates be independent, a condition that neither the Markov nor the Chebyshev inequality requires.
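The sketch below compares these bounds numerically for the sum of n independent fair Bernoulli variables, using Markov's inequality P(X >= a) <= E[X]/a, Chebyshev's inequality P(|X - mu| >= t) <= Var(X)/t^2, and the multiplicative Chernoff bound P(X >= (1 + d)mu) <= (e^d / (1 + d)^(1 + d))^mu for a Binomial(n, p) upper tail. The values of n and the threshold are chosen only for illustration.

```python
import math

# Sum of n independent fair Bernoulli(1/2) variables: X ~ Binomial(n, 1/2).
n, p = 100, 0.5
mu = n * p                    # mean
var = n * p * (1.0 - p)       # variance
a = 75                        # tail threshold, chosen for illustration

# Markov's inequality (first moment): P(X >= a) <= mu / a.
markov = mu / a

# Chebyshev's inequality (second moment): P(X >= a) <= Var / (a - mu)^2.
chebyshev = var / (a - mu) ** 2

# Multiplicative Chernoff bound: with a = (1 + d)*mu,
# P(X >= (1 + d)*mu) <= ( e^d / (1 + d)^(1 + d) )^mu.
d = a / mu - 1.0
chernoff = (math.exp(d) / (1.0 + d) ** (1.0 + d)) ** mu

print(f"Markov bound    : {markov:.4f}")
print(f"Chebyshev bound : {chebyshev:.4f}")
print(f"Chernoff bound  : {chernoff:.2e}")   # far smaller, and it decays exponentially in mu
```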
Report "Information Theory, Coding and Cryptography Unit-1 by Arun Pratap Singh"