Proof 1:
We use the inequality $\ln a \le a - 1$, which holds for every $a > 0$.
We show that $-D_{KL}(p\|q) \le 0$, which is equivalent to $D_{KL}(p\|q) \ge 0$:
$$-D(p\|q) = -\sum_x p(x)\ln\frac{p(x)}{q(x)} = \sum_x p(x)\ln\frac{q(x)}{p(x)} \stackrel{(a)}{\le} \sum_x p(x)\left(\frac{q(x)}{p(x)} - 1\right) = \sum_x q(x) - \sum_x p(x) = 1 - 1 = 0$$
For inequality (a) we used the ln inequality explained in the beginning.
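The two facts used in this proof are easy to sanity-check numerically. Below is a minimal sketch (the helper names `kl` and `random_dist` are my own, not from the text) that tests $\ln a \le a - 1$ on random positive inputs and then checks $D(p\|q) \ge 0$ on random distributions over a finite alphabet:

```python
import math
import random

# The inequality behind step (a): ln(a) <= a - 1 for all a > 0.
for _ in range(1000):
    a = random.uniform(1e-6, 100.0)
    assert math.log(a) <= a - 1 + 1e-12

def kl(p, q):
    """KL divergence D(p||q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def random_dist(n):
    """A random probability vector of length n (illustrative helper)."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Non-negativity of the KL divergence on random distributions.
for _ in range(1000):
    p, q = random_dist(5), random_dist(5)
    assert kl(p, q) >= -1e-12
```

The small tolerances only guard against floating-point round-off; the inequalities themselves are exact.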
Alternatively you can start with Gibbs' inequality which states:
$$-\sum_x p(x)\log_2 p(x) \le -\sum_x p(x)\log_2 q(x)$$
Then if we bring the left term to the right we get:
$$\sum_x p(x)\log_2 p(x) - \sum_x p(x)\log_2 q(x) \ge 0 \quad\Longleftrightarrow\quad \sum_x p(x)\log_2\frac{p(x)}{q(x)} \ge 0$$
The reason I am not including this as a separate proof is that if you were to ask me to prove Gibbs' inequality itself, I would have to start from the non-negativity of the KL divergence and repeat the same proof from the top.
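Gibbs' inequality is the statement that cross-entropy is never smaller than entropy, and it can be checked numerically as well. A minimal sketch (the helpers `entropy`, `cross_entropy`, and `random_dist` are my own naming, not from the text):

```python
import math
import random

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def random_dist(n):
    """A random probability vector of length n (illustrative helper)."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Gibbs' inequality: H(p) <= H(p, q), with equality iff p == q.
for _ in range(1000):
    p, q = random_dist(4), random_dist(4)
    assert entropy(p) <= cross_entropy(p, q) + 1e-12
```

The gap $H(p, q) - H(p)$ is exactly $D(p\|q)$ in bits, which is why the two statements are equivalent.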
Proof 2:
We use the log sum inequality, which for non-negative numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$ states:
$$\sum_{i=1}^n a_i \log_2\frac{a_i}{b_i} \ge \left(\sum_{i=1}^n a_i\right)\log_2\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}$$
Then we can show that $D_{KL}(p\|q) \ge 0$:
$$D(p\|q) = \sum_x p(x)\log_2\frac{p(x)}{q(x)} \stackrel{(b)}{\ge} \left(\sum_x p(x)\right)\log_2\frac{\sum_x p(x)}{\sum_x q(x)} = 1\cdot\log_2\frac{1}{1} = 0$$
where we have used the log sum inequality at (b).
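The log sum inequality itself can be spot-checked on random positive inputs. A minimal sketch (the names `log_sum_lhs` and `log_sum_rhs` are my own, introduced just for this check):

```python
import math
import random

def log_sum_lhs(a, b):
    """Left side of the log sum inequality: sum_i a_i * log2(a_i / b_i)."""
    return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))

def log_sum_rhs(a, b):
    """Right side: (sum_i a_i) * log2(sum_i a_i / sum_i b_i)."""
    sa, sb = sum(a), sum(b)
    return sa * math.log2(sa / sb)

# Check the inequality on random positive sequences.
for _ in range(1000):
    n = 5
    a = [random.uniform(0.01, 10.0) for _ in range(n)]
    b = [random.uniform(0.01, 10.0) for _ in range(n)]
    assert log_sum_lhs(a, b) >= log_sum_rhs(a, b) - 1e-9
```

Setting $a_i = p(x_i)$ and $b_i = q(x_i)$ recovers step (b) of the proof, since both sums then equal 1.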
Proof 3:
(Taken from the book "Elements of Information Theory" by Thomas M. Cover and Joy A. Thomas)
$$-D(p\|q) = -\sum_x p(x)\log_2\frac{p(x)}{q(x)} = \sum_x p(x)\log_2\frac{q(x)}{p(x)} \stackrel{(c)}{\le} \log_2\sum_x p(x)\frac{q(x)}{p(x)} = \log_2 1 = 0$$
where at (c) we have used Jensen's inequality and the fact that log is a concave function.
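Step (c) applies Jensen's inequality to the random variable $X = q(x)/p(x)$: for a concave function like $\log_2$, $\mathbb{E}[\log_2 X] \le \log_2 \mathbb{E}[X]$. This too can be checked numerically; in the sketch below (the helper `random_dist` is my own naming), the right-hand side is always $\log_2 1 = 0$ because $\mathbb{E}_p[q(x)/p(x)] = \sum_x q(x) = 1$:

```python
import math
import random

def random_dist(n):
    """A random probability vector of length n (illustrative helper)."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Jensen's inequality for the concave log2, applied to X = q(x)/p(x):
# E_p[log2 X] <= log2 E_p[X], i.e. -D(p||q) <= log2(1) = 0.
for _ in range(1000):
    p, q = random_dist(6), random_dist(6)
    ratios = [qx / px for px, qx in zip(p, q)]
    lhs = sum(px * math.log2(r) for px, r in zip(p, ratios))
    rhs = math.log2(sum(px * r for px, r in zip(p, ratios)))
    assert lhs <= rhs + 1e-9
```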