A Story of Basis and Kernel - Part I: Function Basis

1. Review of Basis Concept

We know that everything in the world can be decomposed into the combination of the basic elements. For example, water is the combination of hydrogen and oxygen. Similarly, in mathematics, basis is used to represent various things in a simple and unified way.

In Rn\mathcal{R}^n space, we can use nn independent vectors to represent any vector by linear combination. The nn independent vectors can be viewed as a set of basis. There are infinite basis sets in Rn\mathcal{R}^n space. Among them, basis vectors that are orthogonal to each other are of special interests. For example, {ei}i=1n\{\mathbf{e}_i\}_{i=1}^n is a special basis set with mutually orthogonal basis vectors in the same length, where ei\mathbf{e}_i is a vector that has all zero entries except the iith entry which equals 1.

The inner product operator measures the similarity between vectors. For two vectors x\mathbf{x} and y\mathbf{y} , the inner product is the projection of one vector to the other.

<x,y>=xycosθ<\mathbf{x},\mathbf{y}>=|\mathbf{x}||\mathbf{y}|\cos\theta

If x=(x1,,xn)\mathbf{x}=(x_1,\cdots,x_n) and y=(y1,,yn)\mathbf{y}=(y_1, \cdots, y_n), we can get

<x,y>=i=1nxiyi<\mathbf{x},\mathbf{y}>=\sum_{i=1}^n x_i y_i

Until now it is a review of vector basis. These knowledge can also be extended to functions and function space.

2. Function Basis

A function is an infinite vector. As the following figure shows

source: http://en.wikipedia.org/wiki/Sampling_(signal_processing)

For a function defined on the interval [a,b][a,b], we take samples by an interval Δx\Delta x. If we sample the function f(x)f(x) at points a,x1,,xn,ba, x_1,\cdots,x_n,b, then we can transform the function into a vector (f(a),f(x1),,f(xn),f(b))T(f(a),f(x_1),\cdots,f(x_n),f(b))^T. When Δx0\Delta x\rightarrow 0, the vector should be more and more close to the function and at last, it becomes infinite.

The above analysis assumes xx to be a real number. But when xx is a vector, it still holds. In this article, we use bold font such as x\mathbf{x} to denote a vector in Rn\mathcal{R}^n space; use ff to denote the function itself, namely the infinite vector; use f(x)f(\mathbf{x}) to denote the evaluation of the function at point x\mathbf{x}. And the evaluation of a function should be a real number.

Since functions are so close to vectors, we can also define the inner product of functions similarly. For two functions ff and gg sampling by interval Δx\Delta x, the inner product may be defined as

<f,g>=limΔx0if(xi)g(xi)Δx=f(x)g(x)dx<f,g>=\lim_{\Delta x\rightarrow 0}\sum_{i} f(x_i) g(x_i)\Delta x=\int f(x)g(x)dx

For a vector, the dimension is discrete. We only have the first, second,… dimension. But we don't have the 0.5, 1.5,… dimension. However, the dimension is not discrete for functions, but continuous. Thus we use the difference between adjacent dimensions (i.e., Δx\Delta x ) for normalization.

The expression of function inner product is seen everywhere. It has various meanings in various context. For example, if XX is a continuous random variable with probability density function f(x)f(x) , i.e., f(x)>0f(x)>0 and f(x)dx=1\int f(x) dx = 1 , then the expectation

E[g(x)]=f(x)g(x)dx=<f,g>E[g(x)]=\int f(x) g(x) dx=<f,g>

Similar to vector basis, we can use a set of functions to represent other functions. The difference is that in a vector space, we only need finite vectors to construct a complete basis set, but in function space, we may need infinite basis functions. Two functions can be regarded as orthogonal if their inner product is zero. In function space, we can also have a set of function basis that are mutually orthogonal.

3. Example: Fourier Series

Let the basis functions {hp}p=+,(p\{h_p\}_{p=-\infty}^{+\infty}, (p is integer) be

hp(x)=ei2πpx/Th_p(x)=e^{i2\pi p x/T}

defined on interval [0,T][0, T]. Here ii is the imaginary number. These functions construct a function space and any function defined on interval [0,T][0, T] can be represented as linear combination of the basis functions. We can prove that any two basis functions are orthogonal (for complex numbers, the latter term should take conjugate transpose when calculating the inner product).

<hp,hq>=0Thp(x)hˉq(x)dx=0Tei2πpx/Tei2πqx/Tdx=0<h_p, h_q>=\int_0^T h_p (x) \bar{h}_q(x) dx=\int_0^T e^{i2\pi p x/T} e^{-i2\pi q x/T} dx=0

where pqp \neq q and a+bi=abi\overline{a+bi}=a-bi. The "length" of basis is

hp2=<hp,hp>=T|h_p|^2=<h_p,h_p>=T

If a function ff defined on interval [0,T][0,T] is within the space, it can be written as

f(x)=pcphp(x)=pcpei2πpx/Tf(x)=\sum_p c_p h_p(x)=\sum_p c_p e^{i2\pi px/T}

Since

<f,hp>=<qcqhq,hp>=qcq<hq,hp>=cp<hp,hp>=cpT<f,h_p>=<\sum_q c_q h_q,h_p>=\sum_q c_q < h_q,h_p>=c_p < h_p,h_p>=c_p T

the coefficient can be calculated as

cp=1T<f,hp>=1Tf(x)hˉp(x)dx=1T0Tf(x)ei2πpxdxc_p=\frac{1}{T} <f,h_p>=\frac{1}{T} \int f(x) \bar{h}_p(x) dx=\frac{1}{T} \int_0^T f(x) e^{-i2\pi p x} dx

which is the Fourier series.

4. Example: Wavelet Analysis

Define function ψ(x)\psi(x) on R\mathbf{R} as

ψ(x)={10x<12,112x<10otherwise\psi(x)= \begin{cases} 1 \quad & 0 \leq x < \frac{1}{2},\\ -1 & \frac{1}{2} \leq x < 1\\ 0 & \text{otherwise} \end{cases}

Define a series of functions for every pair n,kn,k of intergers

ψn,k(x)=2n/2ψ(2nxk),xR\psi_{n,k}(x) = 2^{n / 2} \psi(2^n x-k), \quad x \in \mathbf{R}

That is, we rescaled ψ(x)\psi(x) by 2n2^n and shift by 2nk2^{-n} k . We can show that

<ψn1,k1,ψn2,k2>=ψn1,k1(x),ψn2,k2(x)dx={1n1=n2,k1=k20otherwise<\psi_{n_1,k_1},\psi_{n_2,k_2} > = \int \psi_{n_1,k_1} (x),\psi_{n_2,k_2} (x) dx = \begin{cases} 1 \quad & n_1=n_2, k_1=k_2 \\ 0 \quad & \text{otherwise} \end{cases}

Any function can be represented as the linear combination of {ψn,k}n,k\{ \psi_{n,k} \}_{n,k} . By the same technique as the former example, we can also obtain the analytical expression of the coefficient for each basis function, which is the Haar wavelet analysis.

In the next part, fundermentals about kernel functions and kernel method will be discussed.