From Vector to Function — Transformations, Basis, and Kernel Method

Here is only some superficial thought about Fourier transformation (and other similar transformations), space basis, and kernel method. I will continuously update the post for new thoughts.

1. What is function

A function is an infinite vector. Every thought begins here. As the following figure shows

source: p>For a function defined on the interval [a,b][a,b], we take samples by an interval Δx\Delta x. If we sample the function f(x)f(x) at points a,x1,,xn,ba, x_1,\cdots,x_n,b, then we can transform the function into a vector (f(a),f(x1),,f(xn),f(b))T(f(a),f(x_1),\cdots,f(x_n),f(b))^T. When Δx0\Delta x\rightarrow 0, the vector should be more and more close to the function and at last, it becomes infinite.

The above analysis assumes xx to be a real number. But when xx is a vector, it still holds. From now on, we use bold font such as x\mathbf{x} to denote normal vector in Rn\mathcal{R}^n space; use ff to denote the function itself, namely the infinite vector; use f(x)f(\mathbf{x}) to denote the evaluation of the function at point x\mathbf{x}. And the evaluation of a function should be a real number.

2. Function inner product

Vectors have inner product. For vector x=(x1,,xn)\mathbf{x}=(x_1,\cdots,x_n) and y=(y1,,yn)\mathbf{y}=(y_1, \cdots, y_n), the inner product of x\mathbf{x} and y\mathbf{y} is defined as

<x,y>=i=1nxiyi <\mathbf{x},\mathbf{y}>=\sum_{i=1}^n x_i y_i

Similarly, functions have inner product. For two functions latexflatex f and latexglatex g sampling by interval latexΔxlatex \Delta x, the inner product may be defined as

<f,g>=limΔx0if(xi)g(xi) <f,g>=\lim_{\Delta x\rightarrow 0}\sum_{i} f(x_i) g(x_i)

at first thought. However, the above equation doesn't converge when Δx0\Delta x\rightarrow 0. Something is missing. As xx approaches 0, the dimension of ff and gg grows larger in our denotation.

For a vector, the dimension is discrete. We only have the first, second,... dimension. But we don't have the 0.5, 1.5,... dimension. When we transform a function into a vector, every sample point is located at a discrete dimension. However, this should not be true, since the dimension of each sample point should be stable so that the above equation will converge. Therefore, the dimension is not discrete for functions, but continuous. What we miss is the difference between adjacent dimensions for normalization. Therefore, we can define the inner product of two functions as

<f,g>=limΔx0if(xi)g(xi)Δx=f(x)g(x)dx <f,g>=\lim_{\Delta x\rightarrow 0}\sum_{i} f(x_i) g(x_i)\Delta x=\int f(x)g(x)dx

3. Basis

What is the meaning of inner product? For two vectors AA and BB, the inner product is the projection of one vector to the other.

<A,B>=ABcosθ <A,B>=|A||B|\cos\theta


projection(A)=<A,B>B2B \mathrm{projection}(A) = \frac{<A,B>}{|B|^2} B

If we regard the length of BB as one unit, then the inner product is the coordinate of the projection on BB. Therefore, given a set of orthogonal basis vector {ei}i=1n\{\mathbf{e}_i\}_{i=1}^n where <ei,ej>=0<\mathbf{e}_i,\mathbf{e}_j>=0 and ei=1|\mathbf{e}_i|=1. The basis vectors can establish a space by linear operations. Any vector x\mathbf{x} in the space can be denoted as

x=i=1n<x,ei>ei \mathbf{x}=\sum_{i=1}^n <\mathbf{x},\mathbf{e}_i> \cdot \mathbf{e_i}

More generally, if ei1|\mathbf{e}_i| \neq 1, then

x=i=1n<x,ei>ei2ei=i=1n<x,ei><ei,ei>ei \mathbf{x}=\sum_{i=1}^n \frac{<\mathbf{x},\mathbf{e}_i>}{|\mathbf{e}_i|^2} \mathbf{e}_i=\sum_{i=1}^n \frac{<\mathbf{x},\mathbf{e}_i>}{<\mathbf{e}_i,\mathbf{e}_i>} \mathbf{e}_i

Functions also have basis, which can be orthogonal or non-orthogonal. If we define the set of function basis, any function in the space can also be decomposed into the combination of several basis functions. Function inner product can also be used to calculate the coefficient on any specific basis. However, since the dimension of function is continuous, there may be infinite basis functions.

For a set of function basis {hi}\{h_i\}, if <hi,hj>=0<h_i,h_j>=0 and hi=1|h_i|=1, any function within the space can be denoted as

f=<f,hi>hi=if(x)hi(x)dxhi f=\sum <f,h_i> \cdot h_i=\sum_i \int f(x)h_i(x)dx \cdot h_i

If hi1|h_i| \neq 1,

f=i<f,hi><hi,hi>hi f=\sum_i \frac{<f,h_i>}{<h_i,h_i>} h_i

similar to the vector case.

4. Example: Transformations

Let the basis functions {hp},(p\{h_p\}, (p is interger) be

hp(x)=ei2πpx/T h_p(x)=e^{i2\pi p x/T}

defined on interval [0,T][0, T]. Here I use pp to index the basis functions in order to distinguish from the imaginary number ii. These construct a space and any function defined on interval [0,T][0, T] is within the space. It can be proven that any two basis function are orthogonal (for complex numbers, the latter term should take conjugate transpose when calculating the inner product).

<hp,hq>=0Thp(x)hq(x)ˉdx=0Tei2πpx/Tei2πqx/Tdx=0 <h_p, h_q>=\int_0^T h_p (x) \bar{h_q(x)} dx=\int_0^T e^{i2\pi p x/T} e^{-i2\pi q x/T} dx=0

where pqp \neq q and a+biˉ=abi\bar{a+bi}=a-bi. The "length" of basis is

hp2=<hp,hp>=T |h_p|^2=<h_p,h_p>=T

If a function latexflatex f defined on interval [0,T][0,T] is within the space, it can be written as

f(x)=pcphp(x)=pcpei2πpx/T f(x)=\sum_p c_p h_p(x)=\sum_p c_p e^{i2\pi px/T}

And the coefficient can be calculated as

cp=<f,hp>hp2=1T0Tf(x)hp(x)ˉdx=1T0Tf(x)ei2πpxdx c_p=\frac{<f,h_p>}{|h_p|^2}=\frac{1}{T} \int_0^T f(x)\bar{h_p(x)}dx=\frac{1}{T} \int_0^T f(x) e^{-i2\pi p x} dx

Actually this is the Fourier series. A more thorough discussion about Fourier series can be found here.

5. Kernel Function

Instead of ii, we use yy to index the basis function. Let

K(x,y)=hy(x) K(x,y)=h_y(x)

The set of basis functions can be denoted as one function with two parameters. In the example, yy is discrete. However, this is not the full space. If we let yy to be real number, then any functions may be represented.

The function K(x,y)K(x,y) is kernel function. It may be viewed as a cluster of basis functions. If they are orthogonal <K(,y1),K(,y2)>=0<K(\cdot,y_1),K(\cdot,y_2)>=0 for any y1y2y_1 \neq y_2, and the squared "length" of basis K(,y)2=1dx|K(\cdot,y)|^2=\int 1 dx, any function ff can be represented as

f=F(y)K(,y)dy f=\int F(y)K(\cdot,y) dy

And the coefficient of a function ff on a basis K(,y)K(\cdot,y) is calculated as

F(y)=<f,K(,y)>=f(x)K(x,y)dx F(y)=<f,K(\cdot,y)>=\int f(x)K(x,y)dx

Therefore, kernel function is able to transform a function into another space (a function space).

6. Example: Transformations

Let the kernel function to be

K(x,y)=ei2πxy K(x,y)=e^{i2\pi xy}

where ii is the imaginary number. It is easy to show that the basis functions are mutual orthogonal,

<K(,y1),K(,y2)>=K(x,y1)K(x,y2)ˉdx=ei2πxy1ei2πxy2dx=0 <K(\cdot,y_1),K(\cdot,y_2)>=\int K(x,y_1) \bar{K(x,y_2)}dx=\int e^{i2\pi xy_1}e^{-i2\pi xy_2}dx=0

when y1y2y_1 \neq y_2. Then squared "length" of basis is

<K(,y),K(,y)>=K(x,y)K(x,y)ˉdx=1dx <K(\cdot,y),K(\cdot,y)>=\int K(x,y)\bar{K(x,y)} dx=\int 1 dx

This constitutes a full function space. The coefficient is calculated as

F(y)=<f,K(,y)>=f(x)K(x,y)ˉdx=f(x)ei2πxydx F(y)=<f,K(\cdot ,y)>=\int f(x) \bar{K(x,y)} dx=\int f(x)e^{-i2\pi xy}dx

The original function is represented as

f(x)=F(y)K(x,y)dy=F(y)ei2πxydy f(x)=\int F(y) K(x,y) dy=\int F(y) e^{i2\pi xy}dy

Actually this is the Fourier transformation. Other transformations such as Laplace transformation, Z-transformation are similar. Maybe wavelet is also within the framework.

7. Something about Kernel Method

For a kernel function K(x,y)K(x,y), if we regard it as a cluster of basis, then we can get a space by the linear combination of the basis. However, the basis may not be orthogonal. This is somewhat like a matrix PP, if we use the columns of PP to generate a space. To construct orthogonal basis for the space, we can use eigen value and eigen vectors of PP. Similarly, we can use eigenfunctions as the basis for the function space generated by K(x,y)K(x,y). If all the eigen values are positive, then the kernel function is positive. If the inner product is defined as

<K(,x),K(,y)>=K(x,y) <K(\cdot,x), K(\cdot,y)>=K(x,y)

This is the reproducing kernel Hilbert space. It is commonly used to calculate the inner product of two mappings in a high-dimension feature space. When linear model is not enough, we can map the points into feature space and use kernel trick to calculate inner product. Support Vector Machine is an example.