A Summary of State-space Models

1. Bayesian Filter

In this article, I summarize some well-known state-space models. I won't go into details here; the goal is an overview of the whole map. All of these state-space models originate from the Bayesian filter. Two stochastic processes are considered: the states $x_t$ and the observations (or measurements) $y_t$. Here $t$ stands for "time", although in general the processes are not restricted to time series. We are interested in the true values of the states, but we can only see the values of the observations. State-space models therefore aim to estimate the states from the observations. Two relationships must be specified:

  • The state-to-state probability $P(x_t | x_{1:t-1})$
  • The state-to-observation probability $P(y_t | x_t)$

No direct relationship exists between any two observations: each $y_t$ depends only on the current state $x_t$. A common assumption is the Markov property, under which the current state depends only on the previous state, namely $P(x_t | x_{1:t-1}) = P(x_t | x_{t-1})$.

2. Prediction and Updating

A state-space model admits an online algorithm that alternates between two recursive steps. Prediction estimates the predictive distribution $p(x_t | y_{1:t-1})$ from the previous posterior $p(x_{t-1} | y_{1:t-1})$, using the state-to-state probability $P(x_t | x_{t-1})$. Mathematically,

$$p(x_t | y_{1:t-1}) = \int p(x_t | x_{t-1}) \, p(x_{t-1} | y_{1:t-1}) \, dx_{t-1}$$

Updating corrects the predicted distribution $p(x_t | y_{1:t-1})$ using the latest observation $y_t$. Mathematically,

$$p(x_t | y_{1:t}) = \frac{p(y_t | x_t) \, p(x_t | y_{1:t-1})}{p(y_t | y_{1:t-1})} \propto p(y_t | x_t) \, p(x_t | y_{1:t-1})$$
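For readers who want the intermediate step, the update rule follows from Bayes' rule once we use the fact that $y_t$ depends only on $x_t$. The derivation below is a short sketch under that conditional-independence assumption; it also writes out the normalizing constant, which does not depend on $x_t$ and is why the proportional form is usually enough in practice.

```latex
\begin{aligned}
p(x_t | y_{1:t})
  &= \frac{p(y_t | x_t, y_{1:t-1}) \, p(x_t | y_{1:t-1})}{p(y_t | y_{1:t-1})}
  && \text{(Bayes' rule, conditioning on } y_{1:t-1}\text{)} \\
  &= \frac{p(y_t | x_t) \, p(x_t | y_{1:t-1})}{p(y_t | y_{1:t-1})}
  && \text{(} y_t \text{ depends only on } x_t \text{)} \\
p(y_t | y_{1:t-1})
  &= \int p(y_t | x_t) \, p(x_t | y_{1:t-1}) \, dx_t
  && \text{(normalizing constant)}
\end{aligned}
```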

3. Considerations in Modeling

Bayesian filters estimate $x_t$ through the posterior distribution $p(x_t | y_{1:t})$. In practical problems, the state-to-state and state-to-observation probabilities usually cannot be obtained directly. Instead, they are derived from a prediction model $x_t = f(x_{t-1})$ and a measurement model $y_t = g(x_t)$. A series of questions must then be answered:

  • Is the state $x_t$ discrete or continuous?
  • What is the distribution of $x_t$?
  • Is the prediction model linear or nonlinear?
  • Is the measurement model linear or nonlinear?

Different answers to these questions lead to the different filters described below.

4. Classification of Bayesian Filters

Depending on whether the state $x_t$ is discrete or continuous, Bayesian filters are divided into discrete filters and continuous filters. When $x_t$ takes only discrete values, the state-to-state probability can be expressed by a transition matrix $A = [a_{i,j}]$, where $a_{i,j} = P(x_t = j | x_{t-1} = i)$.
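With a finite state space, the integral in the prediction step becomes a vector-matrix product and the update step becomes an element-wise multiplication followed by normalization. The snippet below is a minimal sketch of one such cycle. The function name `discrete_bayes_step`, the emission matrix `E`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def discrete_bayes_step(belief, A, likelihood):
    """One predict/update cycle of a discrete Bayes filter.

    belief:     p(x_{t-1} = i | y_{1:t-1}), shape (n,)
    A:          transition matrix, A[i, j] = P(x_t = j | x_{t-1} = i)
    likelihood: p(y_t | x_t = j) evaluated at the observed y_t, shape (n,)
    """
    # Prediction: predicted[j] = sum_i belief[i] * A[i, j]
    predicted = belief @ A
    # Update: multiply by the likelihood, then normalize to sum to one
    unnormalized = likelihood * predicted
    return unnormalized / unnormalized.sum()

# Hypothetical two-state example (all values are assumptions).
A = np.array([[0.7, 0.3],
              [0.3, 0.7]])
# E[j, k] = P(y_t = k | x_t = j), an assumed observation model.
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])

belief = np.array([0.5, 0.5])      # assumed uniform initial belief
for y in [0, 0, 1]:                # an assumed observation sequence
    belief = discrete_bayes_step(belief, A, E[:, y])
print(belief)
```

Each row of `A` sums to one, so the predicted vector is already a valid distribution; only the update step needs renormalizing.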

Depending on whether the distribution of $x_t$ is assumed to have a specific form, continuous Bayesian filters are divided into parametric and non-parametric filters. For example, Gaussian filters assume that $x_t$ follows a multivariate normal distribution. Under this assumption, the posterior distribution $p(x_t | y_{1:t})$ can be expressed explicitly in closed form. Non-parametric filters, on the other hand, make no assumption about the distribution of $x_t$ and instead approximate it, for example by a histogram (histogram filter) or by a large number of samples drawn from the target distribution (particle filter). Because they only approximate the distribution, non-parametric filters place no restrictions on the prediction model $x_t = f(x_{t-1})$ or the measurement model $y_t = g(x_t)$, which makes them flexible across many situations. However, the computational load is heavy because there is no closed-form expression, and the better the approximation, the heavier the computational burden.
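To make the sample-based idea concrete, here is a minimal sketch of a bootstrap particle filter for a scalar state. It is one common variant, not the only one, and it rests on several assumptions that are not part of the discussion above: additive Gaussian noise with standard deviations `q_std` and `r_std`, a standard normal initial distribution, and multinomial resampling at every step. The models `f` and `g` in the usage part are hypothetical.

```python
import numpy as np

def bootstrap_particle_filter(ys, f, g, q_std, r_std, n_particles=1000, seed=0):
    """Return the posterior-mean estimate of a scalar state for each observation.

    Assumes x_t = f(x_{t-1}) + N(0, q_std^2) and y_t = g(x_t) + N(0, r_std^2).
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, size=n_particles)  # assumed N(0, 1) prior
    means = []
    for y in ys:
        # Prediction: push every particle through the prediction model plus noise
        particles = f(particles) + rng.normal(0.0, q_std, size=n_particles)
        # Update: weight each particle by the Gaussian likelihood p(y_t | x_t)
        log_w = -0.5 * ((y - g(particles)) / r_std) ** 2
        w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Resampling: draw particles in proportion to their weights
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(means)

# Hypothetical nonlinear model, used only to generate example data.
f = lambda x: 0.5 * x + 2.0 * np.cos(x)
g = lambda x: x ** 2 / 5.0

rng = np.random.default_rng(1)
x, ys = 0.0, []
for _ in range(30):
    x = f(x) + rng.normal(0.0, 0.5)
    ys.append(g(x) + rng.normal(0.0, 0.5))

estimates = bootstrap_particle_filter(ys, f, g, q_std=0.5, r_std=0.5)
```

Raising `n_particles` improves the approximation at a proportional increase in cost, which is the trade-off described above.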

Gaussian filters assume that $x_t$ follows a multivariate normal distribution. In the classical Kalman filter, the prediction model $x_t = f(x_{t-1})$ and the measurement model $y_t = g(x_t)$ are assumed to be linear in order to preserve normality. Specifically, $x_t = A x_{t-1} + \epsilon_t$ and $y_t = B x_t + \delta_t$, where $\epsilon_t$ and $\delta_t$ are Gaussian noise terms. Derivatives of the Kalman filter relax the linearity assumption by approximation: the Extended Kalman filter linearizes the models with a first-order Taylor expansion, and the Unscented Kalman filter propagates a set of sigma points through the nonlinear models (the unscented transform). The Information filter and its derivatives are essentially equivalent to the Kalman filter family, but use the information (canonical) parameterization of the multivariate normal distribution, $\Omega = \Sigma^{-1}$, $\xi = \Omega \mu$.
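For the linear-Gaussian case, the prediction and update steps reduce to a few matrix operations on the mean and covariance. The sketch below uses the matrices $A$ and $B$ from the model above. The noise covariances `Q` and `R`, the function name `kalman_step`, and the constant-velocity example are assumptions introduced only for illustration.

```python
import numpy as np

def kalman_step(mu, Sigma, y, A, B, Q, R):
    """One predict/update cycle of a Kalman filter.

    Model: x_t = A x_{t-1} + eps_t, eps_t ~ N(0, Q)
           y_t = B x_t + delta_t,   delta_t ~ N(0, R)
    """
    # Prediction: propagate mean and covariance through the linear dynamics
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: correct the prediction with the new observation
    S = B @ Sigma_pred @ B.T + R                 # innovation covariance
    K = Sigma_pred @ B.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (y - B @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ B) @ Sigma_pred
    return mu_new, Sigma_new

# Hypothetical constant-velocity example: state = (position, velocity),
# and only the position is observed.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

mu, Sigma = np.zeros(2), np.eye(2)               # assumed initial belief
for y in [1.1, 1.9, 3.2, 3.9]:                   # assumed observations
    mu, Sigma = kalman_step(mu, Sigma, np.array([y]), A, B, Q, R)

# The same belief in information (canonical) form
Omega = np.linalg.inv(Sigma)
xi = Omega @ mu
```

The last two lines convert the resulting Gaussian to the information parameterization. An information filter would carry `Omega` and `xi` through the recursion directly instead of converting at the end.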

Hybrid filters are a mixture of parametric and non-parametric filters: some dimensions of the state are assumed to follow a specific parametric form, while the other dimensions are represented with non-parametric techniques.