Estimate Optimal Kernel BandWidth

Optimal kernel bandwith parameter \(\gamma\) estimation for the RBF (Gaussian) kernel in the form of \(K(x_i, x_j) = \exp[-\frac{(x_i-x_j)^2}{\gamma}]\).

Description

Given a \(\{\mathbf{x}_i\}_{i=1}^{N}, \mathbf{x}_i\in \mathbb{R}^{d}\) independent, identically distributed sample from an unknown continuous probability density function (pdf) \(f\) , the kernel density estimator for \(d=1\) is defined as

\[\hat{f}(x;h) = \frac{1}{Nh} \sum_{i=1}^{N} K \left[ \frac{x-x_i}{h} \right]\]

which becoms

\[\hat{f}_G(x;h) = \frac{1}{N\sqrt{2\pi h^2}} \sum_{i=1}^{N} \exp \left[ -\frac{(x-x_i)^2}{2h^2} \right]\]

when using a Gaussian kernel \(K(\cdot) = K_G(\cdot)\) in the form of (standard normal pdf)

\[K_G(\alpha) = \frac{1}{\sqrt{2\pi}} \exp \left[ -\frac{\alpha^2}{2} \right]\]

with \(\alpha=(x-x_i)/h\) and our bandwidth parameter \(\gamma = 2h^2\) (h is the standard deviation).

When estimating the value of the smoothing parameter \(h\) , the most common optimality criterion is the expected \(L_2\) risk function or Mean Integrated Squered Error (MISE)

\[\texttt{MISE}[\hat{f}_G(x;h)] = \mathbb{E}_f \left\{ \int \left[ \hat{f}_G(x;h) - f(x) \right]^2 \right\}\]

The \(\texttt{MISE}\) has a complicated dependence on \(f\) and \(h\) that significantly simplifies when its asymptotic form (\(N\to\infty\)), the \(\texttt{AMISE}\) is considered. The (asymptotically) optimal bandwidth is deteremined by finding the minimum \(\texttt{AMISE}\) (assuming twice continuously differentiable generating density function \(f\)) [1] .

Available algorithms

The following two algorithms are available (taken from \(\texttt{KDEpy}\) [4] ) for estimating the kernel bandwidth parameter for RBF kernels along the single dimensions (1D):

  • Silverman - Silverman’s rule of thumb : if the underlying, unknown generating density is Gaussian and Gaussian kernel is used to approximate it, then the \(\texttt{MISE}\) optimal bandwith estimate [5]

    \[h = \left[ \frac{4\hat{\sigma}^5}{3N} \right]^{\frac{1}{5}} \approx 1.06\hat{\sigma}N^{-\frac{1}{5}}\]

    which can be make more robut (for long-tailed, skewed and bimodal mixture distributions) one can replace the estimated standard deviation \(\hat{\sigma}\) with the Interquantile Range (\(IQR = Q_3-Q_1\) i.e. the difference between the upper and lower quartiles) in case \(\hat{\sigma} > \texttt{IQR}/1.34\) A further modification is to multiply the above estimate by 0.85 that yields

    \[h = 0.85 \left[ \frac{4 \text{min}[ \hat{\sigma}, \texttt{IQR} ] ^5}{3N} \right]^{\frac{1}{5}} \approx 0.9 \text{min}[ \hat{\sigma}, \texttt{IQR} ] N^{-\frac{1}{5}}\]

    Note

    Using the above Silverman rule of thumb can lead to very inaccurate estimate when the underlying estimated density is not Gaussian.

  • ISJ - Improved Sheather Jones (ISJ) : the ISJ algorithm is a plug-in method: an initial pilot estimate of the underlying density is plugged-in to the asymptotic form of the MISE to obtain an estimate then the optimal bandwidth is determine by optimising this estimate (see more [2] ). The ISJ algorithm utilises a recursive formula to minimise the asymptotic MISE for determining the optimal bandwidth [1] .

    Note

    This algorithm outperforms the simple Silverman rule and provides very good results for Gaussian kernel bandwidth estimate even in the case of bimodal data. Unlike the Silverman rule, this doesn’t assume normality. It might have convergence problem in case of insufficient numebr of data.

Example:

Example for calling the EstimateBW driver for a data.dat file estimating the optimal Gaussian kernel bandwidth h

$ python estimateBW.py data.dat 'Silverman'

estimateBW.EstimateBW(data, which)[source]

Esimates the optimal kernel bandwidth \(h=2\sigma^2\) for Gaussian kernel.

Parameters
  • fname (str) – input-matrix file (see more at the Example)

  • which (str) – type of the optimal bandwith estimation algorithms (Silverman or ISJ (default))

Returns

A numpy::ndarray that contains the estimated bandwidths along each dimensions.

The file contains the data set for which the optimal kernel bandwidth parameter needs to be determined. It is assumed, that the file contains the \(\{\mathbf{x}_i\}_{i=1}^{N}, \mathbf{x}_i\in \mathbb{R}^{d}\) input data as \(\mathbb{R}^{(N\times d)}\) matrix i.e. each data point is a row.

Loads the input data file and executes the selected algorithm to estimate the optimal kernel bandwidth along each of the \(d\) dimensions of the input data.

estimateBW.main()[source]