Tuesday, January 29, 2008

using kd-trees for interpolation.

I have been looking for ways to interpolate big model results to other big mode results.. again. This seems to be an unsolved problem in ocean modeling. However, I have stumbled on a new method for finding nearest neighbors that seems quite promising.

As long as you are interpolating from something that has a relatively uniform grid with a relatively smooth field, you don't need to bring out the big guns, like optimal interpolation. In other words, you may often ignore data error in your interpolation - especially going from one model to another to generate initial or boundary conditions.

[Kd-trees](http://en.wikipedia.org/wiki/Kd-tree) (k-dimensional trees) are a method for organizing a set of k-dimensional points. In particular, I would like to find the nearest neighbors in three-dimensions. Many interpolation tools (such as [csa](http://www.marine.csiro.au/~sak007/) or [delaunay](http://scipy.org/scipy/scikits/browser/trunk/delaunay/) only work with 2D data.

The advantage of using kd-trees is twofold: First storing the tree is not very memory intensive, approximately the same size as the original point field itself, and building the tree scales roughly as the number of points (n log(n)). Second, searching for the closest points to a query point is quite fast (log(n)). The [ann library](http://www.cs.umd.edu/~mount/ANN/) can be used to return indicies of and distances to the M closest points to a query point. An example of how to use the ann library for interpolation is below the fold.







from numpy import *
from scikits.ann import kdtree

from time import time

N = 5e4
Ni = 5e4
dimensions = 3
k = 5

PLOT = False

t1 = time()

# generate random points and data
pts = random.rand(N, dimensions)
d = 100.0*exp( - ((pts-0.5)**2/10.0).sum(axis=-1) )

# generate kdtree object
t2 = time()
tree = kdtree(pts)

# points to interpolate to
ptsi = random.rand(Ni, dimensions)/2.0 + 0.25
t3 = time()

# find nearest points to interpolation points...
idx, d2 = tree.knn(ptsi)
t4 = time()

# Generate weights and derive interpolated data
w = (1.0/d2)/sum((1.0/d2), axis=-1)[:, newaxis]
di = sum(w * d[idx], axis=-1)

t5 = time()


di_true = 100.0*exp( - ((ptsi-0.5)**2/10.0).sum(axis=-1) )

stderr = (di_true - di).std()

print 'Standard error = ', stderr
print 'Relative error = ', stderr/di_true.ptp()

print 'make point array ', t2-t1, ' seconds'
print 'make kdtree ', t3-t2, ' seconds'
print 'get idx and d2 ', t4-t3, ' seconds'
print 'generate weights and distances ', t5-t4, ' seconds'
print ''
print 'TOTAL kdTree time :: ', t3-t2+t4-t3, ' seconds'


This generates the output (on my MacBook Pro):

Standard error = 0.0455685814703
Relative error = 0.024867638928
make point array 0.0316381454468 seconds
make kdtree 0.0745379924774 seconds
get idx and d2 0.135274887085 seconds
generate weights and distances 0.0220251083374 seconds

TOTAL kdTree time :: 0.209812879562 seconds


All in all, pretty fast, and pretty efficient.

No comments:

Post a Comment