Glossary of economics research
cross-validation:
A way of choosing the window width (bandwidth) for a kernel estimation. The
method is to select, from a set of possible window widths, the one that minimizes
the sum of squared errors made in predicting each data point by kernel
regression on all the other data points.
Formally, let J be the number of data points, j an index to each one from 1
to J, Yj the dependent variable for observation j, Xj the independent
variables for that j, and {hi}, for i = 1 to n, the set of candidate window
widths. The hi's might be a set of equally spaced values on a grid. The
algorithm for choosing one of the hi's is:
For each candidate window width hi
{
    For each j from 1 to J
    {
        Drop the data point (Xj, Yj) from the sample temporarily
        Run a kernel regression to estimate Yj using the remaining X's and Y's
        Keep track of the square of the error made in that prediction
    }
    Sum the squared errors over every j to get a score for candidate window width hi
    Record that in a list as the score for hi
}
Select as the outcome h of this algorithm the hi with the lowest score
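The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the source's implementation: it assumes a single regressor, uses the Nadaraya-Watson estimator with a Gaussian kernel as "the kernel regression" (the entry does not say which estimator or kernel is used), and the function names gaussian_kernel, nw_estimate, cv_score and choose_window_width are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_kernel(u: float) -> float:
    # Standard normal density; the choice of kernel is an assumption.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    # Nadaraya-Watson estimate at x0 (assumed estimator): kernel-weighted average of ys.
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        return math.nan  # every weight underflowed: no usable neighbours
    return sum(w * y for w, y in zip(weights, ys)) / total


def cv_score(xs: List[float], ys: List[float], h: float) -> float:
    # Sum of squared leave-one-out prediction errors for window width h.
    score = 0.0
    for j in range(len(xs)):
        # Temporarily drop (X_j, Y_j) and predict Y_j from the other points.
        rest_x = xs[:j] + xs[j + 1:]
        rest_y = ys[:j] + ys[j + 1:]
        pred = nw_estimate(xs[j], rest_x, rest_y, h)
        if math.isnan(pred):
            return math.inf  # an undefined prediction makes this width unusable
        score += (ys[j] - pred) ** 2
    return score


def choose_window_width(
    xs: List[float], ys: List[float], candidates: Sequence[float]
) -> Tuple[float, Dict[float, float]]:
    # Score every candidate h_i and return the one with the lowest score.
    scores = {h: cv_score(xs, ys, h) for h in candidates}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs = [i / 10 for i in range(21)]      # 21 equally spaced points on [0, 2]
    ys = [math.sin(3.0 * x) for x in xs]  # illustrative data only
    best, scores = choose_window_width(xs, ys, [0.05, 0.1, 0.5, 2.0])
    for h in sorted(scores):
        print(f"h = {h:<5} score = {scores[h]:.4f}")
    print("selected h:", best)
```

Each call to cv_score runs J kernel regressions, and each of those touches the other J - 1 points, so the cost per candidate width grows roughly with J squared; that is the source of the slowness noted in this entry.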
The grid approach is used because the cross-validation score is not in
general a convex function of the window width, so it may have several local
minima. Otherwise one might try a simpler minimization, e.g., by solving the
first-order condition.
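The function whose shape matters here is the score itself, viewed as a function of the window width h. In the entry's notation, and writing m̂_{h,-j}(x) (a symbol introduced here for illustration) for the kernel regression estimate at x computed with window width h from every data point except the j-th, it can be written as below. The Nadaraya-Watson form on the second line, with kernel K, is only one common choice of kernel regression and is an assumption, not something the entry specifies.

```latex
\begin{aligned}
\mathrm{CV}(h) &= \sum_{j=1}^{J} \bigl( Y_j - \hat{m}_{h,-j}(X_j) \bigr)^2,
\qquad \hat{h} = \arg\min_{h \in \{h_1, \dots, h_n\}} \mathrm{CV}(h) \\
\hat{m}_{h,-j}(x) &= \frac{\sum_{k \neq j} K\!\left(\frac{x - X_k}{h}\right) Y_k}
                         {\sum_{k \neq j} K\!\left(\frac{x - X_k}{h}\right)}
\end{aligned}
```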
Note, however, that a complete execution of the cross-validation method can be
very slow: for each candidate window width it requires as many kernel
regressions as there are data points, and with a straightforward
implementation each of those regressions uses all the remaining points. For
example, in this author's experience, the cross-validation computation for one
window width on 500 data points on a Pentium-90 in Gauss took about five
seconds, 1000 data points took circa seventeen seconds, but 15000 data points
took an hour. (Checking another window width then takes another hour, so even
the simplest choice, between two window widths, takes two hours.)
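A rough consistency check on these timings, assuming (an assumption, not stated by the author) that each leave-one-out regression costs time proportional to J, so that one window width costs about c times J squared, with c calibrated on the 500-point run:

```latex
\begin{aligned}
T(J) &\approx c\,J^2, \qquad c = \frac{5\ \text{s}}{500^2} = 2 \times 10^{-5}\ \text{s} \\
T(1000) &\approx 2 \times 10^{-5} \times 10^{6} = 20\ \text{s} \\
T(15000) &\approx 2 \times 10^{-5} \times 2.25 \times 10^{8} = 4500\ \text{s} = 75\ \text{min}
\end{aligned}
```

These are of the same order as the reported seventeen seconds and roughly one hour, which fits cost growing roughly quadratically in the number of data points and linearly in the number of candidate window widths.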
Source: Härdle, 1990
Contexts: nonparametrics; estimation; econometrics; statistics