Neighbors Functional Regression#

Shows the usage of the nearest neighbors regressor with functional response.

# Author: Pablo Marcos Manchón
# License: MIT

# sphinx_gallery_thumbnail_number = 4

In this example we show the usage of the nearest neighbors regressors with functional response. Two variants are available: a k-nn version, KNeighborsRegressor, and a radius-based one, RadiusNeighborsRegressor.

As in the scalar response example, we will fetch the Canadian weather dataset, which contains the daily temperature and precipitation at 35 different locations in Canada averaged over 1960 to 1994. The following figure shows the different temperature and precipitation curves.

from skfda.datasets import fetch_weather

data = fetch_weather()
fd = data["data"]

# Split dataset, temperatures and curves of precipitation
X, y_grid = fd.coordinates

Temperatures

import matplotlib.pyplot as plt

X.plot()
plt.show()

Precipitation

y_grid.plot()
plt.show()

We will try to predict the precipitation curves. First, we smooth the precipitation curves using a basis representation, employing a Fourier basis with 5 elements.

from skfda.representation.basis import FourierBasis

y = y_grid.to_basis(FourierBasis(n_basis=5))

y.plot()
plt.show()
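For intuition, the basis representation can be sketched in plain NumPy: a discretized curve is projected onto a truncated Fourier basis by least squares. This is a conceptual sketch with a toy curve, not the skfda implementation.

```python
import numpy as np

# Toy discretized "curve" sampled on a grid (stand-in for one year of values).
t = np.linspace(0, 2 * np.pi, 365)
curve = 3.0 + np.sin(t) + 0.5 * np.cos(2 * t)

# Design matrix for a 5-element Fourier basis: 1, sin t, cos t, sin 2t, cos 2t.
basis = np.column_stack([
    np.ones_like(t),
    np.sin(t), np.cos(t),
    np.sin(2 * t), np.cos(2 * t),
])

# Least-squares projection of the curve onto the basis; the coefficients
# recover [3, 1, 0, 0, 0.5], since the toy curve lies in the basis span.
coefs, *_ = np.linalg.lstsq(basis, curve, rcond=None)
smooth = basis @ coefs
print(coefs.round(2))
```

A real precipitation curve does not lie exactly in the span of 5 basis elements, so the projection smooths it rather than reproducing it.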

We will split the dataset into two partitions, for training and testing, using the sklearn function train_test_split().

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=28,
)

We will make a prediction using 5 neighbors and the \(\mathbb{L}^2\) distance. In this case, the response is computed as a mean of the neighbors' responses, weighted by their distance to the test sample.
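The distance-weighted average behind weights="distance" can be sketched with NumPy. The distances and (scalar, for simplicity) responses below are hypothetical toy values, not data from this example.

```python
import numpy as np

# Hypothetical distances from a test sample to its 5 nearest neighbors,
# and the neighbors' responses (scalars here; curves work the same way).
distances = np.array([1.0, 2.0, 4.0, 5.0, 10.0])
responses = np.array([10.0, 12.0, 11.0, 15.0, 30.0])

# Inverse-distance weights: closer neighbors contribute more.
weights = 1.0 / distances
prediction = np.sum(weights * responses) / np.sum(weights)
print(round(prediction, 3))  # the farthest, largest response barely matters
```

With uniform weights the distant outlier response would pull the mean up much more strongly.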

from skfda.ml.regression import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
KNeighborsRegressor(weights='distance')


We can predict values for the test partition using predict(). The following figure shows the real precipitation curves, as dashed lines, together with the predicted ones.

y_pred = knn.predict(X_test)

# Plot prediction
fig = y_pred.plot()
fig.axes[0].set_prop_cycle(None)  # Reset colors
y_test.plot(fig=fig, linestyle="--")
plt.show()

We can quantify how much of the variability is explained by the model using the score() method, which computes the value

\[1 - \frac{\sum_{i=1}^{n}\int (y_i(t) - \hat{y}_i(t))^2dt} {\sum_{i=1}^{n} \int (y_i(t)- \frac{1}{n}\sum_{i=1}^{n}y_i(t))^2dt}\]

where \(y_i\) are the real responses and \(\hat{y}_i\) the predicted ones.
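For discretized curves on a uniform grid, this score can be approximated by replacing the integrals with Riemann sums. The sketch below uses toy arrays and is not the skfda implementation; note that the grid-spacing factor cancels between numerator and denominator.

```python
import numpy as np

t = np.linspace(0, 1, 100)                       # evaluation grid on [0, 1]
# Two toy "true" response curves and a deliberately biased prediction.
y_true = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
y_hat = y_true + 0.1

# Numerator: integrated squared residuals, summed over samples
# (the constant grid spacing cancels in the ratio, so plain sums suffice).
num = ((y_true - y_hat) ** 2).sum()

# Denominator: integrated squared deviations from the mean curve.
mean_curve = y_true.mean(axis=0)
den = ((y_true - mean_curve) ** 2).sum()

r2 = 1 - num / den
print(round(r2, 4))
```

As with scalar \(R^2\), a perfect prediction gives 1, and predicting the mean curve for every sample gives 0.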

score = knn.score(X_test, y_test)
print(score)
0.9149923776656348
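The radius-based variant mentioned at the start, RadiusNeighborsRegressor, averages the responses of all training samples within a fixed radius instead of a fixed count of neighbors. A minimal sketch of that selection rule in NumPy, with hypothetical toy distances and scalar responses:

```python
import numpy as np

# Hypothetical distances from one test sample to every training sample,
# with the corresponding responses (scalars here for simplicity).
distances = np.array([0.5, 1.2, 3.0, 0.8, 5.0])
responses = np.array([10.0, 14.0, 40.0, 12.0, 80.0])

radius = 2.0
within = distances <= radius           # keep only neighbors inside the radius
prediction = responses[within].mean()  # uniform average of their responses
print(prediction)                      # (10 + 14 + 12) / 3 = 12.0
```

The number of contributing neighbors thus varies per test sample, which can help when the training data have uneven density.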

More detailed information about the Canadian weather dataset can be found in the following references.

  • Ramsay, James O., and Silverman, Bernard W. (2006). Functional Data Analysis, 2nd ed., Springer, New York.

  • Ramsay, James O., and Silverman, Bernard W. (2002). Applied Functional Data Analysis, Springer, New York.

Total running time of the script: (0 minutes 0.343 seconds)

Gallery generated by Sphinx-Gallery