.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/ml/plot_neighbors_scalar_regression.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code or to run this example in your browser via Binder. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_ml_plot_neighbors_scalar_regression.py: Neighbors Scalar Regression =========================== Shows the usage of the nearest neighbors regressor with scalar response. .. GENERATED FROM PYTHON SOURCE LINES 7-13 .. code-block:: Python # Author: Pablo Marcos Manchón # License: MIT # sphinx_gallery_thumbnail_number = 3 .. GENERATED FROM PYTHON SOURCE LINES 14-29 In this example, we show the usage of the nearest neighbors regressors with scalar response. A k-nearest neighbors version, :class:`KNeighborsRegressor `, is available, as well as one based on a radius, :class:`RadiusNeighborsRegressor `. First, we fetch a dataset to show the basic usage. The Canadian weather dataset contains the daily temperature and precipitation at 35 different locations in Canada, averaged over 1960 to 1994. The following figure shows the different temperature and precipitation curves. .. GENERATED FROM PYTHON SOURCE LINES 30-40 .. code-block:: Python from skfda.datasets import fetch_weather data = fetch_weather() fd = data["data"] # Split the dataset into temperature and precipitation curves X, y_func = fd.coordinates .. GENERATED FROM PYTHON SOURCE LINES 41-42 Temperatures .. GENERATED FROM PYTHON SOURCE LINES 42-48 .. code-block:: Python import matplotlib.pyplot as plt X.plot() plt.show() .. image-sg:: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_001.png :alt: Canadian Weather :srcset: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_001.png :class: sphx-glr-single-img ..
GENERATED FROM PYTHON SOURCE LINES 49-50 Precipitation .. GENERATED FROM PYTHON SOURCE LINES 50-54 .. code-block:: Python y_func.plot() plt.show() .. image-sg:: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_002.png :alt: Canadian Weather :srcset: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 55-58 We will try to predict the total log precipitation, i.e., :math:`logPrecTot_i = \log \sum_{t=0}^{365} prec_i(t)`, using the temperature curves. .. GENERATED FROM PYTHON SOURCE LINES 59-68 .. code-block:: Python import numpy as np # Sum directly from the data matrix prec = y_func.data_matrix.sum(axis=1)[:, 0] log_prec = np.log(prec) print(log_prec) .. rst-class:: sphx-glr-script-out .. code-block:: none [7.30033776 7.28276118 7.29600641 7.14084916 7.0914925 7.02811278 6.6861106 6.79860983 6.83668883 7.09721794 7.01148446 6.84673058 6.81640724 6.66262171 6.86484778 6.5572044 6.23284087 6.10724558 6.01322604 5.91647157 6.0078299 5.89357605 6.14246742 5.99271377 5.60543435 7.0519422 6.74711693 6.41165405 7.86010789 5.60469852 5.79209856 5.59136005 6.02707297 5.56106617 4.9698133 ] .. GENERATED FROM PYTHON SOURCE LINES 69-72 As in the nearest neighbors classifier examples, we will split the dataset into two partitions, for training and test, using the sklearn function :func:`~sklearn.model_selection.train_test_split`. .. GENERATED FROM PYTHON SOURCE LINES 73-82 .. code-block:: Python from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split( X, log_prec, random_state=7, ) .. GENERATED FROM PYTHON SOURCE LINES 83-92 First, we will make a prediction with the default values of the estimator: 5 neighbors and the :math:`\mathbb{L}^2` distance. We can fit the :class:`~skfda.ml.regression.KNeighborsRegressor` in the same way as the sklearn estimators.
This estimator is an extension of the sklearn :class:`~sklearn.neighbors.KNeighborsRegressor`, accepting a :class:`~skfda.representation.grid.FDataGrid` as input instead of an array of multivariate data. .. GENERATED FROM PYTHON SOURCE LINES 93-99 .. code-block:: Python from skfda.ml.regression import KNeighborsRegressor knn = KNeighborsRegressor(weights="distance") knn.fit(X_train, y_train) .. raw:: html
KNeighborsRegressor(weights='distance')


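To build intuition for the default :math:`\mathbb{L}^2` distance: on a uniform grid it behaves like the Euclidean distance between the discretized curves, scaled by the square root of the grid spacing (up to quadrature weights), so the neighbor ranking matches plain Euclidean KNN on the sample vectors. The following is a minimal illustrative sketch with synthetic curves and sklearn's multivariate :class:`~sklearn.neighbors.KNeighborsRegressor`; the data and names are made up, not the weather dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor as SkKNN

rng = np.random.default_rng(0)

# Synthetic "curves": 40 noisy sinusoids evaluated on a uniform grid,
# with a scalar response linked to each curve's phase.
t = np.linspace(0, 1, 100)
phase = rng.uniform(0, np.pi, size=40)
curves = np.sin(2 * np.pi * t + phase[:, None]) + 0.05 * rng.normal(size=(40, 100))
y = phase

# Scaling all pairwise distances by a constant (here, sqrt of the grid
# spacing) changes neither the neighbor ranking nor the normalized
# inverse-distance weights, so Euclidean KNN on the sample vectors
# gives the same predictions as KNN under the discretized L2 distance.
knn = SkKNN(n_neighbors=5, weights="distance")
knn.fit(curves[:30], y[:30])
pred = knn.predict(curves[30:])
print(pred.shape)  # one scalar prediction per test curve
```

Each prediction is a weighted average of the responses of the 5 closest training curves, which is exactly what the functional estimator computes on an :class:`~skfda.representation.grid.FDataGrid`.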
.. GENERATED FROM PYTHON SOURCE LINES 111-113 We can predict values for the test partition using :meth:`~skfda.ml.regression.KNeighborsRegressor.predict`. .. GENERATED FROM PYTHON SOURCE LINES 114-118 .. code-block:: Python pred = knn.predict(X_test) print(pred) .. rst-class:: sphx-glr-script-out .. code-block:: none [7.11225785 5.99768933 7.05559273 6.88718564 6.78535172 5.97132028 6.56125279 6.47991884 6.92965595] .. GENERATED FROM PYTHON SOURCE LINES 119-121 The following figure compares the real total log precipitation with the predicted values. .. GENERATED FROM PYTHON SOURCE LINES 122-132 .. code-block:: Python fig, ax = plt.subplots() ax.scatter(y_test, pred) ax.plot(y_test, y_test) ax.set_xlabel("Total log precipitation") ax.set_ylabel("Prediction") plt.show() .. image-sg:: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_003.png :alt: plot neighbors scalar regression :srcset: /auto_examples/ml/images/sphx_glr_plot_neighbors_scalar_regression_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 133-140 We can quantify how much of the variability is explained by the model with the coefficient of determination :math:`R^2` of the prediction, using :meth:`~skfda.ml.regression.KNeighborsRegressor.score`. The coefficient :math:`R^2` is defined as :math:`(1 - u/v)`, where :math:`u` is the residual sum of squares :math:`\sum_i (y_i - y_{pred_i})^2` and :math:`v` is the total sum of squares :math:`\sum_i (y_i - \bar y )^2`. .. GENERATED FROM PYTHON SOURCE LINES 141-147 .. code-block:: Python score = knn.score(X_test, y_test) print(score) .. rst-class:: sphx-glr-script-out .. code-block:: none 0.92445585715156 .. GENERATED FROM PYTHON SOURCE LINES 148-158 In this case, we obtain a really good approximation with this naive approach, although, due to the small number of samples, the results will depend on how the partition was done. In the above case, the explained variation is inflated for this reason.
We will perform cross-validation to test our model more robustly. Also, we can make a grid search, using :class:`~sklearn.model_selection.GridSearchCV`, to determine the optimal number of neighbors and the best way to weight their contributions. .. GENERATED FROM PYTHON SOURCE LINES 159-176 .. code-block:: Python from sklearn.model_selection import GridSearchCV param_grid = { "n_neighbors": range(1, 12, 2), "weights": ["uniform", "distance"], } knn = KNeighborsRegressor() gscv = GridSearchCV( knn, param_grid, cv=5, ) gscv.fit(X, log_prec) .. raw:: html
GridSearchCV(cv=5, estimator=KNeighborsRegressor(),
                 param_grid={'n_neighbors': range(1, 12, 2),
                             'weights': ['uniform', 'distance']})


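The grid search above scores every parameter combination by 5-fold cross-validation and keeps the best mean score. A minimal sketch of that mechanism on synthetic multivariate data (using sklearn's own :class:`~sklearn.neighbors.KNeighborsRegressor` rather than the functional estimator; the data are made up), showing that ``best_score_`` is just the mean fold score of the winning combination:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsRegressor as SkKNN

rng = np.random.default_rng(7)
X = rng.normal(size=(35, 10))            # 35 samples, as in the weather data
y = X[:, 0] + 0.1 * rng.normal(size=35)  # scalar response

param_grid = {
    "n_neighbors": range(1, 12, 2),
    "weights": ["uniform", "distance"],
}
gscv = GridSearchCV(SkKNN(), param_grid, cv=5)
gscv.fit(X, y)

# best_score_ is the mean 5-fold R^2 of the best combination; the same
# value is recovered by cross-validating an estimator with those params.
best = SkKNN(**gscv.best_params_)
manual = cross_val_score(best, X, y, cv=5).mean()
print(gscv.best_params_, np.isclose(gscv.best_score_, manual))
```

The final model is then refit on the whole dataset with the best parameters, which is what ``gscv.best_estimator_`` holds after ``fit``.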
.. GENERATED FROM PYTHON SOURCE LINES 177-178 We obtain that 3 is the optimal number of neighbors, with distance-based weighting. .. GENERATED FROM PYTHON SOURCE LINES 179-184 .. code-block:: Python print("Best params", gscv.best_params_) print("Best score", gscv.best_score_) .. rst-class:: sphx-glr-script-out .. code-block:: none Best params {'n_neighbors': 3, 'weights': 'distance'} Best score -2.521109652461066 .. GENERATED FROM PYTHON SOURCE LINES 185-193 More detailed information about the Canadian weather dataset can be found in the following references. * Ramsay, James O., and Silverman, Bernard W. (2006). Functional Data Analysis, 2nd ed., Springer, New York. * Ramsay, James O., and Silverman, Bernard W. (2002). Applied Functional Data Analysis, Springer, New York. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.363 seconds) .. _sphx_glr_download_auto_examples_ml_plot_neighbors_scalar_regression.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/ml/plot_neighbors_scalar_regression.py :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_neighbors_scalar_regression.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_neighbors_scalar_regression.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_neighbors_scalar_regression.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_