.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/datasets/plot_tecator_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_datasets_plot_tecator_regression.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_datasets_plot_tecator_regression.py:


Spectrometric data: derivatives, regression, and variable selection
====================================================================

Shows the use of derivatives, functional regression and variable selection
for functional data.

.. GENERATED FROM PYTHON SOURCE LINES 8-13

.. code-block:: Python


    # License: MIT

    # sphinx_gallery_thumbnail_number = 4


.. GENERATED FROM PYTHON SOURCE LINES 14-24

This example uses the Tecator dataset\ :footcite:`borggaard+thodberg_1992_optimal`
in order to illustrate the problems of functional regression and functional
variable selection. This dataset contains the absorbance spectra of several
pieces of finely chopped meat, as well as their water, fat and protein
content, expressed as percentages.

This is one of the examples presented at the ICTAI
conference\ :footcite:p:`ramos-carreno++_2022_scikitfda`.

.. GENERATED FROM PYTHON SOURCE LINES 26-28

We will first load the Tecator data, keeping only the fat content target,
and plot it.

.. GENERATED FROM PYTHON SOURCE LINES 28-39

.. code-block:: Python

    import matplotlib.pyplot as plt

    from skfda.datasets import fetch_tecator

    X, y = fetch_tecator(return_X_y=True)
    y = y[:, 0]

    X.plot(gradient_criteria=y)
    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_001.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 40-44

For spectrometric data, the relevant information of the curves can often be
found in their derivatives, as discussed in Ferraty and Vieu
(chapter 7)\ :footcite:`ferraty+vieu_2006`. Thus, we numerically compute the
second derivative and plot it.

.. GENERATED FROM PYTHON SOURCE LINES 44-48

.. code-block:: Python

    X_der = X.derivative(order=2)

    X_der.plot(gradient_criteria=y)
    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_002.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 49-53

We first apply a simple linear regression model to compute a baseline for our
regression predictions. In order to fit a functional linear regression, we
first convert the data to a basis expansion.

.. GENERATED FROM PYTHON SOURCE LINES 53-61

.. code-block:: Python

    from skfda.representation.basis import BSplineBasis

    basis = BSplineBasis(
        n_basis=10,
    )

    X_der_basis = X_der.to_basis(basis)

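As an optional aside, not part of the generated example, we can check visually
that the B-spline expansion preserves the main features of the derivative
curves. The following minimal sketch only assumes the objects defined above.

.. code-block:: Python

    # Optional visual check (sketch): plot the smoothed basis representation
    # of the second derivatives to verify that the expansion in 10 B-splines
    # keeps the main features of the curves.
    X_der_basis.plot()
    plt.show()
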
.. GENERATED FROM PYTHON SOURCE LINES 62-64

We split the data into train and test sets, and compute the regression score
using the linear regression model.

.. GENERATED FROM PYTHON SOURCE LINES 64-82

.. code-block:: Python

    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    from skfda.ml.regression import LinearRegression

    X_train, X_test, y_train, y_test = train_test_split(
        X_der_basis,
        y,
        random_state=0,
    )

    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(score)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.9505439228770038

.. GENERATED FROM PYTHON SOURCE LINES 83-91

We will now take a different approach. From the plot of the derivatives it is
possible to note that most of the information necessary for regression can be
found at some particular "impact" points. Thus, we now apply a functional
variable selection method to detect those points and use them with a
multivariate regressor. The variable selection method that we employ here is
maxima hunting\ :footcite:`berrendero++_2016_variable`, a filter method that
computes a relevance score for each point of the curve and selects the local
maxima.

.. GENERATED FROM PYTHON SOURCE LINES 91-105

.. code-block:: Python

    from skfda.preprocessing.dim_reduction.variable_selection.maxima_hunting import (
        MaximaHunting,
        RelativeLocalMaximaSelector,
    )

    var_sel = MaximaHunting(
        local_maxima_selector=RelativeLocalMaximaSelector(max_points=2),
    )
    X_mv = var_sel.fit_transform(X_der, y)

    print(var_sel.indexes_)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [41 97]

.. GENERATED FROM PYTHON SOURCE LINES 106-107

We can visualize the relevance function and the selected points.

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: Python

    var_sel.dependence_.plot()
    for p in var_sel.indexes_:
        plt.axvline(X_der.grid_points[0][p], color="black")
    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_003.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 113-114

We can also visualize the selected points on the curves.

.. GENERATED FROM PYTHON SOURCE LINES 114-119

.. code-block:: Python

    X_der.plot(gradient_criteria=y)
    for p in var_sel.indexes_:
        plt.axvline(X_der.grid_points[0][p], color="black")
    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_004.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 120-122

We split the data again (using the same seed), but this time without the
basis expansion.

.. GENERATED FROM PYTHON SOURCE LINES 122-128

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X_der,
        y,
        random_state=0,
    )

.. GENERATED FROM PYTHON SOURCE LINES 129-131

We now make a pipeline with the variable selection and a multivariate linear
regression method for comparison.

.. GENERATED FROM PYTHON SOURCE LINES 131-144

.. code-block:: Python

    import sklearn.linear_model
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ("variable_selection", var_sel),
        ("classifier", sklearn.linear_model.LinearRegression()),
    ])
    pipeline.fit(X_train, y_train)

    y_predicted = pipeline.predict(X_test)
    score = r2_score(y_test, y_predicted)
    print(score)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.9172784128792496

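As an optional check, not present in the original example, we can map the
selected indexes back to the wavelength values of the measurement grid,
reusing the grid points already employed for the vertical lines in the plots
above. Note that the selector inside the pipeline was refitted on the
training split, so its impact points may differ slightly from the ones
obtained on the full dataset.

.. code-block:: Python

    # Optional sketch: wavelengths (grid values) of the impact points chosen
    # by the selector refitted inside the pipeline on the training split.
    selected_indexes = pipeline.named_steps["variable_selection"].indexes_
    print(X_der.grid_points[0][selected_indexes])
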
.. GENERATED FROM PYTHON SOURCE LINES 145-147

We can use a tree regressor instead to improve both the score and the
interpretability.

.. GENERATED FROM PYTHON SOURCE LINES 147-159

.. code-block:: Python

    from sklearn.tree import DecisionTreeRegressor

    pipeline = Pipeline([
        ("variable_selection", var_sel),
        ("classifier", DecisionTreeRegressor(max_depth=3)),
    ])
    pipeline.fit(X_train, y_train)

    y_predicted = pipeline.predict(X_test)
    score = r2_score(y_test, y_predicted)
    print(score)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.9513642508362486

.. GENERATED FROM PYTHON SOURCE LINES 160-161

We can plot the final version of the tree to explain every prediction.

.. GENERATED FROM PYTHON SOURCE LINES 161-168

.. code-block:: Python

    from sklearn.tree import plot_tree

    fig, ax = plt.subplots(figsize=(10, 10))
    plot_tree(pipeline.named_steps["classifier"], precision=6, filled=True, ax=ax)
    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_005.png
   :alt: plot tecator regression
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_tecator_regression_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 169-173

References
----------

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.686 seconds)


.. _sphx_glr_download_auto_examples_datasets_plot_tecator_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
         :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/datasets/plot_tecator_regression.py
         :alt: Launch binder
         :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_tecator_regression.ipynb <plot_tecator_regression.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_tecator_regression.py <plot_tecator_regression.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_tecator_regression.zip <plot_tecator_regression.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_