.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/representation/plot_irregular_to_basis_mixed_effects.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_representation_plot_irregular_to_basis_mixed_effects.py: Mixed effects model for irregular data ============================================================================== This example converts irregular data to a basis representation using a mixed effects model. .. GENERATED FROM PYTHON SOURCE LINES 8-13 .. code-block:: Python # Author: Pablo Cuesta Sierra # License: MIT # sphinx_gallery_thumbnail_number = -1 .. GENERATED FROM PYTHON SOURCE LINES 14-25 Sythetic data ------------- For this example, we are going to simulate the irregular sampling of a dataset following the mixed effects model, to later attempt to reconstruct said original dataset. We generate the original basis representation of the data following the mixed effects model for irregular data as presented by :footcite:t:`james_2018_sparsenessfda`. This just means that the coefficients of the basis representation are generated from a Gaussian distribution. .. GENERATED FROM PYTHON SOURCE LINES 25-51 .. code-block:: Python import numpy as np from skfda import FDataBasis from skfda.representation.basis import BSplineBasis n_curves = 70 n_basis = 4 domain_range = (0, 10) basis = BSplineBasis(n_basis=n_basis, domain_range=domain_range, order=3) coeff_mean = np.array([-15, 20, -4, 6]) coeff_cov_sqrt = np.array([ [4.0, 0.0, 0.0, 0.0], [-3.2, -2.6, 0.0, 0.0], [4.7, 2.9, 2.0, 0.0], [-1.9, 6.3, 4.6, -3.6], ]) random_state = np.random.RandomState(seed=34285676) coefficients = ( coeff_mean + random_state.normal(size=(n_curves, n_basis)) @ coeff_cov_sqrt ) fdatabasis_original = FDataBasis(basis, coefficients) .. GENERATED FROM PYTHON SOURCE LINES 57-58 Plot the basis functions used to generate the data .. GENERATED FROM PYTHON SOURCE LINES 58-66 .. code-block:: Python import matplotlib.pyplot as plt fig, ax = plt.subplots() basis.plot(axes=ax) ax.set_title("Basis functions") plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_001.png :alt: Basis functions :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 67-68 Plot some of the generated curves .. GENERATED FROM PYTHON SOURCE LINES 68-74 .. code-block:: Python fig, ax = plt.subplots() fdatabasis_original[:10].plot(axes=ax) ax.set_title("Original curves") plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_002.png :alt: Original curves :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 75-78 We subsample the original data by measuring a random number of points per curve generating an irregular dataset. Moreover, we add some Gaussian noise to the data. .. GENERATED FROM PYTHON SOURCE LINES 78-99 .. code-block:: Python from skfda import FDataIrregular from skfda.datasets import fetch_weather, irregular_sample fd_irregular_without_noise = irregular_sample( fdata=fdatabasis_original, n_points_per_curve=random_state.randint(2, 6, n_curves), random_state=random_state, ) noise_std = .3 noise = random_state.normal( 0, noise_std, fd_irregular_without_noise.values.shape, ) fd_irregular = FDataIrregular( points=fd_irregular_without_noise.points, start_indices=fd_irregular_without_noise.start_indices, values=fd_irregular_without_noise.values + noise, ) .. GENERATED FROM PYTHON SOURCE LINES 100-101 Plot 3 curves of the newly created irregular data along with the original .. GENERATED FROM PYTHON SOURCE LINES 101-117 .. code-block:: Python fig, axes = plt.subplots(1, 3, figsize=(10, 3)) for curve_idx, ax in enumerate(axes): fdatabasis_original[curve_idx].plot( axes=ax, alpha=0.3, color=f"C{curve_idx}", ) fd_irregular[curve_idx].plot( axes=ax, marker=".", color=f"C{curve_idx}", ) ax.set_ylim((-27, 27)) plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_003.png :alt: plot irregular to basis mixed effects :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 118-120 We split our irregular data into two groups, the train curves and the test curves. .. GENERATED FROM PYTHON SOURCE LINES 120-132 .. code-block:: Python from sklearn.model_selection import train_test_split train_original, test_original, train_irregular, test_irregular = ( train_test_split( fdatabasis_original, fd_irregular, test_size=0.3, random_state=random_state, ) ) .. GENERATED FROM PYTHON SOURCE LINES 133-138 Now, we create and train the mixed effects converter using the train curves, and we convert the irregular data to basis representation. For comparison, we also convert to basis representation using the default basis representation for each curve, which is done curve-wise instead of taking into account the whole dataset. .. GENERATED FROM PYTHON SOURCE LINES 138-156 .. code-block:: Python from skfda.representation.conversion import EMMixedEffectsConverter converter = EMMixedEffectsConverter(basis) converter.fit(train_irregular) train_converted = converter.transform(train_irregular) test_converted = converter.transform(test_irregular) train_functionwise_to_basis = train_irregular.to_basis( basis, conversion_type="function-wise", ) test_functionwise_to_basis = test_irregular.to_basis( basis, conversion_type="function-wise", ) .. GENERATED FROM PYTHON SOURCE LINES 157-159 To visualize the conversion results, we plot the first original and converted curves of the test set. .. GENERATED FROM PYTHON SOURCE LINES 159-193 .. code-block:: Python fig, axes = plt.subplots(5, 2, figsize=(11, 16)) fig.suptitle("Comparison of the original and converted data (test set)") for curve_idx, ax in enumerate(axes.flat): test_irregular[curve_idx].scatter( axes=ax, color=f"C{curve_idx}", label="Irregular", ) test_original[curve_idx].plot( axes=ax, color=f"C{curve_idx}", alpha=0.5, label="Original", ) test_functionwise_to_basis[curve_idx].plot( axes=ax, color=f"C{curve_idx}", linestyle=":", label="Function-wise", ) test_converted[curve_idx].plot( axes=ax, color=f"C{curve_idx}", linestyle="--", label="Mixed-effects", ) ax.legend() ax.set_ylim((-27, 27)) # Same scale for all plots fig.tight_layout(rect=(0, 0, 1, 0.98)) plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_004.png :alt: Comparison of the original and converted data (test set) :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 194-206 As can be seen in the previous plot, when measurements are distributed across the domain, both the mixed effects model and the function-wise conversion are able to provide a good approximation of the original data. However, when the measurements are concentrated in a small region of the domain, e can see that the mixed effects model is able to provide a more accurate approximation. Moreover, the mixed effects model is able to remove the noise from the measurements, which is not the case for the function-wise conversion. Finally, we make use of the :math:`R^2` score and the :math:`MSE` to compare the converted basis representations with the original data, both for the train and test sets. .. GENERATED FROM PYTHON SOURCE LINES 206-227 .. code-block:: Python import pandas as pd from skfda.misc.scoring import mean_squared_error, r2_score score_functions = {"R^2": r2_score, "MSE": mean_squared_error} scores = { score_name: pd.DataFrame({ "Mixed-effects": { "Train": score_fun(train_original, train_converted), "Test": score_fun(test_original, test_converted), }, "Curve-wise": { "Train": score_fun(train_original, train_functionwise_to_basis), "Test": score_fun(test_original, test_functionwise_to_basis), }, }) for score_name, score_fun in score_functions.items() } .. GENERATED FROM PYTHON SOURCE LINES 233-234 The :math:`R^2` scores are as follows (higher is better): .. GENERATED FROM PYTHON SOURCE LINES 234-237 .. code-block:: Python scores["R^2"] .. raw:: html
Mixed-effects Curve-wise
Train 0.938600 -60.547529
Test 0.900498 -67.636766


.. GENERATED FROM PYTHON SOURCE LINES 238-239 The MSE errors are as follows (lower is better): .. GENERATED FROM PYTHON SOURCE LINES 239-243 .. code-block:: Python scores["MSE"] .. raw:: html
Mixed-effects Curve-wise
Train 0.993315 1591.071517
Test 1.932750 253.514808


.. GENERATED FROM PYTHON SOURCE LINES 244-254 Real-world data --------------- The Canadian Weather dataset is downloaded from the package 'fda' in CRAN. It contains a FDataGrid with daily temperatures and precipitations, that is, it has a 2-dimensional image. We are interested only in the daily average temperatures, so we will use the first coordinate. As we want to illustrate the conversion of irregular data to basis, representation, we will take an irregular sample of the temperatures dataset containing only 7 points per curve. .. GENERATED FROM PYTHON SOURCE LINES 254-262 .. code-block:: Python weather = fetch_weather() fd_temperatures = weather.data.coordinates[0] random_state = np.random.RandomState(seed=73947291) irregular_temperatures = irregular_sample( fdata=fd_temperatures, n_points_per_curve=7, random_state=random_state, ) .. GENERATED FROM PYTHON SOURCE LINES 263-266 The dataset contains information about the region of each station, which have different types of climate. We save the indices of the stations in each region to later plot some of them. .. GENERATED FROM PYTHON SOURCE LINES 266-280 .. code-block:: Python regions = weather.categories["region"] print(regions) region_indexes = { region: np.nonzero(weather.target == i)[0] for i, region in enumerate(regions) } arctic_indexes = region_indexes["Arctic"] atlantic_indexes = region_indexes["Atlantic"] continental_indexes = region_indexes["Continental"] pacific_indexes = region_indexes["Pacific"] .. rst-class:: sphx-glr-script-out .. code-block:: none ['Arctic' 'Atlantic' 'Continental' 'Pacific'] .. GENERATED FROM PYTHON SOURCE LINES 281-283 Here we plot the original data alongside one of the original curves and its irregularly sampled version. .. GENERATED FROM PYTHON SOURCE LINES 283-302 .. code-block:: Python fig, axes = plt.subplots(1, 2, figsize=(10, 4)) ax = axes[0] fd_temperatures.plot(axes=ax) ylim = ax.get_ylim() ax.set_title("All temperature curves") ax = axes[1] index = 13 # index of the station fd_temperatures[index].plot(axes=ax, color="black", alpha=0.4) irregular_temperatures[index].scatter(axes=ax, color="black", marker="o") ax.set_ylim(ylim) ax.set_title( f"{fd_temperatures.sample_names[index]} station's temperature curve", ) plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_005.png :alt: Canadian Weather, All temperature curves, Toronto station's temperature curve :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 303-305 Now, we convert the irregularly sampled temperature curves to basis representation. Due to the periodicity of the data, a Fourier basis is used. .. GENERATED FROM PYTHON SOURCE LINES 305-316 .. code-block:: Python from skfda.representation.basis import FourierBasis basis = FourierBasis(n_basis=5, domain_range=fd_temperatures.domain_range) irregular_temperatures_converted = irregular_temperatures.to_basis( basis, conversion_type="mixed-effects", ) curvewise_temperatures_converted = irregular_temperatures.to_basis( basis, conversion_type="function-wise", ) .. GENERATED FROM PYTHON SOURCE LINES 317-320 To visualize the conversion, we now plot 4 of the converted curves (one from each region) along with the original temperatures and the irregular points that we sampled. .. GENERATED FROM PYTHON SOURCE LINES 320-352 .. code-block:: Python indexes = [ arctic_indexes[0], atlantic_indexes[11], continental_indexes[3], pacific_indexes[3], ] fig, axes = plt.subplots(2, 2, figsize=(10, 10)) for i, (index, ax) in enumerate(zip(indexes, axes.flat, strict=True)): fd_temperatures[index].plot( axes=ax, color=f"C{i}", alpha=0.5, label="Original", ) curvewise_temperatures_converted[index].plot( axes=ax, color=f"C{i}", linestyle=":", label="Function-wise", ) irregular_temperatures_converted[index].plot( axes=ax, color=f"C{i}", linestyle="--", label="Mixed-effects", ) irregular_temperatures[index].scatter( axes=ax, color=f"C{i}", alpha=0.5, label="Irregular", ) ax.set_title( f"{fd_temperatures.sample_names[index]} station " f"({weather.categories['region'][weather.target[index]]})", ) ax.set_ylim(ylim) ax.legend() fig.tight_layout() plt.show() .. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_006.png :alt: Canadian Weather, Iqaluit station (Arctic), Montreal station (Atlantic), Churchill station (Continental), Pr. George station (Pacific) :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_to_basis_mixed_effects_006.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 353-360 Finally, we get a score of the quality of the conversion by comparing the obtained basis representation with the original data from the CRAN dataset. The :math:`R^2` score is used. Note that, to compare the original data and the basis representation (which have different :class:`FData` types), we have to evaluate the latter at the grid points of the former. .. GENERATED FROM PYTHON SOURCE LINES 360-371 .. code-block:: Python r2_me = r2_score( fd_temperatures, irregular_temperatures_converted.to_grid(fd_temperatures.grid_points), ) r2_curvewise = r2_score( fd_temperatures, curvewise_temperatures_converted.to_grid(fd_temperatures.grid_points), ) print(f"R2 score (function-wise): {r2_curvewise:f}") print(f"R2 score (mixed-effects): {r2_me:f}") .. rst-class:: sphx-glr-script-out .. code-block:: none R2 score (function-wise): -0.445817 R2 score (mixed-effects): 0.935085 .. GENERATED FROM PYTHON SOURCE LINES 372-377 As in the synthetic case, both conversion types are similar for the curves where the measurements are distributed across the domain. Otherwise, the mixed-effects model provides a more accurate approximation in the regions where the measurements of one curve are missing by using the information from the whole dataset. .. GENERATED FROM PYTHON SOURCE LINES 379-383 References ---------- .. footbibliography:: .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 4.811 seconds) .. _sphx_glr_download_auto_examples_representation_plot_irregular_to_basis_mixed_effects.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/representation/plot_irregular_to_basis_mixed_effects.py :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_irregular_to_basis_mixed_effects.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_irregular_to_basis_mixed_effects.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_irregular_to_basis_mixed_effects.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_