.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/representation/plot_irregular_mixed_effects_robustness.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_representation_plot_irregular_mixed_effects_robustness.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_representation_plot_irregular_mixed_effects_robustness.py:

Mixed effects model for irregular data: robustness of the conversion
======================================================================

This example converts irregular data to a basis representation using a mixed
effects model and checks the robustness of the method by fitting the model
with a decreasing number of measurement points per curve.

.. GENERATED FROM PYTHON SOURCE LINES 9-14

.. code-block:: Python

    # Author: Pablo Cuesta Sierra
    # License: MIT

    # sphinx_gallery_thumbnail_number = -1

.. GENERATED FROM PYTHON SOURCE LINES 15-23

For this example, we are going to check the robustness of the mixed effects
method for converting irregular data to a basis representation by removing
measurement points from the train and test sets and comparing the resulting
conversions. The temperatures from the Canadian weather dataset are used to
generate the irregular data. We use a Fourier basis due to the periodic
nature of the data.

.. GENERATED FROM PYTHON SOURCE LINES 23-32

.. code-block:: Python

    import matplotlib.pyplot as plt

    from skfda.datasets import fetch_weather
    from skfda.representation.basis import FourierBasis

    fd_temperatures = fetch_weather().data.coordinates[0]

    basis = FourierBasis(n_basis=5, domain_range=fd_temperatures.domain_range)

.. GENERATED FROM PYTHON SOURCE LINES 33-34

We plot the original data and the basis functions.

.. GENERATED FROM PYTHON SOURCE LINES 34-51

.. code-block:: Python

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))

    ax = axes[0]
    fd_temperatures.plot(axes=ax)
    ylim = ax.get_ylim()
    xlabel = ax.get_xlabel()
    ax.set_title(fd_temperatures.dataset_name)

    ax = axes[1]
    basis.plot(axes=ax)
    ax.set_xlabel(xlabel)
    ax.set_title("Basis functions")

    fig.suptitle("")
    plt.show()

.. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_001.png
   :alt: , Canadian Weather, Basis functions
   :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 52-53

We split the data into train and test sets:

.. GENERATED FROM PYTHON SOURCE LINES 53-67

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import train_test_split

    from skfda import FDataIrregular

    random_state = np.random.RandomState(seed=13627798)

    train_original, test_original = train_test_split(
        fd_temperatures,
        test_size=0.3,
        random_state=random_state,
    )
    test_original_irregular = FDataIrregular.from_fdatagrid(test_original)

.. GENERATED FROM PYTHON SOURCE LINES 68-70

Then, we create datasets with a decreasing number of measurement points per
curve by iteratively removing measurement points from the previous dataset.

.. GENERATED FROM PYTHON SOURCE LINES 70-93
.. code-block:: Python

    from skfda.datasets import irregular_sample

    n_points_list = [365, 40, 10, 7, 5, 4, 3]
    train_irregular_datasets = []
    test_irregular_datasets = []
    current_train = train_original
    current_test = test_original
    for n_points in n_points_list:
        current_train = irregular_sample(
            current_train,
            n_points_per_curve=n_points,
            random_state=random_state,
        )
        current_test = irregular_sample(
            current_test,
            n_points_per_curve=n_points,
            random_state=random_state,
        )
        train_irregular_datasets.append(current_train)
        test_irregular_datasets.append(current_test)

.. GENERATED FROM PYTHON SOURCE LINES 94-95

We now define which measures we will use to score the conversions.

.. GENERATED FROM PYTHON SOURCE LINES 95-103

.. code-block:: Python

    from skfda.misc.scoring import mean_squared_error, r2_score

    score_functions = {
        "R^2": r2_score,
        "MSE": mean_squared_error,
    }

.. GENERATED FROM PYTHON SOURCE LINES 111-116

We convert the irregular data to basis representation and compute the scores.
To do so, we fit the converter once per train set. After fitting the
converter with a train set that has :math:`k` points per curve, we use it to
transform that train set, the test set with :math:`k` points per curve, and
the original test set with 365 points per curve.
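
The following is a schematic of the standard linear mixed effects model on
which this kind of conversion is based (the notation here is ours; see the
skfda documentation of ``EMMixedEffectsConverter`` for the exact formulation
that is implemented):

.. math::

    x_i(t_{ij})
    = \boldsymbol{\varphi}(t_{ij})^\top
      \left(\boldsymbol{\beta} + \boldsymbol{\gamma}_i\right)
    + \varepsilon_{ij},
    \qquad
    \boldsymbol{\gamma}_i \sim \mathcal{N}(\boldsymbol{0}, \Gamma),
    \qquad
    \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),

where :math:`\boldsymbol{\varphi}` is the vector of basis functions (here,
the 5 Fourier basis functions), :math:`\boldsymbol{\beta}` are fixed effect
coefficients shared by all curves, :math:`\boldsymbol{\gamma}_i` are the
random effect coefficients of the :math:`i`-th curve, and
:math:`\varepsilon_{ij}` is measurement noise. The EM algorithm estimates
:math:`\boldsymbol{\beta}`, :math:`\Gamma` and :math:`\sigma^2` from the
sparse observations, and each converted curve is the basis expansion whose
coefficients are the predicted :math:`\boldsymbol{\beta} + \hat{\boldsymbol{\gamma}}_i`.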

.. GENERATED FROM PYTHON SOURCE LINES 116-165

.. code-block:: Python

    import pandas as pd

    from skfda.representation.conversion import EMMixedEffectsConverter

    converter = EMMixedEffectsConverter(basis)

    # Store the converted data
    converted_data = {
        "Train-sparse": [],
        "Test-sparse": [],
        "Test-original": [],
    }

    for train_irregular, test_irregular in zip(
        train_irregular_datasets,
        test_irregular_datasets,
        strict=True,
    ):
        converter = converter.fit(train_irregular)
        converted_data["Train-sparse"].append(
            converter.transform(train_irregular),
        )
        converted_data["Test-sparse"].append(
            converter.transform(test_irregular),
        )
        converted_data["Test-original"].append(
            converter.transform(test_original_irregular),
        )

    # Calculate and store the scores
    scores = {
        score_name: pd.DataFrame(
            {
                data_name: [
                    score_fun(
                        test_original if "Test" in data_name else train_original,
                        transformed.to_grid(test_original.grid_points),
                    )
                    for transformed in converted_data[data_name]
                ]
                for data_name in converted_data
            },
            index=pd.Index(n_points_list, name="Points per curve"),
        )
        for score_name, score_fun in score_functions.items()
    }

.. GENERATED FROM PYTHON SOURCE LINES 171-173

Finally, we have the scores for the train and test sets with a decreasing
number of measurement points per curve.

.. GENERATED FROM PYTHON SOURCE LINES 175-176

The :math:`R^2` scores are as follows (higher is better):

.. GENERATED FROM PYTHON SOURCE LINES 176-179

.. code-block:: Python

    scores["R^2"]

================  ============  ===========  =============
Points per curve  Train-sparse  Test-sparse  Test-original
================  ============  ===========  =============
365               0.974127      0.962706     0.963052
40                0.972310      0.959470     0.963067
10                0.957704      0.934373     0.961971
7                 0.934244      0.909790     0.959539
5                 0.755150      0.706064     0.904735
4                 0.439997      -0.423861    0.466959
3                 -0.081926     -0.332551    -0.066834
================  ============  ===========  =============


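As a quick sanity check, any entry of the table above can be recomputed from
the objects that are already in memory. The following sketch (an addition to
this example, assuming the variables defined above are still available)
reproduces the ``Test-sparse`` score of the sparsest conversion, fitted and
evaluated with 3 points per curve:

.. code-block:: Python

    # Recompute one entry of the R^2 table by hand (illustrative check,
    # not part of the original example).
    converted_test_sparsest = converted_data["Test-sparse"][-1]  # 3 points/curve

    r2_sparsest = r2_score(
        # True test curves, on the original 365-point grid.
        test_original,
        # Converted curves, evaluated on that same grid before scoring.
        converted_test_sparsest.to_grid(test_original.grid_points),
    )
    print(r2_sparsest)  # should match the last row of the "Test-sparse" column
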
.. GENERATED FROM PYTHON SOURCE LINES 180-181

The MSE values are as follows (lower is better):

.. GENERATED FROM PYTHON SOURCE LINES 181-185

.. code-block:: Python

    scores["MSE"]

================  ============  ===========  =============
Points per curve  Train-sparse  Test-sparse  Test-original
================  ============  ===========  =============
365               0.852184      1.114603     1.103542
40                0.920573      1.281008     1.103568
10                1.485674      2.449478     1.124169
7                 2.905242      3.261692     1.172009
5                 12.189931     14.698684    2.454010
4                 22.179968     35.746229    16.787738
3                 48.384424     56.912991    45.795726
================  ============  ===========  =============


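To put these MSE values in perspective, we can compare them with the error of
a trivial predictor that ignores the measurements and always returns the mean
curve of the training set. The sketch below is an addition to this example
(it assumes the variables defined above are still available) and computes
that baseline on the test set; when the conversion's MSE approaches it, the
converted curves carry little information beyond the overall mean, which is
roughly what the last rows of the table show for 3 and 4 points per curve.

.. code-block:: Python

    # Illustrative baseline, not part of the original example: MSE obtained
    # by predicting every test curve with the mean curve of the train set.
    from skfda import FDataGrid

    mean_curve = train_original.mean()  # FDataGrid with a single sample
    baseline_prediction = FDataGrid(
        # Repeat the mean curve once per test sample so that shapes match.
        np.repeat(mean_curve.data_matrix, test_original.n_samples, axis=0),
        grid_points=mean_curve.grid_points,
    )
    print(mean_squared_error(test_original, baseline_prediction))
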
.. GENERATED FROM PYTHON SOURCE LINES 186-187

Plot the scores.

.. GENERATED FROM PYTHON SOURCE LINES 187-226

.. code-block:: Python

    label_train = r"$\mathcal{D}_{train}^{\ j}$"
    label_test = r"$\mathcal{D}_{test}^{\ j}$"
    label_test_orig = r"$\mathcal{D}_{test}^{\ 0}$"

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for i, (score_name, values) in enumerate(scores.items()):
        ax = axes[i]
        ax.plot(
            values.index,
            values["Train-sparse"],
            label=f"Fit {label_train}; transform {label_train}",
            marker=".",
        )
        ax.plot(
            values.index,
            values["Test-sparse"],
            label=f"Fit {label_train}; transform {label_test}",
            marker=".",
        )
        ax.plot(
            values.index,
            values["Test-original"],
            label=f"Fit {label_train}; transform {label_test_orig}",
            marker=".",
        )
        if score_name == "MSE":
            ax.set_yscale("log")
            ax.set_ylabel(f"${score_name}$ score (logscale)")
        else:
            ax.set_ylabel(f"${score_name}$ score")
        ax.set_xscale("log")
        ax.set_xlabel(r"Measurements per function (logscale)")
        ax.legend()

    plt.show()

.. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_002.png
   :alt: plot irregular mixed effects robustness
   :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 227-229

Show the original curves along with the converted test curves for the
conversions with 7, 5, 4 and 3 points per curve.

.. GENERATED FROM PYTHON SOURCE LINES 229-286

.. code-block:: Python

    def plot_conversion_evolution(index: int) -> None:
        """Plot evolution of the conversion for a particular curve."""
        fig, axes = plt.subplots(2, 2, figsize=(8, 8.5))
        start_index = 3
        for i, n_points_per_curve in enumerate(n_points_list[start_index:]):
            ax = axes.flat[i]

            test_irregular_datasets[i + start_index][index].scatter(
                axes=ax,
                color="C0",
            )
            fd_temperatures.mean().plot(
                axes=ax,
                color=[0.4] * 3,
                label="Original dataset mean",
            )
            fd_temperatures.plot(
                axes=ax,
                color=[0.7] * 3,
                linewidth=0.2,
            )
            test_original[index].plot(
                axes=ax,
                color="C0",
                linewidth=0.65,
                label="Original test curve",
            )
            converted_data["Test-sparse"][i + start_index][index].plot(
                axes=ax,
                color="C0",
                linestyle="--",
                label="Test curve transformed",
            )
            ax.set_title(
                f"Transform of test curves with {n_points_per_curve} points",
            )
            ax.set_ylim(ylim)

        fig.suptitle(
            "Evolution of the conversion of a curve with decreasing measurements "
            f"({test_original.sample_names[index]} station)",
        )

        # Add common legend at the bottom:
        handles, labels = ax.get_legend_handles_labels()
        fig.tight_layout(h_pad=0, rect=(0, 0.1, 1, 1))
        fig.legend(
            handles=handles,
            loc="lower center",
            ncols=3,
        )

        plt.show()

.. GENERATED FROM PYTHON SOURCE LINES 287-288

Toronto station's temperature curve conversion evolution:

.. GENERATED FROM PYTHON SOURCE LINES 288-290

.. code-block:: Python

    plot_conversion_evolution(index=7)

.. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_003.png
   :alt: Evolution of the conversion of a curve with decreasing measurements (Toronto station), Transform of test curves with 7 points, Transform of test curves with 5 points, Transform of test curves with 4 points, Transform of test curves with 3 points
   :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 291-292

Iqaluit station's temperature curve conversion evolution:

.. GENERATED FROM PYTHON SOURCE LINES 292-294
.. code-block:: Python

    plot_conversion_evolution(index=8)

.. image-sg:: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_004.png
   :alt: Evolution of the conversion of a curve with decreasing measurements (Iqaluit station), Transform of test curves with 7 points, Transform of test curves with 5 points, Transform of test curves with 4 points, Transform of test curves with 3 points
   :srcset: /auto_examples/representation/images/sphx_glr_plot_irregular_mixed_effects_robustness_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 295-301

As can be seen in the figures, the fewer the measurements, the closer the
converted curve is to the mean of the original dataset. This leads us to
believe that when the number of measurements is too low, the mixed effects
model captures the general trend of the data, but cannot properly capture
the individual variation of each curve. This is consistent with the
shrinkage behaviour of mixed effects models: with very few observations per
curve, the predicted random effects are pulled towards zero, so the
converted curve is pulled towards the population mean given by the fixed
effects.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 19.843 seconds)


.. _sphx_glr_download_auto_examples_representation_plot_irregular_mixed_effects_robustness.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/representation/plot_irregular_mixed_effects_robustness.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_irregular_mixed_effects_robustness.ipynb <plot_irregular_mixed_effects_robustness.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_irregular_mixed_effects_robustness.py <plot_irregular_mixed_effects_robustness.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_irregular_mixed_effects_robustness.zip <plot_irregular_mixed_effects_robustness.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_