.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/preprocessing/plot_fdm.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_preprocessing_plot_fdm.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_preprocessing_plot_fdm.py:


Functional Diffusion Maps
============================================================================

In this example, the use of the functional diffusion map (FDM) technique is
shown over different datasets.
Firstly, an example of basic use of the technique is provided.
Later, two examples of parameter tuning are presented, for embedding spaces
of different dimensions.
Finally, an application of the method to a real, non-synthetic dataset is
provided.

.. GENERATED FROM PYTHON SOURCE LINES 14-18

.. code-block:: Python


    # Author: Eduardo Terrés Caballero
    # License: MIT


.. GENERATED FROM PYTHON SOURCE LINES 19-21

Some examples shown here are further explained in the article
:footcite:t:`barroso++_2023_fdm`.

.. GENERATED FROM PYTHON SOURCE LINES 24-30

Moons dataset example
---------------------

Firstly, a basic example of execution is presented, using a functional
version of the moons dataset, a dataset consisting of two-dimensional
coordinates that represent the positions of two different moons.

.. GENERATED FROM PYTHON SOURCE LINES 30-53

.. code-block:: Python


    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.colors import ListedColormap
    from sklearn import datasets

    seed = 612245103
    random_state = np.random.RandomState(seed)

    n_samples, n_grid_pts = 100, 50
    data_moons, y = datasets.make_moons(
        n_samples=n_samples,
        noise=0,
        random_state=random_state,
    )

    colors = ["blue", "orange"]
    cmap = ListedColormap(colors)

    fig, ax = plt.subplots()
    ax.scatter(data_moons[:, 0], data_moons[:, 1], c=y, cmap=cmap)
    ax.set_title("Moons data")
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_001.png
   :alt: Moons data
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 54-60

Using a basis of two elements, the functional observation corresponding to a
multivariate observation is obtained by treating the coordinates as
coefficients that multiply the elements of the basis.
In other words, each multivariate vector is interpreted as the vector of
coefficients of its functional counterpart with respect to the basis.
Below is the code to generate the synthetic functional moons data.

.. GENERATED FROM PYTHON SOURCE LINES 60-76

.. code-block:: Python


    from skfda.representation import FDataGrid

    grid = np.linspace(-np.pi, np.pi, n_grid_pts)
    basis = np.array([np.sin(4 * grid), grid ** 2 + 2 * grid - 2])
    fd_moons = FDataGrid(
        data_matrix=data_moons @ basis,
        grid_points=grid,
        dataset_name="Functional moons data",
        argument_names=("x",),
        coordinate_names=("f (x)",),
    )
    fig = fd_moons.plot(linewidth=0.5, group=y, group_colors=colors)
    fig.axes[0].set_xlim((-np.pi, np.pi))
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_002.png
   :alt: Functional moons data
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_002.png
   :class: sphx-glr-single-img

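
As a quick, optional check of the coefficient interpretation described above,
one can verify by hand that each functional observation is exactly the linear
combination of the two basis functions weighted by the corresponding
multivariate coordinates.
The snippet below is a minimal sketch added for illustration; it is not part
of the generated example.

.. code-block:: Python


    # Sketch: the i-th curve equals data_moons[i, 0] * basis[0]
    # + data_moons[i, 1] * basis[1], i.e. the multivariate coordinates act
    # as coefficients with respect to the chosen basis.
    i = 0
    curve = data_moons[i, 0] * basis[0] + data_moons[i, 1] * basis[1]
    np.testing.assert_allclose(curve, fd_moons.data_matrix[i, :, 0])
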

.. GENERATED FROM PYTHON SOURCE LINES 77-83

Once the functional data is available, it simply remains to choose the values
of the model parameters.
The FDM technique involves the use of a kernel operator, which acts as a
measure of similarity for the data.
In this case we will be using the Gaussian kernel, with a length scale
parameter of 0.25.

.. GENERATED FROM PYTHON SOURCE LINES 83-100

.. code-block:: Python


    from skfda.misc.covariances import Gaussian
    from skfda.preprocessing.dim_reduction import DiffusionMap

    fdm = DiffusionMap(
        n_components=2,
        kernel=Gaussian(length_scale=0.25),
        alpha=0,
        n_steps=1,
    )
    embedding = fdm.fit_transform(fd_moons)

    fig, ax = plt.subplots()
    ax.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap=cmap)
    ax.set_title("Diffusion coordinates for the functional moons data")
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_003.png
   :alt: Diffusion coordinates for the functional moons data
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 101-104

As we can see, the functional diffusion map has correctly captured the
topological nature of the data by successfully separating the coordinates
associated with both moons.

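
To complement the visual inspection, a simple optional check is to cluster
the diffusion coordinates and compare the resulting labels with the known
moon labels.
The following snippet is a sketch added for illustration (it is not part of
the generated example); it only relies on scikit-learn, which was already
used to create the dataset.

.. code-block:: Python


    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    # Cluster the 2-D diffusion coordinates and measure the agreement with
    # the true moon labels (a value of 1.0 means a perfect match up to
    # relabeling of the clusters).
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    cluster_labels = kmeans.fit_predict(embedding)
    print(f"Adjusted Rand index: {adjusted_rand_score(y, cluster_labels):.2f}")
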

.. GENERATED FROM PYTHON SOURCE LINES 107-118

Spirals dataset example
-----------------------

Next is an example of parameter tuning in the form of a grid search over a
given set of values for the ``length_scale`` kernel parameter and the
``alpha`` parameter of the FDM method.
We have two spirals, which are initially entangled and thus hard to tell
apart in the original representation.

Below is the code for the generation of the spiral data and its functional
equivalent, following a procedure similar to the one used for the moons
dataset.

.. GENERATED FROM PYTHON SOURCE LINES 118-148

.. code-block:: Python


    n_samples, n_grid_pts = 100, 50
    t = (np.pi / 4 + np.linspace(0, 4, n_samples)) * np.pi
    dx, dy = 10 * t * np.cos(t), 10 * t * np.sin(t)
    y = np.array([0] * n_samples + [1] * n_samples)
    data_spirals = np.column_stack((
        np.concatenate((dx, -dx)),
        np.concatenate((dy, -dy)),
    ))

    colors = ["yellow", "purple"]
    cmap = ListedColormap(colors)

    fig, ax = plt.subplots()
    ax.scatter(data_spirals[:, 0], data_spirals[:, 1], c=y, cmap=cmap)
    ax.set_aspect("equal", adjustable="box")
    ax.set_title("Spirals data")
    plt.show()

    # Define functional data object
    grid = np.linspace(-np.pi, np.pi, n_grid_pts)
    basis = np.array([grid * np.cos(grid) / 3, grid * np.sin(grid) / 3])
    fd_spirals = FDataGrid(
        data_matrix=data_spirals @ basis,
        grid_points=grid,
        dataset_name="Functional spirals data",
        argument_names=("x",),
        coordinate_names=("f (x)",),
    )
    fd_spirals.plot(linewidth=0.5, group=y, group_colors=colors)
    plt.show()


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_004.png
         :alt: Spirals data
         :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_005.png
         :alt: Functional spirals data
         :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_005.png
         :class: sphx-glr-multi-img

.. GENERATED FROM PYTHON SOURCE LINES 149-152

Once the functional data is ready, we will perform a grid search over the
following values of the parameters, and plot the resulting embeddings for
visual comparison.

.. GENERATED FROM PYTHON SOURCE LINES 152-184

.. code-block:: Python


    from itertools import product

    alpha_set = [0, 0.33, 0.66, 1]
    length_scale_set = [2.5, 3, 4.5, 7, 10, 11, 15]
    param_grid = product(alpha_set, length_scale_set)

    fig, axes = plt.subplots(
        len(alpha_set),
        len(length_scale_set),
        figsize=(16, 8),
    )

    for (alpha, length_scale), ax in zip(param_grid, axes.ravel(), strict=True):
        fdm = DiffusionMap(
            n_components=2,
            kernel=Gaussian(length_scale=length_scale),
            alpha=alpha,
            n_steps=1,
        )
        embedding = fdm.fit_transform(fd_spirals)
        ax.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap=cmap)
        ax.set_xticklabels([])
        ax.set_yticklabels([])

    for ax, alpha in zip(axes[:, 0], alpha_set, strict=True):
        ax.set_ylabel(f"$\\alpha$: {alpha}", size=20, rotation=0, ha="right")

    for ax, length_scale in zip(axes[0], length_scale_set, strict=True):
        ax.set_title(f"$len-sc$: {length_scale}", size=20, va="bottom")

    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_006.png
   :alt: $len-sc$: 2.5, $len-sc$: 3, $len-sc$: 4.5, $len-sc$: 7, $len-sc$: 10, $len-sc$: 11, $len-sc$: 15
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_006.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 185-208

The first thing to notice is that the length scale parameter exerts a greater
influence on the resulting embedding than the alpha parameter.
In this sense, the figures within any given column resemble each other more
than the figures within any given row do.
Thus, we shall set alpha equal to 0 because, in theory, this is equivalent to
skipping a normalization step in the process.
Moreover, we can see that the optimal choice of the kernel length scale
parameter is 4.5, because it visually presents the clearest separation
between the trajectories of both spirals.
Hence, for a kernel length scale of 4.5 the method is able to capture the
local geometry of the spirals dataset.
For a small value of the kernel parameter (for example 1), contiguous points
in the same arm of the spiral are not considered close because the kernel is
too narrow, resulting in apparently random diffusion coordinates.
For a large value of the kernel parameter (for example 15), the kernel is
wide enough that points in contiguous spiral arms, which belong to different
trajectories, are considered similar.
Hence the diffusion coordinates keep these relations by maintaining both
trajectories entangled.
In summary, for a length scale of 4.5 the kernel is wide enough that points
in the same arm of a trajectory are considered similar, but not so wide that
points in contiguous arms of the spiral are also considered similar.

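
To make the role of alpha more concrete, below is a minimal sketch of the
usual diffusion-map construction applied directly to the raw multivariate
spiral coordinates (not to the functional data).
It is added here only for intuition and is an assumption about the standard
algorithm, not skfda's implementation of ``DiffusionMap``.

.. code-block:: Python


    from scipy.spatial.distance import pdist, squareform


    def toy_diffusion_coordinates(points, length_scale, alpha, n_components=2, n_steps=1):
        """Toy multivariate diffusion map, for intuition about alpha only."""
        dists = squareform(pdist(points))
        kernel = np.exp(-dists ** 2 / (2 * length_scale ** 2))  # Gaussian kernel
        # Density normalization controlled by alpha; alpha = 0 leaves the
        # kernel matrix untouched, i.e. this step is effectively skipped.
        q = kernel.sum(axis=1)
        kernel = kernel / np.outer(q ** alpha, q ** alpha)
        # Row-normalize to obtain a Markov transition matrix and use its
        # leading non-trivial eigenvectors, scaled by the eigenvalues raised
        # to the number of diffusion steps, as the new coordinates.
        transition = kernel / kernel.sum(axis=1, keepdims=True)
        eigvals, eigvecs = np.linalg.eig(transition)
        order = np.argsort(-eigvals.real)
        eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
        return eigvecs[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** n_steps


    toy_embedding = toy_diffusion_coordinates(data_spirals, length_scale=4.5, alpha=0)

In this sketch, setting ``alpha=0`` simply leaves the kernel matrix
unchanged, which is the sense in which the corresponding normalization step
is skipped.
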

.. GENERATED FROM PYTHON SOURCE LINES 210-214

For a reliable comparison between embeddings, it is advisable to use the same
scale on all axes.
To illustrate this idea, the row corresponding to alpha equal to 0 is
recomputed below.

.. GENERATED FROM PYTHON SOURCE LINES 214-246

.. code-block:: Python


    alpha_set = [0]
    length_scale_set = [2.5, 3, 4.5, 7, 10, 11, 15]
    param_grid = product(alpha_set, length_scale_set)

    fig, axes = plt.subplots(
        len(alpha_set),
        len(length_scale_set),
        figsize=(16, 4),
    )

    for (alpha, length_scale), ax in zip(param_grid, axes.ravel(), strict=True):
        fdm = DiffusionMap(
            n_components=2,
            kernel=Gaussian(length_scale=length_scale),
            alpha=alpha,
            n_steps=1,
        )
        embedding = fdm.fit_transform(fd_spirals)
        ax.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap=cmap)
        ax.set_xlim((-0.4, 0.4))
        ax.set_ylim((-0.4, 0.4))

    axes[0].set_ylabel(
        f"$\\alpha$: {alpha_set[0]}", size=20, rotation=0, ha="right",
    )

    for ax, length_scale in zip(axes, length_scale_set, strict=True):
        ax.set_title(f"$len-sc$: {length_scale}", size=20, va="bottom")

    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_007.png
   :alt: $len-sc$: 2.5, $len-sc$: 3, $len-sc$: 4.5, $len-sc$: 7, $len-sc$: 10, $len-sc$: 11, $len-sc$: 15
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_007.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 247-260

Swiss roll dataset example
--------------------------

So far, the above examples have been computed with a value of the
``n_components`` parameter of 2.
This implies that the resulting diffusion coordinates belong to a
two-dimensional space, and thus a graphical representation can be provided.
The aim of this new section is to explore further possibilities regarding
``n_components``.

We will now apply the method to a more complex example, the Swiss roll
dataset.
This dataset consists of three-dimensional points that lie on a manifold
shaped like a Swiss roll.

.. GENERATED FROM PYTHON SOURCE LINES 260-272

.. code-block:: Python


    n_samples, n_grid_pts = 500, 100
    data_swiss, y = datasets.make_swiss_roll(
        n_samples=n_samples,
        noise=0,
        random_state=random_state,
    )
    fig = plt.figure()
    axis = fig.add_subplot(111, projection="3d")
    axis.set_title("Swiss roll data")
    axis.scatter(data_swiss[:, 0], data_swiss[:, 1], data_swiss[:, 2], c=y)
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_008.png
   :alt: Swiss roll data
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_008.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 273-278

Similarly to the previous examples, the functional data object is defined.
In this case a three-element basis will be used, since the multivariate data
points belong to a three-dimensional space.
For clarity, only the first fifty functional observations are plotted.

.. GENERATED FROM PYTHON SOURCE LINES 278-292

.. code-block:: Python


    grid = np.linspace(-np.pi, np.pi, n_grid_pts)
    basis = np.array([np.sin(4 * grid), np.cos(8 * grid), np.sin(12 * grid)])
    data_matrix = np.array(data_swiss) @ basis
    fd_swiss = FDataGrid(
        data_matrix=data_matrix,
        grid_points=grid,
        dataset_name="Functional Swiss roll data",
        argument_names=("x",),
        coordinate_names=("f (x)",),
    )
    fd_swiss[:50].plot(linewidth=0.5, group=y[:50])
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_009.png
   :alt: Functional Swiss roll data
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_009.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 293-296

Now, the FDM method will be applied for different values of the parameters,
again in the form of a grid search.
Note that the diffusion coordinates will now consist of three components.

.. GENERATED FROM PYTHON SOURCE LINES 296-326


.. code-block:: Python


    alpha_set = [0, 0.5, 1]
    length_scale_set = [1.5, 2.5, 4, 5]
    param_grid = product(alpha_set, length_scale_set)

    fig, axes = plt.subplots(
        len(alpha_set),
        len(length_scale_set),
        figsize=(16, 8),
        subplot_kw={"projection": "3d"},
    )

    for (alpha, length_scale), ax in zip(param_grid, axes.ravel(), strict=True):
        fdm = DiffusionMap(
            n_components=3,
            kernel=Gaussian(length_scale=length_scale),
            alpha=alpha,
            n_steps=1,
        )
        embedding = fdm.fit_transform(fd_swiss)
        ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=y)
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_zticklabels([])
        ax.set_title(f"$\\alpha$: {alpha} $len-sc$: {length_scale}")

    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_010.png
   :alt: $\alpha$: 0 $len-sc$: 1.5, $\alpha$: 0 $len-sc$: 2.5, $\alpha$: 0 $len-sc$: 4, $\alpha$: 0 $len-sc$: 5, $\alpha$: 0.5 $len-sc$: 1.5, $\alpha$: 0.5 $len-sc$: 2.5, $\alpha$: 0.5 $len-sc$: 4, $\alpha$: 0.5 $len-sc$: 5, $\alpha$: 1 $len-sc$: 1.5, $\alpha$: 1 $len-sc$: 2.5, $\alpha$: 1 $len-sc$: 4, $\alpha$: 1 $len-sc$: 5
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_010.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 332-334

Let's take a closer look at the resulting embedding for a length scale of 2.5
and an alpha of 0.

.. GENERATED FROM PYTHON SOURCE LINES 334-352

.. code-block:: Python


    alpha, length_scale = 0, 2.5
    fdm = DiffusionMap(
        n_components=3,
        kernel=Gaussian(length_scale=length_scale),
        alpha=alpha,
        n_steps=1,
    )
    embedding = fdm.fit_transform(fd_swiss)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=y)
    ax.set_title(
        "Diffusion coordinates for \n"
        f"$\\alpha$: {alpha} $len-sc$: {length_scale}",
    )
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_011.png
   :alt: Diffusion coordinates for $\alpha$: 0 $len-sc$: 2.5
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_011.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 353-365

The choice of the optimal parameters depends on the problem at hand.
The goal behind choosing a length scale of 2.5 and an alpha of 0 is to obtain
an unrolled version of the Swiss roll.
Note that in the roll there are pairs of points whose Euclidean distance is
small but whose shortest path contained in the manifold is significantly
larger, since it must complete an entire loop.
In this sense, the method has taken into account the shortest-path distance
along the manifold rather than the Euclidean one.
Thus, one may argue that the topological nature of the data has been
respected.
These new diffusion coordinates could be useful to gain further insight into
the initial data through subsequent analysis.

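
As an informal check of this "unrolling" interpretation (added here for
illustration, not part of the generated example), one would expect at least
one diffusion coordinate to vary roughly monotonically with the intrinsic
position along the roll, which is precisely the value ``y`` returned by
``make_swiss_roll``.
A rank correlation makes this easy to inspect.

.. code-block:: Python


    from scipy.stats import spearmanr

    # Rank correlation between each diffusion coordinate and the intrinsic
    # position along the roll (the sign is irrelevant, hence the absolute
    # value).
    for k in range(3):
        rho, _ = spearmanr(y, embedding[:, k])
        print(f"component {k}: |Spearman rho| = {abs(rho):.2f}")
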

.. GENERATED FROM PYTHON SOURCE LINES 367-375

Real dataset: phoneme
---------------------

The aim of this section is to provide an example of application of the FDM
method to a non-synthetic dataset.
Below is an example of execution using the phoneme dataset, which consists of
log-periodograms of five distinct phonemes, computed from recorded male
speech from the TIMIT database.

.. GENERATED FROM PYTHON SOURCE LINES 375-394

.. code-block:: Python


    from skfda.datasets import fetch_phoneme

    n_samples = 300

    colors = ["C0", "C1", "C2", "C3", "C4"]
    group_names = ["aa", "ao", "dcl", "iy", "sh"]

    # Fetch phoneme dataset
    fd_phoneme, y = fetch_phoneme(return_X_y=True)
    fd_phoneme, y = fd_phoneme[:n_samples], y[:n_samples]
    fd_phoneme.plot(
        linewidth=0.7,
        group=y,
        group_colors=colors,
        group_names=group_names,
        legend=True,
    )
    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_012.png
   :alt: Phoneme
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_012.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 395-397

The resulting diffusion coordinates in three dimensions will be plotted,
using different views to better understand the plot.

.. GENERATED FROM PYTHON SOURCE LINES 397-426

.. code-block:: Python


    cmap = ListedColormap(colors)
    alpha, length_scale = 1, 10
    fdm = DiffusionMap(
        n_components=3,
        kernel=Gaussian(length_scale=length_scale),
        alpha=alpha,
        n_steps=1,
    )
    diffusion_coord = fdm.fit_transform(fd_phoneme)

    # Plot three views of the diffusion coordinates
    view_points = [(30, 70), (28, 0), (10, -120)]

    fig, axes = plt.subplots(
        1,
        len(view_points),
        figsize=(18, 6),
        subplot_kw={"projection": "3d"},
    )

    for view, ax in zip(view_points, axes.ravel(), strict=True):
        ax.scatter(
            diffusion_coord[:, 0],
            diffusion_coord[:, 1],
            diffusion_coord[:, 2],
            c=y,
            cmap=cmap,
        )
        ax.view_init(*view)
        ax.set_title(f"View {view}", fontsize=26)

    plt.show()


.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_013.png
   :alt: View (30, 70), View (28, 0), View (10, -120)
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_fdm_013.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 430-436

We can see that the diffusion coordinates for the different phonemes form
clusters in the 3D space.
This representation enables a clearer separation of the data into the
different phoneme groups.
In this way, the phoneme groups that are similar to each other, namely /aa/
and /ao/, are closer in the space.
In fact, these two groups partly overlap (blue and orange).

.. GENERATED FROM PYTHON SOURCE LINES 439-443

References
----------

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.308 seconds)


.. _sphx_glr_download_auto_examples_preprocessing_plot_fdm.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/preprocessing/plot_fdm.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_fdm.ipynb <plot_fdm.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_fdm.py <plot_fdm.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_fdm.zip <plot_fdm.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_