diff --git a/examples/segmentation/binary_segmentation.ipynb b/examples/segmentation/binary_segmentation.ipynb new file mode 100644 index 0000000000..ea1b0843e3 --- /dev/null +++ b/examples/segmentation/binary_segmentation.ipynb @@ -0,0 +1,100 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "AddKF0mKttTq" + }, + "source": [ + "# **Binary Segmentation in Time Series**\n", + "The `BinSegSegmenter` class performs **binary segmentation**, a greedy method for detecting change points in a time series. It finds the single split that most reduces the segmentation cost, splits the series there, and repeats the search on the resulting segments until `n_cps` change points have been found. It wraps the `Binseg` estimator from the optional `ruptures` dependency.\n", + "\n", + "\n", + "## Parameters \n", + "\n", + "### `n_cps` (`int`, default = `1`) \n", + " - The number of change points to detect. \n", + " - The series is split into `n_cps + 1` segments. \n", + "\n", + "\n", + "### `model` (`str`, default = `\"l2\"`) \n", + " - The cost function used to measure how well a segment fits the data. \n", + " - Available models: \n", + " - `\"l1\"` → Least absolute deviation; detects shifts in the median and is robust to outliers. \n", + " - `\"l2\"` → Least squared deviation; detects shifts in the mean. \n", + " - `\"rbf\"` → Kernel cost with a Radial Basis Function (RBF) kernel; detects general changes in distribution, not only in the mean. \n", + " - `\"linear\"` → Least squared deviation of a linear regression model; detects changes in the relationship between a response and its covariates. \n", + " - `\"normal\"` → Gaussian likelihood; detects changes in mean and/or variance. \n", + "\n", + "\n", + "### `min_size` (`int`, default = `2`) \n", + " - The minimum number of samples in each segment. \n", + " - Prevents very short segments, whose cost estimates are unreliable. \n", + "\n", + "\n", + "### `jump` (`int`, default = `5`) \n", + " - Subsampling step: only every `jump`-th index is a candidate change point, so detected change points are multiples of `jump`. \n", + " - Larger values are faster but locate change points less precisely. \n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QXYWgJUBlBXk", + "outputId": "1cf10182-6561-4cb3-90fd-c27728066c93" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Detected change points: [50]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "from aeon.segmentation import BinSegSegmenter\n", + "\n", + "np.random.seed(42)\n", + "X = np.concatenate([np.random.normal(0, 1, 50), np.random.normal(5, 1, 50)])\n", + "# The mean shifts from 0 to 5 at index 50; detect one change point with the l2 cost\n", + "binseg = BinSegSegmenter(n_cps=1, model=\"l2\")\n", + "\n", + "found_cps = binseg.fit_predict(X)  # indices where new segments start\n", + "\n", + "print(\"Detected change points:\", found_cps)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GvwOKNVJlKa2" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/examples/segmentation/eagglo_segmentation.ipynb b/examples/segmentation/eagglo_segmentation.ipynb new file mode 100644 index 0000000000..b4aecb98c2 --- /dev/null +++ b/examples/segmentation/eagglo_segmentation.ipynb @@ -0,0 +1,104 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# **E-Agglo: Agglomerative Clustering that Preserves Observation Order**\n", + "\n", + "E-Agglo is a non-parametric approach to change point detection in multivariate time series. Starting from an initial set of segments, it repeatedly merges the pair of adjacent segments whose merger maximizes a goodness-of-fit statistic.\n", + "\n", + "Unlike general-purpose agglomerative clustering, only adjacent segments can be merged, so the time ordering of the observations is preserved.\n", + "\n", + "The method detects changes in distribution within a sequence of independent observations and makes no distributional assumptions beyond the existence of the alpha-th absolute moment. It estimates the number and the locations of the change points simultaneously.\n", + "\n", + "\n", + "\n", + "## Parameters\n", + "\n", + "`member` : array_like (default=None) \n", + "Initial cluster membership of each observation. Its length must equal the number of time points in the input data. If None, each observation starts in its own cluster.\n", + "\n", + "`alpha` : float (default=1.0) \n", + "A fixed constant in the range (0, 2), used as the exponent on the distances in the divergence measure. The alpha-th absolute moment of the data is assumed to exist.\n", + "\n", + "`penalty` : str or callable or None (default=None) \n", + "A function of the change point locations that penalizes the goodness-of-fit statistic to discourage over-segmentation. If None, no penalty is applied. It can also be the name of a predefined penalty: \n", + "- `\"len_penalty\"`: penalizes the number of change points. \n", + "- `\"mean_diff_penalty\"`: uses the mean distance between consecutive change points, favoring longer segments.\n", + "\n", + "\n", + "\n", + "## Attributes\n", + "\n", + "`merged_` : array_like\n", + "\n", + "A 2D array that records which clusters were merged at each step of the agglomerative process.\n", + "\n", + "`gof_` : float\n", + "\n", + "The goodness-of-fit statistic of the fitted segmentation.\n", + "\n", + "`cluster_` : array_like\n", + "\n", + "A 1D array giving the cluster (segment) label of each time point in X.\n", + "\n" + ], + "metadata": { + "id": "ok5Oes8lYsvX" + } + }, + { + "cell_type": "code", + "source": [ + "from aeon.segmentation import EAggloSegmenter\n", + "from aeon.testing.data_generation import make_example_dataframe_series\n", + "\n", + "X = make_example_dataframe_series(n_channels=2, random_state=10)\n", + "model = EAggloSegmenter()\n", + "y = model.fit_predict(X, axis=0)  # axis=0: rows are time points, columns are channels\n", + "\n", + "print(\"Segmented clusters:\", y)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nPwNRiecCysi", + "outputId": "25741c1e-696b-465c-d1c7-979ca66777c6" + }, + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Segmented clusters: [0 1]\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "wNYidqUnW7qO" + }, + "execution_count": 2, + "outputs": [] + } + ] +} diff --git a/examples/segmentation/segmentation.ipynb b/examples/segmentation/segmentation.ipynb index 34473843d0..865d1466d5 100644 --- a/examples/segmentation/segmentation.ipynb +++ b/examples/segmentation/segmentation.ipynb @@ -25,6 +25,10 @@ "- [ClaSP (Classification Score Profile) Segmentation](./segmentation_with_clasp.ipynb)\n", "\n", "- [Heteregeneous Intrinsic Dimensionality Algorithm (Hidalgo) Segmentation](./hidalgo_segmentation.ipynb)\n", "\n", + "\n", + "- [Binary Segmentation](./binary_segmentation.ipynb)\n", + "\n", + "- [E-Agglo (Agglomerative) Segmentation](./eagglo_segmentation.ipynb)\n", "\n" ] }
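The greedy procedure described in the `BinSegSegmenter` notebook can be illustrated with a small, self-contained NumPy sketch. It is not the `ruptures` or aeon implementation. It supports only the `"l2"` cost (sum of squared deviations from the segment mean). It also assumes that `min_size` is the minimum number of samples per segment and that `jump` restricts candidate change points to multiples of `jump`. The helper names `l2_cost`, `best_split`, and `binary_segmentation` are hypothetical.

```python
import numpy as np


def l2_cost(x, start, end):
    """The "l2" cost: sum of squared deviations from the mean of x[start:end]."""
    seg = x[start:end]
    return float(((seg - seg.mean()) ** 2).sum())


def best_split(x, start, end, min_size, jump):
    """Return (gain, t) for the best single split of x[start:end], or None if none is admissible."""
    total = l2_cost(x, start, end)
    best = None
    # Assumption: candidates are multiples of `jump` leaving >= min_size samples on each side.
    for t in range(start + min_size, end - min_size + 1):
        if t % jump != 0:
            continue
        gain = total - l2_cost(x, start, t) - l2_cost(x, t, end)
        if best is None or gain > best[0]:
            best = (gain, t)
    return best


def binary_segmentation(x, n_cps, min_size=2, jump=5):
    """Hypothetical sketch: apply, n_cps times, the split with the largest cost reduction."""
    segments = [(0, len(x))]
    cps = []
    for _ in range(n_cps):
        candidates = []
        for start, end in segments:
            split = best_split(x, start, end, min_size, jump)
            if split is not None:
                gain, t = split
                candidates.append((gain, t, start, end))
        if not candidates:
            break  # no segment can be split any further
        gain, t, start, end = max(candidates)
        # Replace the chosen segment by its two halves and record the change point.
        segments.remove((start, end))
        segments += [(start, t), (t, end)]
        cps.append(t)
    return sorted(cps)


# A setup similar to the notebook's: the mean shifts from 0 to 5 at index 50.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
print(binary_segmentation(x, n_cps=1))  # [50]
```

Each iteration rescans every current segment to keep the sketch short. A more efficient version would cache each segment's best split so that only the two new halves are rescanned.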
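The E-Agglo notebook describes merging adjacent segments to maximize a goodness-of-fit statistic built from an alpha-th moment divergence. The NumPy sketch below illustrates that idea under explicit assumptions:

- The divergence is taken to be the scaled sample energy distance of Matteson and James (2014).
- The goodness of fit is taken to be the sum of divergences between adjacent segments.
- No penalty is applied.

This is not aeon's `EAggloSegmenter` implementation. The names `energy_divergence`, `goodness_of_fit`, and `greedy_merge` are hypothetical.

```python
import numpy as np


def _dist_pow(a, b, alpha):
    """Euclidean distances between the rows of a and b, raised to the power alpha."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1) ** alpha


def energy_divergence(x, y, alpha=1.0):
    """Assumed divergence: scaled sample energy distance between x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    between = 2.0 * _dist_pow(x, y, alpha).mean()
    # Average over distinct pairs i < k; the full symmetric matrix counts each pair twice.
    within_x = _dist_pow(x, x, alpha).sum() / (n * (n - 1))
    within_y = _dist_pow(y, y, alpha).sum() / (m * (m - 1))
    return n * m / (n + m) * (between - within_x - within_y)


def goodness_of_fit(segments, alpha=1.0):
    """Assumed statistic: sum of divergences between each pair of adjacent segments."""
    return sum(energy_divergence(a, b, alpha) for a, b in zip(segments, segments[1:]))


def greedy_merge(segments, alpha=1.0):
    """Merge adjacent segments one pair at a time; return the best segmentation seen."""
    best_score, best = goodness_of_fit(segments, alpha), list(segments)
    while len(segments) > 1:
        options = []
        for i in range(len(segments) - 1):
            # Only neighbours i and i + 1 may merge, which preserves time order.
            merged = segments[:i] + [np.vstack([segments[i], segments[i + 1]])] + segments[i + 2 :]
            options.append((goodness_of_fit(merged, alpha), merged))
        score, segments = max(options, key=lambda option: option[0])
        if score > best_score:
            best_score, best = score, segments
    return best


# Two channels whose mean shifts from 0 to 3 at index 50.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
initial = np.split(x, 10)  # start from 10 blocks of 10 observations each
result = greedy_merge(initial)
boundaries = np.cumsum([len(seg) for seg in result])[:-1].tolist()
print("estimated change points:", boundaries)
```

Recomputing every adjacent divergence after each merge keeps the sketch short, but it repeats many distance computations. An efficient implementation would update only the terms affected by the merge.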