aeon-toolkit · sumana-2705 · Feb 20, 2025 · Feb 20, 2025
diff --git a/examples/segmentation/binary_segmentation.ipynb b/examples/segmentation/binary_segmentation.ipynb
@@ -0,0 +1,100 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "AddKF0mKttTq"
+   },
+   "source": [
+    "# **Binary Segmentation in Time Series**\n",
+    "The `BinSegSegmenter` class performs **binary segmentation**, a method for detecting change points in a time series. It finds the single best change point, splits the series there, and repeats the search on the resulting segments until `n_cps` change points have been found.\n",
+    "\n",
+    "\n",
+    "## Parameters  \n",
+    "\n",
+    "### `n_cps` (`int`, default = `1`)  \n",
+    "  - Specifies the number of change points to detect.  \n",
+    "  - A higher value detects more segment boundaries.  \n",
+    "\n",
+    "\n",
+    "### `model` (`str`, default = `\"l2\"`)  \n",
+    "  - Selects the cost function used to score candidate segments.  \n",
+    "  - Available models:  \n",
+    "    - `\"l1\"` → Least absolute deviation cost; detects shifts in the median and is robust to outliers.  \n",
+    "    - `\"l2\"` → Least squared deviation cost; detects shifts in the mean.  \n",
+    "    - `\"rbf\"` → Kernel-based cost using a Radial Basis Function (RBF) kernel; detects general distributional changes without a parametric model.  \n",
+    "    - `\"linear\"` → Detects changes in the coefficients of a piecewise linear regression model (for example, trend or slope changes).  \n",
+    "    - `\"normal\"` → Assumes Gaussian data and detects changes in mean or variance.  \n",
+    "\n",
+    "\n",
+    "### `min_size` (`int`, default = `2`)  \n",
+    "  - Sets the minimum number of observations allowed in a segment.  \n",
+    "  - Helps prevent very short, unreliable segments.  \n",
+    "\n",
+    "\n",
+    "### `jump` (`int`, default = `5`)  \n",
+    "  - Sets the spacing of candidate change points: only every `jump`-th index is evaluated (subsampling).  \n",
+    "  - A higher value makes detection faster but may reduce precision.  \n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "QXYWgJUBlBXk",
+    "outputId": "1cf10182-6561-4cb3-90fd-c27728066c93"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Detected change points: [50]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "from aeon.segmentation import BinSegSegmenter\n",
+    "\n",
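+    "# Synthetic series: 50 points around mean 0 followed by 50 points around mean 5\n",
+    "np.random.seed(42)\n",
+    "X = np.concatenate([np.random.normal(0, 1, 50), np.random.normal(5, 1, 50)])\n",
+    "\n",
+    "# Look for a single change point with the default least-squares cost\n",
+    "binseg = BinSegSegmenter(n_cps=1, model=\"l2\")\n",
+    "\n",
+    "# Use the public fit_predict API rather than the private _predict method\n",
+    "found_cps = binseg.fit_predict(X)\n",
+    "\n",
+    "print(\"Detected change points:\", found_cps)"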
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "GvwOKNVJlKa2"
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
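
Reviewer note: for readers who want to see what `n_cps`, `min_size`, and `jump` control, here is a minimal pure-Python sketch of greedy binary segmentation with a least-squares (`l2`-style) cost. It is an illustration only, not the code behind `BinSegSegmenter`. The helper names `l2_cost`, `best_split`, and `binary_segmentation` are hypothetical, and the library may handle `min_size` and `jump` differently at the boundaries.

```python
def l2_cost(values):
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(series, start, end, min_size=2, jump=5):
    """Return (gain, index) for the best split of series[start:end], or None."""
    full_cost = l2_cost(series[start:end])
    best = None
    # Assumption: candidates are every `jump`-th index counted from `start`.
    for idx in range(start, end, jump):
        # Both sides of the split must hold at least `min_size` points.
        if idx - start < min_size or end - idx < min_size:
            continue
        gain = full_cost - l2_cost(series[start:idx]) - l2_cost(series[idx:end])
        if best is None or gain > best[0]:
            best = (gain, idx)
    return best


def binary_segmentation(series, n_cps=1, min_size=2, jump=5):
    """Greedily split the segment whose best split reduces the cost the most."""
    segments = [(0, len(series))]
    change_points = []
    for _ in range(n_cps):
        candidates = []
        for seg in segments:
            split = best_split(series, seg[0], seg[1], min_size, jump)
            if split is not None:
                candidates.append((split[0], split[1], seg))
        if not candidates:
            break  # no segment is long enough to split further
        _, idx, (start, end) = max(candidates)
        segments.remove((start, end))
        segments.extend([(start, idx), (idx, end)])
        change_points.append(idx)
    return sorted(change_points)


# Noise-free step signal: the level changes at index 50.
step = [0.0] * 50 + [5.0] * 50
print(binary_segmentation(step, n_cps=1))  # prints [50]
```

Raising `jump` shrinks the candidate list, which is why it trades precision for speed: a true change at an index that is not on the candidate grid can only be approximated by the nearest candidate.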
diff --git a/examples/segmentation/eagglo_segmentation.ipynb b/examples/segmentation/eagglo_segmentation.ipynb
@@ -0,0 +1,104 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "name": "python3",
+   "display_name": "Python 3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# **E-Agglo: Agglomerative Clustering That Preserves Observation Order**\n",
+    "\n",
+    "E-Agglo is a non-parametric clustering approach for multivariate time series, where neighboring segments are sequentially merged to maximize a goodness-of-fit statistic.\n",
+    "\n",
+    "Unlike most general-purpose agglomerative clustering algorithms, this procedure preserves the time ordering of the observations.\n",
+    "\n",
+    "This method can detect distributional changes in an independent sequence and does not make any distributional assumptions beyond the existence of an alpha-th moment. It estimates both the number and locations of change points simultaneously.\n",
+    "\n",
+    "<br>\n",
+    "\n",
+    "## Parameters\n",
+    "\n",
+    "`member` : array_like (default=None)  \n",
+    "Initial cluster membership of each observation. Its first dimension must match the number of observations in the input data. If set to None, each point starts in its own cluster.\n",
+    "\n",
+    "`alpha` : float (default=1.0)  \n",
+    "A fixed constant alpha in the range (0, 2), used in the divergence measure. It represents the alpha-th absolute moment.\n",
+    "\n",
+    "`penalty` : str or callable or None (default=None)  \n",
+    "A function that penalizes the goodness-of-fit statistic to prevent overfitting. If None, no penalty is applied. Can also be the name of a predefined penalty:  \n",
+    "- `\"len_penalty\"`: Penalizes based on segment length.  \n",
+    "- `\"mean_diff_penalty\"`: Penalizes based on mean differences between segments.\n",
+    "\n",
+    "<br>\n",
+    "\n",
+    "## Attributes\n",
+    "\n",
+    "`merged_` : array_like\n",
+    "\n",
+    "A 2D array that records which clusters were merged at each step of the agglomerative process.\n",
+    "\n",
+    "`gof_` : float\n",
+    "\n",
+    "The goodness-of-fit statistic for the current segmentation result.\n",
+    "\n",
+    "`cluster_` : array_like\n",
+    "\n",
+    "A 1D array that specifies which cluster each row of input data X belongs to.\n",
+    "\n"
+   ],
+   "metadata": {
+    "id": "ok5Oes8lYsvX"
+   }
+  },
+  {
+   "cell_type": "code",
+   "source": [
+    "from aeon.segmentation import EAggloSegmenter\n",
+    "from aeon.testing.data_generation import make_example_dataframe_series\n",
+    "\n",
+    "X = make_example_dataframe_series(n_channels=2, random_state=10)\n",
+    "model = EAggloSegmenter()\n",
+    "y = model.fit_predict(X, axis=0)\n",
+    "\n",
+    "print(\"Segmented clusters:\", y)"
+   ],
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "nPwNRiecCysi",
+    "outputId": "25741c1e-696b-465c-d1c7-979ca66777c6"
+   },
+   "execution_count": 2,
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "Segmented clusters: [0 1]\n"
+     ]
+    }
+   ]
+  },
+  {
+   "cell_type": "code",
+   "source": [],
+   "metadata": {
+    "id": "wNYidqUnW7qO"
+   },
+   "execution_count": 2,
+   "outputs": []
+  }
+ ]
+}
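
Reviewer note: the `alpha` parameter is easier to read with the divergence written out. Below is a minimal sketch, assuming univariate samples and the sample energy divergence `2*E|X-Y|^alpha - E|X-X'|^alpha - E|Y-Y'|^alpha`. The function name `energy_divergence` is hypothetical, and `EAggloSegmenter` may normalize or weight these terms differently.

```python
def energy_divergence(x, y, alpha=1.0):
    """Sample energy divergence between two univariate samples.

    Computes 2*E|X-Y|^alpha - E|X-X'|^alpha - E|Y-Y'|^alpha using plain
    averages over all pairs (an illustrative choice of estimator).
    """

    def mean_pairwise(a, b):
        # Average of |u - v|^alpha over every pair (u, v) with u in a, v in b.
        return sum(abs(u - v) ** alpha for u in a for v in b) / (len(a) * len(b))

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


# Identical samples give zero divergence; a shifted sample gives a positive value.
print(energy_divergence([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # prints 0.0
print(energy_divergence([0.0, 0.0], [1.0, 1.0]))  # prints 2.0
```

A large divergence between two neighbouring segments suggests they come from different distributions, so an agglomerative procedure of this kind would prefer to keep a boundary between them rather than merge them.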
diff --git a/examples/segmentation/segmentation.ipynb b/examples/segmentation/segmentation.ipynb
@@ -25,6 +25,10 @@
     "- [ClaSP (Classification Score Profile) Segmentation](./segmentation_with_clasp.ipynb)\n",
     "\n",
     "- [Heteregeneous Intrinsic Dimensionality Algorithm (Hidalgo) Segmentation](./hidalgo_segmentation.ipynb)\n",
+    "\n",
+    "- [Binary Segmentation](./binary_segmentation.ipynb)\n",
+    "\n",
+    "- [E-Agglo (Agglomerative Clustering) Segmentation](./eagglo_segmentation.ipynb)\n",
     "\n"
    ]
   }