[DOC] Added Examples and Detailed Explanation for Segmentation Algorithms #2552

Closed
100 changes: 100 additions & 0 deletions examples/segmentation/binary_segmentation.ipynb
@@ -0,0 +1,100 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "AddKF0mKttTq"
},
"source": [
"# **Binary Segmentation in Time Series**\n",
"The `BinSegSegmenter` class performs **binary segmentation**, a method for detecting change points in a time series. It finds the best single split point, then recursively splits the resulting segments until the requested number of change points is found.\n",
"\n",
"\n",
"## Parameters \n",
"\n",
"### `n_cps` (`int`, default = `1`) \n",
" - Specifies the number of change points to detect. \n",
" - A higher value yields more segment boundaries. \n",
"\n",
"\n",
"### `model` (`str`, default = `\"l2\"`) \n",
" - Determines the cost function used to score candidate segments. \n",
" - Available models: \n",
" - `\"l1\"` → Least absolute deviation; detects shifts in the median and is robust to outliers. \n",
" - `\"l2\"` → Least squared deviation; detects shifts in the mean. \n",
" - `\"rbf\"` → Uses a Radial Basis Function (RBF) kernel to detect general, possibly non-linear changes in distribution. \n",
" - `\"linear\"` → Fits a linear regression per segment and detects changes in its coefficients, such as a change in slope. \n",
" - `\"normal\"` → Assumes a normal distribution and detects changes in mean or variance. \n",
"\n",
"\n",
"### `min_size` (`int`, default = `2`) \n",
" - Defines the minimum length of any segment produced by a split. \n",
" - Helps prevent very short, unreliable segments. \n",
"\n",
"\n",
"### `jump` (`int`, default = `5`) \n",
" - Sets the spacing of the grid of candidate change points: only every `jump`-th index is evaluated (subsampling). \n",
" - A higher value makes detection faster but may reduce precision. \n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QXYWgJUBlBXk",
"outputId": "1cf10182-6561-4cb3-90fd-c27728066c93"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Detected change points: [50]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"from aeon.segmentation import BinSegSegmenter\n",
"\n",
"# Two regimes of 50 points each: the mean shifts from 0 to 5 at index 50\n",
"np.random.seed(42)\n",
"X = np.concatenate([np.random.normal(0, 1, 50), np.random.normal(5, 1, 50)])\n",
"\n",
"# Ask for a single change point using the least-squares cost\n",
"binseg = BinSegSegmenter(n_cps=1, model=\"l2\")\n",
"\n",
"found_cps = binseg.fit_predict(X)\n",
"\n",
"print(\"Detected change points:\", found_cps)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GvwOKNVJlKa2"
},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
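The recursive splitting described in the notebook above can be sketched in plain NumPy. This is a minimal illustration, assuming an `"l2"`-style cost (squared deviation from the segment mean) and a fixed number of change points. The helper names `l2_cost`, `best_split`, and `binary_segmentation` are hypothetical and are not part of aeon or of the library it wraps; the sketch also omits the `jump` subsampling.

```python
import numpy as np


def l2_cost(x):
    """Sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2))


def best_split(x, start, end, min_size=2):
    """Return (gain, index) of the best single split of x[start:end].

    The index is None when no split reduces the cost.
    """
    best_gain, best_idx = 0.0, None
    total = l2_cost(x[start:end])
    for t in range(start + min_size, end - min_size + 1):
        gain = total - l2_cost(x[start:t]) - l2_cost(x[t:end])
        if gain > best_gain:
            best_gain, best_idx = gain, t
    return best_gain, best_idx


def binary_segmentation(x, n_cps=1, min_size=2):
    """Greedily split the segment whose best split gives the largest gain."""
    segments = [(0, len(x))]
    change_points = []
    for _ in range(n_cps):
        candidates = [(*best_split(x, s, e, min_size), s, e) for s, e in segments]
        candidates = [c for c in candidates if c[1] is not None]
        if not candidates:
            break  # no remaining segment can be split usefully
        _, idx, s, e = max(candidates, key=lambda c: c[0])
        segments.remove((s, e))
        segments += [(s, idx), (idx, e)]
        change_points.append(idx)
    return sorted(change_points)


# Mean shifts from 0 to 5 at index 50, with small seeded noise
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(50), np.full(50, 5.0)]) + rng.normal(0, 0.1, 100)
print(binary_segmentation(x, n_cps=1))
```

Each round scans every current segment, so the cost grows with both the series length and `n_cps`; restricting candidates to a coarser grid, as `jump` does, is one way to trade precision for speed.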
104 changes: 104 additions & 0 deletions examples/segmentation/eagglo_segmentation.ipynb
@@ -0,0 +1,104 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# **E-Agglo: Agglomerative Clustering That Preserves Observation Order**\n",
"\n",
"E-Agglo is a non-parametric clustering approach for multivariate time series, where neighboring segments are sequentially merged to maximize a goodness-of-fit statistic.\n",
"\n",
"Unlike most general-purpose agglomerative clustering algorithms, this procedure preserves the time ordering of the observations.\n",
"\n",
"This method can detect distributional changes in a sequence of independent observations and makes no distributional assumptions beyond the existence of the alpha-th moment. It estimates the number and the locations of change points simultaneously.\n",
"\n",
"<br>\n",
"\n",
"## Parameters\n",
"\n",
"`member` : array_like (default=None) \n",
"Initial cluster membership of each observation. Its first dimension must match the size of the input data. If set to None, each observation starts in its own cluster.\n",
"\n",
"`alpha` : float (default=1.0) \n",
"A fixed constant in the range (0, 2), used as the exponent in the divergence measure. The method assumes that the alpha-th absolute moment of the data exists.\n",
"\n",
"`penalty` : str or callable or None (default=None) \n",
"A function that penalizes the goodness-of-fit statistic when overfitting is a concern. If None, no penalty is applied. Can also be the name of a predefined penalty: \n",
"- `len_penalty`: Penalizes based on the number of change points. \n",
"- `mean_diff_penalty`: Penalizes based on the mean distance between consecutive change points.\n",
"\n",
"<br>\n",
"\n",
"## Attributes\n",
"\n",
"`merged_` : array_like\n",
"\n",
"A 2D array that records which clusters were merged at each step of the agglomerative process.\n",
"\n",
"`gof_` : float\n",
"\n",
"The goodness-of-fit statistic for the current segmentation result.\n",
"\n",
"`cluster_` : array_like\n",
"\n",
"A 1D array that specifies which cluster each row of input data X belongs to.\n",
"\n"
],
"metadata": {
"id": "ok5Oes8lYsvX"
}
},
{
"cell_type": "code",
"source": [
"from aeon.segmentation import EAggloSegmenter\n",
"from aeon.testing.data_generation import make_example_dataframe_series\n",
"\n",
"X = make_example_dataframe_series(n_channels=2, random_state=10)\n",
"model = EAggloSegmenter()\n",
"y = model.fit_predict(X, axis=0)\n",
"\n",
"print(\"Segmented clusters:\", y)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nPwNRiecCysi",
"outputId": "25741c1e-696b-465c-d1c7-979ca66777c6"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Segmented clusters: [0 1]\n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "wNYidqUnW7qO"
},
"execution_count": 2,
"outputs": []
}
]
}
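The role of `alpha` in the divergence measure can be illustrated with a small NumPy sketch. It assumes the divergence is an energy-distance style statistic: twice the mean between-segment distance minus the mean within-segment distances, with every Euclidean distance raised to the power `alpha`. The function names `pairwise_dist_alpha` and `energy_divergence` are hypothetical, and this is not aeon's implementation, which may scale or combine the terms differently.

```python
import numpy as np


def pairwise_dist_alpha(a, b, alpha):
    """Euclidean distances between all rows of a and b, raised to alpha."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=2) ** alpha


def energy_divergence(a, b, alpha=1.0):
    """Energy-distance style divergence between two segments.

    Each segment is an array of shape (n_observations, n_channels)
    with at least two observations.
    """
    n, m = len(a), len(b)
    between = pairwise_dist_alpha(a, b, alpha).mean()
    # The diagonal is zero, so dividing by n * (n - 1) averages over distinct pairs
    within_a = pairwise_dist_alpha(a, a, alpha).sum() / (n * (n - 1))
    within_b = pairwise_dist_alpha(b, b, alpha).sum() / (m * (m - 1))
    return 2.0 * between - within_a - within_b


rng = np.random.default_rng(0)
seg_a = rng.normal(0, 1, size=(40, 2))
seg_b = rng.normal(0, 1, size=(40, 2))  # same distribution as seg_a
seg_c = rng.normal(4, 1, size=(40, 2))  # mean shifted in both channels

print("same distribution:   ", energy_divergence(seg_a, seg_b))
print("shifted distribution:", energy_divergence(seg_a, seg_c))
```

Segments drawn from the same distribution give a value near zero, while the shifted segment gives a clearly larger one. An agglomerative procedure in this spirit would merge adjacent segments with low divergence first.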
4 changes: 4 additions & 0 deletions examples/segmentation/segmentation.ipynb
@@ -25,6 +25,10 @@
"- [ClaSP (Classification Score Profile) Segmentation](./segmentation_with_clasp.ipynb)\n",
"\n",
"- [Heterogeneous Intrinsic Dimensionality Algorithm (Hidalgo) Segmentation](./hidalgo_segmentation.ipynb)\n",
"\n",
"- [Binary Segmentation](./binary_segmentation.ipynb)\n",
"\n",
"- [E-Agglo (Agglomerative Clustering) Segmentation](./eagglo_segmentation.ipynb)\n",
"\n"
]
}