diff --git a/docs/mentoring.md b/docs/mentoring.md index a937b42986..13d1c8600c 100644 --- a/docs/mentoring.md +++ b/docs/mentoring.md @@ -1,15 +1,17 @@ -# Mentoring and Projects +[//]: # (Try to put references in harvard style for consistency.) -`aeon` runs a range of short to medium duration projects interacting with the community -and the code +# aeon projects: ongoing or potential + +`aeon` runs a range of short to medium duration projects that involve +developing or using aeon and interacting with the community and the code base. These projects are designed for internships, usage as part of -undergraduate/postgraduate projects at academic institutions, and as options for -programs such as [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com/). +undergraduate/postgraduate projects at academic institutions, options for +programs such as [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com/) or just for personal side projects. For those interested in undertaking a project outside these scenarios, we recommend joining the [Slack](https://join.slack.com/t/aeon-toolkit/shared_invite/zt-22vwvut29-HDpCu~7VBUozyfL_8j3dLA) -and discussing with the project mentors. We aim to run schemes to +and discussing with the community. We aim to run schemes to help new contributors to become more familiar with `aeon`, time series machine learning research, and open-source software development. @@ -18,7 +20,7 @@ majority of them will require some knowledge of machine learning and time series ## Current aeon projects -This is a list of some of the projects we are interested in running in 2024. Feel +This is a list of some of the projects we are interested in running in 2024/25. Feel free to propose your own project ideas, but please discuss them with us first. We have an active community of researchers and students who work on `aeon`. Please get in touch via Slack if you are interested in any of these projects or have any questions. @@ -31,36 +33,20 @@ to open source. We list projects by time series task [Classification](#classification) 1. Optimizing the Shapelet Transform for classification and similarity search -2. EEG classification with aeon-neuro (Listed for GSoC 2024) -3. Improved Proximity Forest for classification (listed for GSoC 2024) -4. Improved HIVE-COTE implementation. -5. Compare distance based classification. - -[Forecasting](#forecasting) -1. Machine Learning for Time Series Forecasting (listed in GSoC 2024) -2. Deep Learning for Time Series Forecasting -3. Implement ETS forecasters in aeon +2. Improved HIVE-COTE implementation +3. Compare distance based classification. [Clustering](#clustering) 1. Density peaks clustering algorithm -2. Deep learning based clustering algorithms - -[Anomaly Detection](#anomaly-detection) -1. Anomaly detection with the Matrix Profile and MERLIN [Segmentation](#segmentation) 1. Time series segmentation [Transformation](#transformation) 1. Improve ROCKET family of transformers -2. Implement channel selection algorithms [Visualisation](#visualisation) -1. Explainable AI with the shapelet transform (Southampton intern project). - -[Regression](#regression) -1. Adapt forecasting regressors to time series extrinsic regression. -2. Adapt HIVE-COTE for regression. +1. Explainable AI with the shapelet transform [Documentation](#documentation) 1. Improve automated API documentation @@ -119,10 +105,6 @@ shapelets. 7. Benchmark the implementation against the original shapelet transform algorithm. 8. 
If time, generalize this new algorithm to the case of dilated shapelets (see [5]). -##### Expected Outcomes - -We expect the mentee to engage with the aeon community and produce a more performant -implementation for the shapelet transform that gets accepted into the toolkit. ##### References @@ -145,119 +127,7 @@ transform: A new approach for time series shapelets. In International Conference Pattern Recognition and Artificial Intelligence (pp. 653-664). Cham: Springer International Publishing. -#### 2. EEG classification with aeon-neuro (Listed for GSoC 2024) - -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Aiden Rushbrooke - -##### Related Issues -[#18](https://github.com/aeon-toolkit/aeon-neuro/issues/18) -[#19](https://github.com/aeon-toolkit/aeon-neuro/issues/19) -[#24](https://github.com/aeon-toolkit/aeon-neuro/issues/24) - - - -##### Description - -EEG (Electroencephalogram) data are high dimensional time series that are used in -medical, psychology and brain computer interface research. For example, EEG are -used to detect epilepsy and to control devices such as mice. There is a huge body -of work on analysing and learning from EEG, but there is a wide disparity of -tools, practices and systems used. This project will help members of the `aeon` -team who are currently researching techniques for EEG classification [1] and -developing an aeon sister toolkit, [``aeon-neuro``](https://github.com/aeon-toolkit/aeon-neuro). We will work together to -improve the structure and documentation for aeon-neuro, help integrate the -toolkit with existing EEG toolkits such as MNE [2], provide interfaces to standard data -formats such as BIDS [3] and help develop and assess a range of EEG classification -algorithms. - -##### Project stages - -1. Learn about aeon best practices, coding standards and testing policies. -2. Study the existing techniques for EEG classification. -3. Implement or wrap standard EEG processing algorithms. -4. Evaluate aeon classifiers for EEG problems. -5. Implement alternatives transformations for preprocessing EEG data. -6. Help write up results for a technical report/academic paper (depending on outcomes). - -##### Expected Outcomes - -We would expect a better documented and more integrated aeon-neuro toolkit with -better functionality and a wider appeal. - -##### References - -1. Aiden Rushbrooke, Jordan Tsigarides, Saber Sami, Anthony Bagnall, -Time Series Classification of Electroencephalography Data, IWANN 2023. -2. MNE Toolkit, https://mne.tools/stable/index.html -3. The Brain Imaging Data Structure (BIDS) standard, https://bids.neuroimaging.io/ - -#### 3. Improved Proximity Forest for classification (listed for GSoC 2024) - -Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) and Tony Bagnall -({user}`TonyBagnall`) - -##### Related Issues -[#159](https://github.com/aeon-toolkit/aeon/issues/159) -[#428](https://github.com/aeon-toolkit/aeon/issues/428) - - -##### Description - -Distance-based classifiers such as k-Nearest Neighbours are popular approaches to time -series classification. They primarily use elastic distance measures such as Dynamic Time -Warping (DTW) to compare two series. The Proximity Forest algorithm [1] is a -distance-based classifier for time series. The classifier creates a forest of decision -trees, where the tree splits are based on the distance between time series using -various distance measures. 
A recent review of time series classification algorithms [2] -found that Proximity Forest was the most accurate distance-based algorithm of those -compared. - -`aeon` previously had an implementation of the Proximity Forest algorithm, but it was -not as accurate as the original implementation (the one used in the study) and was -unstable on benchmark datasets. The goal of this project is to significantly overhaul -the previous implementation or completely re-implement Proximity Forest in `aeon` to -match the accuracy of the original algorithm. This will involve comparing against the -authors' Java implementation of the algorithm as well as alternate Python versions. -The mentors will provide results for both for alternative methods. While knowing -Java is not a requirement for this project, it could be beneficial. - -Recently, the group which published the algorithm has proposed a new version of the -Proximity Forest algorithm, Proximity Forest 2.0 [3]. This algorithm is more accurate -than the original Proximity Forest algorithm, and does not currently have an -implementation in `aeon` or elsewhere in Python. If time allows, the project could also -involve implementing and evaluating the Proximity Forest 2.0 algorithm. - -##### Project stages - -1. Learn about `aeon` best practices, coding standards and testing policies. -2. Study the Proximity Forest algorithm and previous `aeon` implementation. -3. Improve/re-implement the Proximity Forest implementation in `aeon`, with -the aim being to have an implementation that is as accurate as the original algorithm, -while remaining feasible to run. -4. Evaluate the improved implementation against the original `aeon` Proximity Forest -and the authors' Java implementation. -5. If time, implement the Proximity Forest 2.0 algorithm and repeat the above -evaluation. - -##### Expected Outcomes - -We expect the mentee engage with the aeon community and produce a high quality -implementation of the Proximity Forest algorithm(s) that gets accepted into the toolkit. - -##### References - -1. Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, -B., Petitjean, F. and Webb, G.I., 2019. Proximity forest: an effective and scalable -distance-based classifier for time series. Data Mining and Knowledge Discovery, 33(3), -pp.607-635. -2. Middlehurst, M., Schäfer, P. and Bagnall, A., 2023. Bake off redux: a review and -experimental evaluation of recent time series classification algorithms. arXiv preprint -arXiv:2304.13029. -3. Herrmann, M., Tan, C.W., Salehi, M. and Webb, G.I., 2023. Proximity Forest 2.0: A -new effective and scalable similarity-based classifier for time series. arXiv -preprint arXiv:2304.05800. - -#### 4. Improved HIVE-COTE implementation +#### 2. Improved HIVE-COTE implementation Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) and Tony Bagnall ({user}`TonyBagnall`) @@ -300,7 +170,7 @@ alternative structures. This can easily develop into a research project. experimental evaluation of recent time series classification algorithms. arXiv preprint arXiv:2304.13029. -#### 5. Compare distance based classification and regression +#### 3. Compare distance based classification and regression Mentors: Chris Holder ({user}`cholder`) and Tony Bagnall ({user}`TonyBagnall`) @@ -323,107 +193,6 @@ implementing alternative distance functions and comparing performance on the UCR datasets. -### Forecasting - -#### 1. 
Machine Learning for Time Series Forecasting (listed in GSoC 2024) - -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Matthew Middlehurst (@MatthewMiddlehurst). - -##### Related Issues -[#265](https://github.com/aeon-toolkit/aeon/issues/265) - - -##### Description - -This project will investigate algorithms for forecasting based on traditional machine -learning (tree based) and time series machine learning (transformation based). Note -this project will not involve deep learning based forecasting. It will involve -helping develop the `aeon` framework to work more transparently with ML algorithms, -evaluating regression algorithms already in `aeon`[1] for forecasting problems and -implementing at least one algorithm from the literature not already in aeon, such as -SETAR-Tree [3]. - -##### Project Stages -1. Learn about aeon best practices, coding standards and testing policies. -2. Adapt the M competition set up [2] for ML experimental framework to assess time - series regression algorithms [1]. -3. Implement a machine learning forecasting algorithm [3] - -##### Expected Outcomes - -1. Contributions to the aeon forecasting module. -2. Implementation of a machine learning forecasting algorithms. -3. Help write up results for a technical report/academic paper (depending on outcomes). - -##### Skills Required - -1. Python 3 -2. Git and GitHub -3. Some machine learning and/or forecasting background (e.g. taught courses or - practical experience) - -##### References - -1. Guijo-Rubio, D.,Middlehurst, M., Arcencio, G., Furtado, D. and Bagnall, A. -Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression, -arXiv2305.01429, 2023 -2. https://forecasters.org/resources/time-series-data/ -3. Godahewa, R., Webb, G.I., Schmidt, D. et al. SETAR-Tree: a novel and accurate -tree algorithm for global time series forecasting. Mach Learn 112, 2555–2591 (2023). -https://link.springer.com/article/10.1007/s10994-023-06316-x - -#### 2. Deep Learning for Time Series Forecasting - -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Ali Ismail-Fawaz ({user} -`hadifawaz1999`) - -##### Description - -Deep learning has become incredibly popular for forecasting, see [1] for an -introduction. This project will involve taking one or more recently proposed -algorithms, implementing them in aeon, then performing an extensive experimental -comparison against traditional and machine learning algorithms. As part of this, we -will collate results from the M Competitions [2] - -##### Project Stages -1. Learn about aeon best practices, coding standards and testing policies. -2. Adapt the M competition set up [2] for deep learning. -3. Implement a deep learning forecasting algorithm after discussion with mentors. - -##### Expected Outcomes - -1. Collated M competition results and partial reproduction. -2. Extend the forecasting module to include at least one deep forecaster. - -##### References - -1. [ECML 2024 Tutorial](https://lovvge.github.io/Forecasting-Tutorial-ECML-2023/) -2. [M Competitions](https://forecasters.org/resources/time-series-data/) - - -#### 3. Implement ETS forecasters - -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Leo Tsaprounis ({user}`ltsaprounis`) -Exponential smoothing (ETS) is a popular family of algorithms for forecasting, and -the ETS framework by Hyndman et al. [1] covers 30 possible models for time series -with different types of Error, Trend, and Seasonal components. -we already have an (Auto)ETS model in aeon, but it’s wrapping statsmodels. 
We would -like our own bespoke, optimised implementation based on the R implementation. - -##### Project Stages -1. Learn about aeon best practices, coding standards and testing policies. -2. Survey and benchmark existing implementations of ETS forecasting. -3. Implement basic implementations optimised for numba. -4. Extended implementation to include modern refinements. - - -##### References - -1. Hydman et al. [Forecasting with Exponential Smoothing The State Space Approach](https://link.springer.com/book/10.1007/978-3-540-71918-2) -2. [Smooth R Package](https://github.com/config-i1/smooth) -3. Svetunkov, [Forecasting and Analytics with the Augmented Dynamic Adaptive Model - (ADAM)](https://openforecast.org/adam/) - ### Clustering #### 1. Density peaks clustering algorithm @@ -466,92 +235,14 @@ Sci Rep 12, 1409 (2022) [DOI](https://doi.org/10.1038/s41598-021-02038-z) 3. Begum et al. A General Framework for Density Based Time Series Clustering Exploiting a Novel Admissible Pruning Strategy, [arXiv](https://arxiv.org/ftp/arxiv/papers/1612/1612.00637.pdf) - -#### 2. Deep learning for clustering - -Mentors: Tony Bagnall ({user}`TonyBagnall`) and Ali Ismail-Fawaz ({user} -`hadifawaz1999`) - -The clustering module in `aeon`, up until now, primarily consists of distance-based -partitional clustering algorithms. Recently, we introduced a deep clustering module, -incorporating distance-based algorithms in the latent space. - -The objective of this project is to enhance `aeon` by incorporating more deep learning -approaches for time series clustering. The specific goal is to implement and assess -InceptionTime [1] and its recent variants as a clustering algorithm, and contribute to -an ongoing collaborative effort into a bake off for clustering. More widely, there -are a broad range of deep learning clustering approaches we could consider [2]. - -##### Project Stages - -1. Research and understand clustering time series and deep learning based approaches. -2. Implement inception time as an aeon clusterer. -3. Compare performance of deep learning clusterers to distance based algorithms. - -[1] Fawaz et al. InceptionTime: Finding AlexNet for time series classification -Published: 07 September 2020 Volume 34, pages 1936–1962, (2020) -[2] Deep learning forecasting [tutorial](https://lovvge.github.io/Forecasting-Tutorial-ECML-2023/) - ### Anomaly detection -#### 1. Anomaly detection with the Matrix Profile and MERLIN - -Mentors: Matthew Middlehurst ({user}`MatthewMiddlehurst`) - -##### Description - -`aeon` is looking to extend its module for time series anomaly detection. The -end goal of this project is to implement the Matrix Profile [1][2] and MERLIN [3] -algorithms, but suitable framework for anomaly detection in `aeon` will need to be -designed first. The mentee will help design the API for the anomaly detection module -and implement the Matrix Profile and MERLIN algorithms. - -Usage of external libraries such as `stumpy` [4] is possible for the algorithm -implementations, or the mentee can implement the algorithms from scratch using `numba`. -There is also scope to benchmark the implementations, but as there is no existing -anomaly detection module in `aeon`, this will require some infrastructure to be -developed and is subject to time and interest. - -##### Project stages - -1. Learn about `aeon` best practices, coding standards and testing policies. -2. Familiarise yourself with similar single series experimental modules in `aeon` such -as segmentation and similarity search. -3. 
Help design the API for the anomaly detection module. -4. Study and implement the Matrix Profile for anomaly detection and MERLIN algorithms -using the new API. -5. If time allows and there is interest, benchmark the implementations against the -original implementations or other anomaly detection algorithms. - -##### Project Outcome - -As the anomaly detection is a new module in `aeon`, there is very little existing code -to compare against and little infrastructure to evluate anomaly detection algorithms. -The success of the project will be evaluated by the quality of the code produced and -engagement with the project and the `aeon` community. - -##### References - -1. Yeh, C.C.M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H.A., Silva, D.F., -Mueen, A. and Keogh, E., 2016, December. Matrix profile I: all pairs similarity joins -for time series: a unifying view that includes motifs, discords and shapelets. In 2016 -IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). Ieee. -2. Lu, Y., Wu, R., Mueen, A., Zuluaga, M.A. and Keogh, E., 2022, August. -Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints -and ultra-fast arriving data streams. In Proceedings of the 28th ACM SIGKDD Conference -on Knowledge Discovery and Data Mining (pp. 1173-1182). -3. Nakamura, T., Imamura, M., Mercer, R. and Keogh, E., 2020, November. Merlin: -Parameter-free discovery of arbitrary length anomalies in massive time series archives. -In 2020 IEEE international conference on data mining (ICDM) (pp. 1190-1195). IEEE. -4. Law, S.M., 2019. STUMPY: A powerful and scalable Python library for time series data -mining. Journal of Open Source Software, 4(39), p.1504. - ### Segmentation #### 1. Time series segmentation -Mentors: Tony Bagnall ({user}`TonyBagnall`) and TBC +Mentors: Tony Bagnall ({user}`TonyBagnall`) ##### Description @@ -568,12 +259,6 @@ https://github.com/aeon-toolkit/aeon/issues/948 4. Implement tools for comparing segmentation algorithms 5. Conduct a bake off of segmentation algorithms on a range of datasets. -##### Project Outcome - -As with all research programming based projects, progress can be hindered by many -unforseen circumstances. Success will be measured by engagement, effort and -willingness to join the community rather than performance of the algorithms. - ##### References 1. Allegra, M., Facco, E., Denti, F., Laio, A. and Mira, A., 2020. Data segmentation @@ -615,7 +300,7 @@ other time series tasks. `aeon` has implementations of the ROCKET transformation and its variants, including MiniROCKET [2] and MultiROCKET [3]. However, these implementations have room for improvement ([#208](https://github.com/aeon-toolkit/aeon/issues/208)). There is scope -to speed up the implementations, and the amount of varients is likely unnecessary and +to speed up the implementations, and the amount of variants is likely unnecessary and could be condensed into higher quality estimators. This projects involves improving the existing ROCKET implementations in `aeon` or @@ -643,14 +328,6 @@ mentee with the `aeon` pull request process. 6. Benchmark the implementation against the original ROCKET implementations, looking at booth speed of the transform and accuracy in a classification setting. -##### Project Outcomes - -Success of the project will be assessed by the quality of the code produced and an -evaluation of the transformers in a classification setting. 
None of the implementations -should significantly degrade the performance of the original ROCKET algorithm in terms -of accuracy and speed. Regardless, effort and engagement with the project and the -`aeon` community are more important factors in evaluating success. - ##### References 1. Dempster, A., Petitjean, F. and Webb, G.I., 2020. ROCKET: exceptionally fast and @@ -666,62 +343,9 @@ Data Mining and Knowledge Discovery, 36(5), pp.1623-1646. kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery, pp.1-27. -#### 2. Implement channel selection algorithms - -Related issues: -[#1270](https://github.com/aeon-toolkit/aeon/issues/1270) -[#1467](https://github.com/aeon-toolkit/aeon/issues/1467) - -Channel selection in this context is the process of reducing the number of channels -in a collection of time series for classification, clustering or regression. This -project looks at filter based approaches to speed up multivariate time series -classification (MTSC) of high dimensional series. Standard approaches for -classifying high dimensional data are to -employ a filter to select a subset of attributes or to transform the data into a lower -dimensional feature space using, for example, principal component analysis. Our -focus is on dimensionality reduction through filtering. For MTSC, filtering is -generally accepted to be selecting the most important dimensions to use before -training the classifier. Dimension selection can, on average, either increase, not -change or decrease the accuracy of classification. The first case implies that the -higher dimensionality is confounding the classifier’s discriminatory power. In the -second case it is often still desirable to filter due to improved training time. In -the third case, filtering may still be desirable, depending on the trade-off between -performance (e.g. accuracy) and efficiency (e.g. train time): a small reduction in -accuracy may be acceptable if build time reduces by an order of magnitude. We -address the task of how best to select a subset of dimensions for high dimensional -data so that we can speed up and possibly improve HC2 on high dimensional -MTSC problems. -Detecting the best subset of dimensions is not a straightforward problem, -since the number of combinations to consider increases exponentially with the -number of dimensions. Selection is also made more complex by the fact that -the objective function used to assess a set of features may not generalise well -to unseen data. Furthermore, since the primary reason for filtering the dimensions -is improving the efficiency of the classifier, dimension selection strategies -themselves need to be fast. - -Currently we have the channel selection algorithms describe in [1,2] in aeon. It would -be great to include those in [3] and further work. This project will involve -experimental evaluation in addition to implementing -algorithms. We can co-ordinate the experiments with the candidate through our HPC -facilities. - -1. Implement a channel selection wrapper for the aeon toolkit (see [#1270](https://github.com/aeon-toolkit/aeon/issues/1270)) -2. Explore alternative ways of selecting channels after scoring (e.g. forward selection) -3. Use a fast classifier that can find train estimates through e.g. bagging and avoid the cross validation -4. Research, implement and evaluate alternative channel selection algorithms - -##### References -[1] Dhariyal, B. et al. Fast Channel Selection for Scalable Multivariate Time -Series Classification. 
AALTD, ECML-PKDD, Springer, 2021 -[2] Dhariyal, B. et al. Scalable Classifier-Agnostic Channel Selection - for Multivariate Time Series Classification", DAMI, 2023 -[3] Ruiz, A.P., Bagnall, A. Dimension Selection Strategies for Multivariate - Time Series Classification with HIVE-COTEv2.0. AALTD,ECML-PKDD 2022. - (https://doi.org/10.1007/978-3-031-24378-3_9) - ### Visualisation -#### 1. Explainable AI with the shapelet transform (Southampton intern project). +#### 1. Explainable AI with the shapelet transform. Mentors: TonyBagnall ({user}`TonyBagnall`) and David Guijo-Rubio ({user}`dguijo`) @@ -739,42 +363,18 @@ source toolkits, familiarisation with the shapelet code and the development of a visualisation tool to help relate shapelets back to the training data. An outline for the project is -Weeks 1-2: Familiarisation with open source, aeon and the visualisation module. Make +1. Familiarisation with open source, aeon and the visualisation module. Make contribution for a good first issue. -Weeks 3-4: Understand the shapelet transfer algorithm, engage in ongoing discussions +2. Understand the shapelet transfer algorithm, engage in ongoing discussions for possible improvements, run experiments to create predictive models for a test data set -Weeks 5-6: Design and prototype visualisation tools for shapelets, involving a range +3. Design and prototype visualisation tools for shapelets, involving a range of summary measures and visualisation techniques, including plotting shapelets on training data, calculating frequency, measuring similarity between -Weeks 7-8: Debug, document and make PRs to merge contributions into the aeon toolkit. +4. Debug, document and make PRs to merge contributions into the aeon toolkit. [1] Bagnall, A., Lines, J., Bostrom, A., Large, J. and Keogh, E. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, Volume 31, pages 606–660, (2017) [2] Ye, L., Keogh, E. Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Disc 22, 149–182 (2011). https://doi.org/10.1007/s10618-010-0179-5 [3] Lines, L., Davis, L., Hills, J. and Bagnall, A. A shapelet transform for time series classification, KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (2012) https://doi.org/10.1145/2339530.2339579 -### Regression - -#### 1. Adapt forecasting regressors to time series extrinsic regression. - -Mentors: TonyBagnall ({user}`TonyBagnall`) and David Guijo-Rubio -({user}`dguijo`) - -Forecasting is often reduced to regression through the application of a sliding -window. This is a large research field that is distinct to time series extrinsic -regression, where each series is assumed to be independent. This is more of a -research project to investigate what techniques are used in forecasting for -regression based forecasting and to compare them to the time series specific -algorithms in aeon. This project would require further working up with the mentors. - - -#### 2. Adapt HIVE-COTE for regression - -Mentors: TonyBagnall ({user}`TonyBagnall`) and David Guijo-Rubio -({user}`dguijo`) - -HIVE-COTE [1] is a state of the art classifier. Adapting it for regression is an -ongoing research project for which we would welcome collaborators. Ongoing, this -needs working up. - ### Documentation @@ -807,9 +407,3 @@ to examples which use the function/class. 5. 
The main bulk of work is done, but the API documentation is vast and can always be improved! If time allows, continue to enhance the API documentation through individual docstrings, API landing page and template improvements at the mentees discretion.
-
-##### Project Outcomes
-
-Success of the project will be assessed by the quality of the documentation produced
-and engagement with the project and the `aeon` community. Automatically generating
-links to examples is the primary goal.
diff --git a/docs/papers_using_aeon.md b/docs/papers_using_aeon.md
index 30dba0fd44..243e0a4c49 100644
--- a/docs/papers_using_aeon.md
+++ b/docs/papers_using_aeon.md
@@ -4,84 +4,68 @@
 This is a list of papers that use `aeon`. If you have a paper that uses `aeon`, please add it to this list by making a pull request. Please include a hyperlink to the paper and a link to the code in your personal GitHub or other repository.
-## Challenge
+If you want to reference `aeon`, please cite the following paper.
-- Ermshaus, A., Schäfer, P., Bagnall, A., Guyet, T., Ifrim, G., Lemaire, V., ... &
- Malinowski, S. (2023, September). Human Activity Segmentation Challenge@ ECML/PKDD’23.
- In International Workshop on Advanced Analytics and Learning on Temporal Data
- (pp. 3-13). Cham: Springer Nature Switzerland.
- [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_1) [Webpage/Code](https://github.com/patrickzib/human_activity_segmentation_challenge)
+Middlehurst, M., Ismail-Fawaz, A., Guillaume, A., Holder, C., Guijo-Rubio, D., Bulatova, G.,
+Tsaprounis, L., Mentel, L., Walter, M., Schäfer, P. and Bagnall, A.
+aeon: a Python Toolkit for Learning from Time Series. Journal of Machine Learning Research, 25(289):1−10, 2024.
+[Paper](https://jmlr.org/papers/v25/23-1444.html)
-## Classification
+## 2024
-- Middlehurst, M. and Schäfer, P. and Bagnall, A. (2024). Bake off redux: a review
+- Dempster, A., Tan, W. T., Miller, L., Foumani, N., Schmidt, D. and Webb, G. (2024).
+ Highly Scalable Time Series Classification for Very Large Datasets, ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. [Paper](https://ecml-aaltd.github.io/aaltd2024/articles/Dempster_AALTD24.pdf)
+- Dempster, A., Schmidt, D. and Webb, G. (2024). QUANT: a minimalist interval method
+ for time series classification, Data Mining and Knowledge Discovery, Volume 38,
+ pages 2377–2402. [Paper](https://link.springer.com/article/10.1007/s10618-024-01036-9)
+- Serramazza, D., Nguyen, T. and Ifrim, G. (2024). Improving the Evaluation and
+ Actionability of Explanation Methods for Multivariate Time Series Classification.
+ Proc. ECML/PKDD. [ArXiV](https://arxiv.org/abs/2406.12507)
+- Middlehurst, M., Schäfer, P. and Bagnall, A. (2024). Bake off redux: a review
 and experimental evaluation of recent time series classification algorithms.
- Data Mining and Knowledge Discovery, online first, open access.
+ Data Mining and Knowledge Discovery, Volume 38, pages 1958–2031.
 [Paper](https://link.springer.com/article/10.1007/s10618-024-01022-1) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/tsc_bakeoff/tsc_bakeoff_2023.html)
-- Spinnato, F. and Guidotti, R. and Monreale, A. and Nanni, M. (2024). Fast,
- Interpretable, and Deterministic Time Series Classification With a
- Bag-of-Receptive-Fields. IEEE Access, vol. 12, (pp. 137893-137912).
+- Spinnato, F. and Guidotti, R. and Monreale, A. and Nanni, M. (2024).
Fast, Interpretable,
+ and Deterministic Time Series Classification With a Bag-of-Receptive-Fields.
+ IEEE Access, vol. 12, (pp. 137893-137912). [Paper](https://ieeexplore.ieee.org/document/10684604) [Code](https://github.com/fspinna/borf)
-- Schäfer, P, and Leser, U. (2023). WEASEL 2.0: a random dilated dictionary transform
- for fast, accurate and memory constrained time series classification.
- Machine Learning, 112(12), pp.4763-4788.
- [Paper](https://link.springer.com/content/pdf/10.1007/s10994-023-06395-w.pdf) [Webpage/Code](https://github.com/patrickzib/dictionary)
-
-## Clustering
-
 - Holder, C., Middlehurst, M. and Bagnall, A., (2024). A review and evaluation of elastic distance functions for time series clustering. Knowledge and Information Systems, 66(2), pp.765-809. [Paper](https://link.springer.com/article/10.1007/s10115-023-01952-0) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/distance_based_clustering/distance_based_clustering.html)
-- Holder, C., Guijo-Rubio, D. and Bagnall, A., (2023), September. Clustering time series
- with k-medoids based algorithms. In International Workshop on Advanced Analytics and
- Learning on Temporal Data (pp. 39-55).
- [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_4)
-- Holder, Christopher & Bagnall, Anthony. (2024).
- Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering.
- 10.48550/arXiv.2411.17838.
-[Paper](https://arxiv.org/abs/2411.17838)
-
-## Regression
-
+- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P. A., and Hervás-Martínez, C. (2024). O-Hydra: A Hybrid Convolutional and Dictionary-Based Approach to Time Series Ordinal Classification. In Conference of the Spanish Association for Artificial Intelligence (pp. 50-60). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-62799-6_6).
 - Guijo-Rubio, D., Middlehurst, M., Arcencio, G., Silva, D. and Bagnall, A. (2024). Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression. Data Mining and Knowledge Discovery, online first, open access. [Paper](https://arxiv.org/abs/2305.01429) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/tser_archive_expansion/tser_archive_expansion.html)
+- Ismail-Fawaz, A. and Devanne, M. and Berretti, S. and Weber, J. and Forestier, G.
+ (2024, May). Establishing a Unified Evaluation Framework for Human Motion
+ Generation: A Comparative Analysis of Metrics. [Paper](https://arxiv.org/abs/2405.07680) [code](https://github.com/MSD-IRIMAS/Evaluating-HMG)
+
+## 2023
+- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., Bagnall, A., and Hervás-Martínez, C. Convolutional and Deep Learning based techniques for Time Series Ordinal Classification. [ArXiV](https://arxiv.org/abs/2306.10084).
+- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., and Hervás-Martínez, C. (2023). A Dictionary-Based Approach to Time Series Ordinal Classification. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. Lecture Notes in Computer Science, vol 14135. [Paper](https://link.springer.com/chapter/10.1007/978-3-031-43078-7_44).
+- Ermshaus, A., Schäfer, P., Bagnall, A., Guyet, T., Ifrim, G., Lemaire, V., ... &
+ Malinowski, S. (2023, September). Human Activity Segmentation Challenge@ ECML/PKDD’23.
+ In International Workshop on Advanced Analytics and Learning on Temporal Data
+ (pp. 3-13). Cham: Springer Nature Switzerland.
+ [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_1) [Webpage/Code](https://github.com/patrickzib/human_activity_segmentation_challenge) +- Schäfer, P, and Leser, U. (2023). WEASEL 2.0: a random dilated dictionary transform + for fast, accurate and memory constrained time series classification. + Machine Learning, 112(12), pp.4763-4788. + [Paper](https://link.springer.com/content/pdf/10.1007/s10994-023-06395-w.pdf) [Webpage/Code](https://github.com/patrickzib/dictionary) - Middlehurst, M. and Bagnall, A., (2023), September. Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression. In International Workshop on Advanced Analytics and Learning on Temporal Data (pp. 113-126). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_8) [Webpage/Code](https://tsml-eval.readthedocs.io/en/stable/publications/2023/rist_pipeline/rist_pipeline.html) - -## Ordinal classification - -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., Bagnall, A., and - Hervás-Martínez, C. Convolutional and Deep Learning based techniques for Time Series - Ordinal Classification. [ArXiV](https://arxiv.org/abs/2306.10084). -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P. A., and Hervás-Martínez, C. - (2024). O-Hydra: A Hybrid Convolutional and Dictionary-Based Approach to Time Series - Ordinal Classification. In Conference of the Spanish Association for Artificial - Intelligence (pp. 50-60). [Paper](https://link.springer.com/chapter/10.1007/978-3-031-62799-6_6). -- Ayllón-Gavilán, R., Guijo-Rubio, D., Gutiérrez, P.A., and Hervás-Martínez, C. (2023). - A Dictionary-Based Approach to Time Series Ordinal Classification. In: Rojas, I., - Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. - Lecture Notes in Computer Science, vol 14135. [Paper](https://link.springer.com/chapter/10.1007/978-3-031-43078-7_44). - -## Prototyping - - Ismail-Fawaz, A. and Ismail Fawaz, H. and Petitjean, F. and Devanne, M. and Weber, - J. and Berretti, S. and Webb, GI. and Forestier, G. (2023 December "ShapeDBA: - Generating Effective Time Series Prototypes Using ShapeDTW Barycenter Averaging." - ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. [Paper](https://doi.org/10.1007/978-3-031-49896-1_9) - [code](https://github.com/MSD-IRIMAS/ShapeDBA) -- Holder, C., Guijo-Rubio, D., & Bagnall, A. J. (2023). Barycentre Averaging for the - Move-Split-Merge Time Series Distance Measure. In Proceedings of the 15th - International Joint Conference on Knowledge Discovery, Knowledge Engineering and - Knowledge Management-Volume 1:, 51-62, pp. 51-62. [Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0012164900003598) - -## Generation Evaluation - -- Ismail-Fawaz, A. and Devanne, M. and Berretti, S. and Weber, J. and Forestier, G. - (2024) May "Establishing a Unified Evaluation Framework for Human Motion - Generation: A Comparative Analysis of Metrics" [Paper](https://arxiv.org/abs/2405.07680) [code](https://github.com/MSD-IRIMAS/Evaluating-HMG) + J. and Berretti, S. and Webb, GI. and Forestier, G. (2023) ShapeDBA: Generating + Effective Time Series Prototypes Using ShapeDTW Barycenter Averaging. ECML/PKDD + Workshop on Advanced Analytics and Learning on Temporal Data. [Paper](https://doi.org/10.1007/978-3-031-49896-1_9) [code](https://github.com/MSD-IRIMAS/ShapeDBA) +- Holder, C., Guijo-Rubio, D., & Bagnall, A. J. (2023). Barycentre Averaging for the Move-Split-Merge Time Series Distance Measure. 
In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Volume 1, pp. 51-62. [Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0012164900003598)
+- Holder, C., Guijo-Rubio, D. and Bagnall, A. (2023, September). Clustering time series
+ with k-medoids based algorithms. In International Workshop on Advanced Analytics and
+ Learning on Temporal Data (pp. 39-55).
+ [Paper](https://link.springer.com/chapter/10.1007/978-3-031-49896-1_4)