Overview
Machine learning for data streams (MLDS) has been a significant research area since the late 1990s, with increasing adoption in industry over the past few years. Despite commendable efforts in open-source libraries, a gap persists between pioneering research and accessible tools, which makes it challenging for practitioners, including experienced data scientists, to implement and evaluate methods in this complex domain. Our focus is to familiarize attendees with the application of diverse machine learning tasks to streaming data. Beyond an introductory overview, where we delineate the learning cycle of typical supervised learning tasks, we turn to intricate and pertinent challenges that are seldom addressed in conventional tutorials, such as:
- Prediction Intervals for regression tasks
- Concept drift detection, visualization and evaluation
- Modelling and handling partially labeled and delayed-label data streams using semi-supervised and active learning
- The idiosyncrasies of applying and evaluating clustering on a data stream
- The limitations and opportunities w.r.t. AutoML for streams
Our goal is to discuss the theoretical concepts behind these topics, and present practical examples of how one can implement and evaluate them using Python. We will also demonstrate examples using the MOA graphical user interface; however, due to time constraints, there won’t be any Java coding examples.
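As a flavour of the Python examples, the supervised learning cycle mentioned above is commonly evaluated with a test-then-train (prequential) loop: each instance is first used to test the model and only then to train it. The following is a minimal sketch in plain Python, not the code used in the tutorial notebooks. The `MajorityClassLearner` class, the `prequential_accuracy` function, and the synthetic stream are all hypothetical names introduced here for illustration.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical incremental baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # No prediction is possible before the first label arrives.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def train(self, x, y):
        # Incremental update: one instance at a time, no stored history.
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: predict on each instance first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test on the unseen instance
            correct += 1
        learner.train(x, y)          # then train on the same instance
        total += 1
    return correct / total if total else 0.0


# Synthetic stream (an assumption for illustration): label 0 on every fourth instance.
stream = [((i,), 1 if i % 4 else 0) for i in range(20)]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(f"Prequential accuracy: {accuracy:.2f}")
```

Because every instance is tested before the model has seen its label, the loop gives an estimate of performance on unseen data without a separate hold-out set. Any learner exposing `predict` and `train` methods could be swapped in for the baseline.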
Organizers
Heitor Murilo Gomes is a senior lecturer at the Victoria University of Wellington (VuW) in New Zealand. Before joining VuW, Heitor was a senior research fellow and co-director of the AI Institute at the University of Waikato, where he taught the “data stream mining” (COMPX523) course from 2020 to 2022. Heitor’s main research area is the application of machine learning for data streams in a variety of tasks. (website)
Maroua Bahri is an SRP (Starting Research Position) researcher at Inria Paris within the MiMove project-team. She obtained a PhD degree in Computer Science from Télécom Paris - Institut Polytechnique de Paris. Her research focuses on machine learning and lies at the intersection of data stream mining algorithms and summarization techniques. She is currently engaged in several research topics, including data stream mining and automated machine learning. (website)
Marco Heyden is a research scientist and PhD student in the field of machine learning and data mining at Karlsruhe Institute of Technology. He focuses on learning from sequential data, specifically the intersection between data stream mining and decision making under uncertainty. His research interests include unsupervised concept drift detection and adaptation, incremental decision trees, and multi-armed bandits. (website)
Materials
Slides
Please find the slides of our tutorial here:
Notebooks
And here are the notebooks for you to download: