High-Dimensional Data Stream Classification: Improving Random Patch Online Ensemble Classifier
Keywords:
Data Stream Mining, Online machine learning, Incremental classification, High-dimensional stream, Streaming Random Patches, Compressed Sensing
Abstract
In recent years, the volume of data produced by human activity has grown massively, giving rise to continuous flows of data generated in real time, known as data streams. Classifying data streams requires learning approaches that are both incremental and adaptive, mainly because the patterns in a stream change rapidly. Streaming Random Patches (SRP), the method investigated in this work, is a robust online ensemble model for classifying evolving data streams. It uses incremental decision trees, namely Hoeffding Trees (HT), as base learners for online prediction. To ensure ensemble diversity, each tree is trained incrementally on its own random patch, formed by combining global feature subspacing with online bagging (OB). OB adapts bagging to the streaming setting: instead of sampling with replacement, it assigns each instance a weight that determines how many times the instance is used for training. A drift detection strategy replaces outdated base learners so that the ensemble stays relevant to recent data and avoids stale predictions. The ensemble is built incrementally in a test-then-train fashion: each HT is tested on an incoming instance before being trained on it, and base learner weights are updated according to these test predictions. Ensemble predictions are determined by weighted majority voting. This study aims to preserve SRP's good performance when classifying high-dimensional streams. The unpredictable nature of data stream instances and their high dimensionality can degrade the online classifier's prediction quality, availability, and execution time. To improve the SRP classifier's performance, we apply a refined compressed sensing (CS) technique ahead of incremental stream processing to ensure efficient subspace selection. In the final model, we replace HT with the Extremely Fast Decision Tree (EFDT), a more statistically efficient base learner. SRP combined with these techniques improved prediction performance on high-dimensional data streams, with average accuracy gains ranging from +0.15% to +5.43%. The proposed modifications also reduced execution time by 95.69%, which aligns the method with Green AI objectives.
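The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a Gaussian random matrix stands in for the refined CS step, a simple nearest-centroid learner (`CentroidLearner`, a hypothetical name) stands in for HT or EFDT, the Poisson parameter `lam=6.0` and the subspace size are arbitrary choices, the data are synthetic, and drift detection is left out. It only shows how compression, random patches, Poisson-weighted online bagging, test-then-train evaluation, and weighted voting fit together.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def make_projection(n_in, n_out, rng):
    """Assumed CS stand-in: Gaussian random matrix scaled by 1/sqrt(n_out)."""
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def compress(x, matrix):
    """Project x into the lower-dimensional space (matrix-vector product)."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


class CentroidLearner:
    """Stand-in incremental base learner (NOT a Hoeffding Tree or EFDT):
    keeps a weighted running sum per class and predicts the nearest mean."""

    def __init__(self):
        self.sums = {}
        self.weights = defaultdict(float)

    def learn_one(self, x, y, w=1):
        if w <= 0:
            return
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            self.sums[y][i] += w * v
        self.weights[y] += w

    def predict_one(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def dist(label):
            n = self.weights[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=dist)


class RandomPatchEnsemble:
    """Each member sees a fixed random feature subset (global subspacing)
    and a Poisson-weighted view of the stream (online bagging)."""

    def __init__(self, n_features, n_models=10, subspace_size=3, lam=6.0, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.members = []
        for _ in range(n_models):
            feats = sorted(self.rng.sample(range(n_features), subspace_size))
            self.members.append(
                {"feats": feats, "model": CentroidLearner(), "correct": 0, "seen": 0}
            )

    def predict_one(self, x):
        votes = defaultdict(float)
        for m in self.members:
            y_hat = m["model"].predict_one([x[i] for i in m["feats"]])
            if y_hat is not None:
                # Vote weight = the member's running test-then-train accuracy.
                votes[y_hat] += m["correct"] / m["seen"] if m["seen"] else 1.0
        return max(votes, key=votes.get) if votes else None

    def learn_one(self, x, y):
        for m in self.members:
            sub = [x[i] for i in m["feats"]]
            # Test the member first, then update its weight statistics.
            m["seen"] += 1
            m["correct"] += int(m["model"].predict_one(sub) == y)
            # Online bagging: train with a Poisson-distributed instance weight.
            k = poisson(self.lam, self.rng)
            if k > 0:
                m["model"].learn_one(sub, y, k)


def prequential_accuracy(n_instances=1000, n_in=50, n_out=10, seed=42):
    """Test-then-train loop on a synthetic two-class stream (illustrative data)."""
    rng = random.Random(seed)
    matrix = make_projection(n_in, n_out, rng)
    ensemble = RandomPatchEnsemble(n_features=n_out, seed=seed)
    correct = 0
    for _ in range(n_instances):
        y = rng.randint(0, 1)
        x = [rng.gauss(3.0 * y, 1.0) for _ in range(n_in)]
        z = compress(x, matrix)  # compress before any incremental learning
        correct += int(ensemble.predict_one(z) == y)  # test first ...
        ensemble.learn_one(z, y)  # ... then train
    return correct / n_instances


if __name__ == "__main__":
    print(f"prequential accuracy: {prequential_accuracy():.3f}")
```

A fuller version would attach a drift detector to each member and reset the member when a change is signalled, and would swap the centroid learner for an incremental tree such as HT or EFDT.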
License
Copyright (c) 2025 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









