Human Activity Recognition Challenge
Competing in Sussex-Huawei Locomotion Challenge 2018—classifying walking, running, biking, and transit modes from smartphone IMU data using ensemble decision trees and wavelet features.
The Challenge
Your smartphone knows when you're walking, running, or riding a bike—but how? The Sussex-Huawei Locomotion Recognition Challenge 2018 posed this question to researchers worldwide: given raw accelerometer and gyroscope data from users going about daily activities, classify eight distinct locomotion modes with maximum accuracy while minimizing computational overhead.
The twist: solutions must be deployable on resource-constrained mobile devices. No massive deep neural networks, no cloud processing—just efficient algorithms running on smartphone hardware. Our team, "Maximum Analytics," tackled this constraint head-on with classical machine learning, thoughtful feature engineering, and ensemble methods.
Dataset & Problem Statement
The Sussex-Huawei Locomotion (SHL) dataset contains smartphone IMU (Inertial Measurement Unit) readings from users performing eight activities:
- Walking, Running, Biking: Self-explanatory locomotion modes
- Car, Bus, Train, Subway: Motorized transportation modes
- Still: Stationary periods
Each sample: 3-axis accelerometer (m/s²) + 3-axis gyroscope (rad/s), sampled at 100Hz. Training set: 5 days × 8 hours × 8 activities = thousands of labeled windows. Test set: unseen users and environments.
Evaluation Metric: Classification accuracy on held-out test set. Secondary objective: minimize model size and inference latency for mobile deployment.
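Concretely, the first preprocessing step is slicing the continuous 100 Hz six-axis stream into fixed-length windows. A minimal sketch (Python for illustration; the original pipeline was MATLAB, and the 5 s window length and 50% overlap are assumptions, since the post does not state them):

```python
import numpy as np

def segment_windows(signal, fs=100, win_sec=5.0, overlap=0.5):
    """Split a (T, 6) IMU stream into fixed-length windows.

    signal : array of shape (T, 6) -- 3-axis accel + 3-axis gyro at fs Hz
    Returns an array of shape (n_windows, win_len, 6).
    """
    win_len = int(fs * win_sec)
    step = int(win_len * (1 - overlap))
    starts = range(0, len(signal) - win_len + 1, step)
    return np.stack([signal[s:s + win_len] for s in starts])

# 60 s of synthetic 6-axis data at 100 Hz -> 5 s windows, 50% overlap
stream = np.random.randn(6000, 6)
windows = segment_windows(stream)
print(windows.shape)  # (23, 500, 6)
```

Each window then flows through the feature extraction pipeline below, and each window receives one activity label.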
Feature Engineering: Extracting Patterns from Motion
Raw sensor streams contain hidden structure—walking produces periodic acceleration spikes at step frequency; biking shows smoother, higher-magnitude patterns; vehicles introduce low-frequency vibrations. Our feature extraction pipeline transforms temporal signals into discriminative representations:
1. Statistical Features (30 features):
- Mean, standard deviation: capture central tendency and variability
- Skewness, kurtosis: quantify distribution shape (symmetry, tail heaviness)
- Range (max - min): overall signal amplitude
Applied independently to each axis (3 × accelerometer + 3 × gyroscope = 6 signals), yielding 6 × 5 stats = 30 features per window.
2. Wavelet Decomposition (Daubechies 2):
Discrete wavelet transform decomposes signals into time-frequency components—simultaneously capturing transient events (steps) and long-term trends (sustained motion). Daubechies 2 wavelets provide compact support (localized in time) while maintaining smoothness.
Decomposition yields approximation coefficients (low-frequency trends) and detail coefficients (high-frequency transients) across multiple scales. We extract statistical features from each subband, capturing multi-scale motion patterns.
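A sketch of the per-axis decomposition, using PyWavelets as a stand-in for MATLAB's Wavelet Toolbox; the decomposition level (4) and the choice of subband summaries (mean, standard deviation, energy) are assumptions for illustration:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(axis_signal, wavelet="db2", level=4):
    """Multi-level DWT; summarize each subband with mean, std, and energy.

    wavedec returns [approx, detail_L, ..., detail_1]: level + 1 subbands,
    so 3 * (level + 1) features per signal axis.
    """
    coeffs = pywt.wavedec(axis_signal, wavelet, level=level)
    feats = []
    for c in coeffs:
        feats.extend([c.mean(), c.std(), np.sum(c ** 2)])
    return np.array(feats)

sig = np.random.randn(500)          # one axis of one 5 s window
print(wavelet_features(sig).shape)  # (15,)
```

The approximation subband summarizes sustained motion (e.g., constant vehicle vibration), while the detail subbands respond to transients such as individual steps.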
3. Zero Crossing Rate (Gyroscope):
Counts how often gyroscope signals cross zero—proxy for rotational oscillation frequency. Walking exhibits high zero-crossing rate (alternating leg swings); vehicles show low rate (steady orientation).
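The zero-crossing count reduces to checking sign changes between consecutive samples. A minimal sketch (Python for illustration), comparing a walking-like oscillation against a slow vehicle-like sway:

```python
import numpy as np

def zero_crossing_rate(axis_signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(axis_signal)
    crossings = np.sum(signs[:-1] * signs[1:] < 0)
    return crossings / (len(axis_signal) - 1)

t = np.linspace(0, 5, 500, endpoint=False)   # 5 s at 100 Hz
walk_like = np.sin(2 * np.pi * 2.0 * t)      # ~2 Hz alternating leg swing
slow = np.sin(2 * np.pi * 0.2 * t)           # slow orientation drift
print(zero_crossing_rate(walk_like) > zero_crossing_rate(slow))  # True
```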
4. Power Spectral Density:
FFT (Fast Fourier Transform) converts time-domain signals to frequency domain, revealing dominant oscillation frequencies. Walking: 1-2 Hz (step rate), biking: 1.5-3 Hz (pedal cadence), vehicles: broadband noise below 5 Hz.
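Extracting the dominant frequency from a window can be sketched with a plain FFT (Python for illustration; the DC bin is skipped so a constant offset such as gravity does not mask the oscillation peak):

```python
import numpy as np

def dominant_frequency(axis_signal, fs=100):
    """Frequency (Hz) with the highest power, ignoring the DC bin."""
    spectrum = np.abs(np.fft.rfft(axis_signal)) ** 2
    freqs = np.fft.rfftfreq(len(axis_signal), d=1 / fs)
    return freqs[1 + np.argmax(spectrum[1:])]   # skip bin 0 (DC)

t = np.arange(0, 5, 1 / 100)                    # 5 s at 100 Hz
steps = np.sin(2 * np.pi * 1.8 * t)             # ~1.8 Hz step rhythm
print(dominant_frequency(steps))                # peak near 1.8 Hz
```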
Model Architecture: Ensemble Bagged Decision Trees
Given resource constraints, we eschewed deep learning in favor of ensemble decision trees—specifically, bootstrap aggregating (bagging) with CART (Classification and Regression Trees).
Why Decision Trees?
- Interpretability: Each split represents an intuitive decision rule (e.g., "if mean acceleration > 2 m/s², classify as running")
- Speed: Inference requires ~log(N) comparisons, achieving millisecond latency on mobile CPUs
- Non-linearity: Trees naturally model complex decision boundaries without manual feature interactions
- No Scaling Required: Tree splits are threshold-based, invariant to feature scaling
Bagging (Bootstrap Aggregating):
Train multiple trees, each on a bootstrap sample of the training data (a random draw with replacement, the same size as the original set). Each tree sees a slightly different data distribution and learns complementary patterns. At inference, predictions are aggregated via majority vote, reducing variance and improving robustness.
We trained 50 trees, each with maximum depth 20. Bagging mitigates overfitting inherent in deep trees while maintaining individual tree expressiveness.
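An equivalent configuration can be sketched with scikit-learn's `BaggingClassifier` (the original models were trained in MATLAB; the toy data below is a stand-in for the real feature matrix):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 bootstrapped CART trees, depth capped at 20, majority-vote prediction
ensemble = BaggingClassifier(
    DecisionTreeClassifier(max_depth=20),
    n_estimators=50,
    bootstrap=True,        # sample training rows with replacement
    random_state=0,
)

# toy stand-in for the 8-class feature matrix (30 features per window)
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 30))
y = rng.integers(0, 8, size=800)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]).shape)  # (3,)
```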
Handling Class Imbalance
Real-world activity distributions are skewed—users spend more time walking than running, more time in cars than subways. This imbalance biases classifiers toward majority classes.
Our Approach: Stratified Subsampling
Rather than complex resampling (SMOTE, class weights), we strategically subset training data to balance class representation. For minority classes (running, subway), include all available samples. For majority classes (walking, bus), randomly sample to match minority class size.
This "poor man's balancing" sacrificed some majority class examples but ensured the model learned discriminative features for rare activities—critical for real-world deployment where missing a subway ride is worse than occasionally misclassifying walking.
Results & Performance
Best Model Accuracy: 82.8% (on held-out test set)
Alternative configurations achieved 81-83% range, indicating stable performance across hyperparameter variations. Compared to competition benchmarks:
- Top-performing deep learning models: 85-88% (but 100-500MB model size, GPU-dependent)
- Our ensemble: 82.8% with <10MB model, CPU-friendly inference
Per-Class Breakdown:
- High accuracy (>90%): Still, biking, running (distinctive motion signatures)
- Moderate accuracy (75-85%): Walking, car (overlap with similar modes)
- Challenging (<70%): Bus vs. train, subway vs. train (similar vibration profiles)
Inference Performance:
- Classification latency: ~5ms per window (MATLAB on i7 CPU)
- Feature extraction: ~15ms per window
- Total pipeline: 20ms → capable of real-time processing at 50Hz
Key Insights & Challenges
Wavelet Features Dominate: Ablation studies showed wavelet-derived features contributed 60% of classification accuracy—confirming multi-scale time-frequency analysis captures richer patterns than raw statistics.
Gyroscope vs. Accelerometer: Gyroscope zero-crossing rate proved surprisingly effective for rotational activities (biking, vehicles turning). Accelerometer magnitude alone struggled with vehicle modes.
Train vs. Subway Confusion: Both involve low-frequency vibrations on rails—difficult to distinguish without auxiliary sensors (GPS, barometer). Future work: incorporate multi-modal sensing.
Temporal Context: Single-window classification ignores sequential patterns (e.g., walking → bus indicates bus boarding). Incorporating Markov models or RNNs could leverage transition probabilities.
Technical Resources
Full implementation, pre-trained models, and processed datasets available on GitHub:
Requirements: MATLAB R2016b or later, Signal Processing Toolbox, Statistics and Machine Learning Toolbox
Repository Contents:
- Feature extraction scripts (wavelet, FFT, statistical)
- Pre-trained ensemble models (.mat files)
- Processed gyroscope data and labels
- Evaluation scripts for reproducing results