Human Activity Recognition Challenge
Competing in Sussex-Huawei Locomotion Challenge 2018—classifying walking, running, biking, and transit modes from smartphone IMU data using ensemble decision trees and wavelet features.
The Challenge
Your smartphone knows when you're walking, running, or riding a bike—but how? The Sussex-Huawei Locomotion Recognition Challenge 2018 posed this question to researchers worldwide: given raw accelerometer and gyroscope data from users going about daily activities, classify eight distinct locomotion modes with maximum accuracy while minimizing computational overhead.
The twist: solutions must be deployable on resource-constrained mobile devices. No massive deep neural networks, no cloud processing—just efficient algorithms running on smartphone hardware. Our team, "Maximum Analytics," tackled this constraint head-on with classical machine learning, thoughtful feature engineering, and ensemble methods.
Dataset & Problem Statement
The Sussex-Huawei Locomotion (SHL) dataset contains smartphone IMU (Inertial Measurement Unit) readings from users performing eight activities:
- Walking, Running, Biking: Self-explanatory locomotion modes
- Car, Bus, Train, Subway: Motorized transportation modes
- Still: Stationary periods
Each sample: 3-axis accelerometer (m/s²) + 3-axis gyroscope (rad/s), sampled at 100Hz. Training set: 5 days × 8 hours × 8 activities = thousands of labeled windows. Test set: unseen users and environments.
Evaluation Metric: Classification accuracy on held-out test set. Secondary objective: minimize model size and inference latency for mobile deployment.
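Concretely, the first preprocessing step is slicing the continuous 100 Hz six-axis stream into fixed-length windows. A minimal sketch (Python for illustration; the original pipeline was MATLAB, and the 5 s window length and 50% overlap are assumptions, since the post does not state them):

```python
import numpy as np

def segment_windows(signal, fs=100, win_sec=5.0, overlap=0.5):
    """Split a (T, 6) IMU stream into fixed-length windows.

    signal : array of shape (T, 6) -- 3-axis accel + 3-axis gyro at fs Hz
    Returns an array of shape (n_windows, win_len, 6).
    """
    win_len = int(fs * win_sec)
    step = int(win_len * (1 - overlap))
    starts = range(0, len(signal) - win_len + 1, step)
    return np.stack([signal[s:s + win_len] for s in starts])

# 60 s of synthetic 6-axis data at 100 Hz -> 5 s windows, 50% overlap
stream = np.random.randn(6000, 6)
windows = segment_windows(stream)
print(windows.shape)  # (23, 500, 6)
```

Each window then flows through the feature extraction pipeline below, and each window receives one activity label.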
Feature Engineering: Extracting Patterns from Motion
Raw sensor streams contain hidden structure—walking produces periodic acceleration spikes at step frequency; biking shows smoother, higher-magnitude patterns; vehicles introduce low-frequency vibrations. Our feature extraction pipeline transforms temporal signals into discriminative representations:
1. Statistical Features (30 features):
- Mean, standard deviation: capture central tendency and variability
- Skewness, kurtosis: quantify distribution shape (symmetry, tail heaviness)
- Range (max - min): overall signal amplitude
Applied independently to each axis (3 × accelerometer + 3 × gyroscope = 6 signals), yielding 6 × 5 stats = 30 features per window.
2. Wavelet Decomposition (Daubechies 2):
Discrete wavelet transform decomposes signals into time-frequency components—simultaneously capturing transient events (steps) and long-term trends (sustained motion). Daubechies 2 wavelets provide compact support (localized in time) while maintaining smoothness.
Decomposition yields approximation coefficients (low-frequency trends) and detail coefficients (high-frequency transients) across multiple scales. We extract statistical features from each subband, capturing multi-scale motion patterns.
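A sketch of the per-axis decomposition, using PyWavelets as a stand-in for MATLAB's Wavelet Toolbox; the decomposition level (4) and the choice of subband summaries (mean, standard deviation, energy) are assumptions for illustration:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(axis_signal, wavelet="db2", level=4):
    """Multi-level DWT; summarize each subband with mean, std, and energy.

    wavedec returns [approx, detail_L, ..., detail_1]: level + 1 subbands,
    so 3 * (level + 1) features per signal axis.
    """
    coeffs = pywt.wavedec(axis_signal, wavelet, level=level)
    feats = []
    for c in coeffs:
        feats.extend([c.mean(), c.std(), np.sum(c ** 2)])
    return np.array(feats)

sig = np.random.randn(500)          # one axis of one 5 s window
print(wavelet_features(sig).shape)  # (15,)
```

The approximation subband summarizes sustained motion (e.g., constant vehicle vibration), while the detail subbands respond to transients such as individual steps.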
3. Zero Crossing Rate (Gyroscope):
Counts how often gyroscope signals cross zero—proxy for rotational oscillation frequency. Walking exhibits high zero-crossing rate (alternating leg swings); vehicles show low rate (steady orientation).
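The zero-crossing count reduces to checking sign changes between consecutive samples. A minimal sketch (Python for illustration), comparing a walking-like oscillation against a slow vehicle-like sway:

```python
import numpy as np

def zero_crossing_rate(axis_signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(axis_signal)
    crossings = np.sum(signs[:-1] * signs[1:] < 0)
    return crossings / (len(axis_signal) - 1)

t = np.linspace(0, 5, 500, endpoint=False)   # 5 s at 100 Hz
walk_like = np.sin(2 * np.pi * 2.0 * t)      # ~2 Hz alternating leg swing
slow = np.sin(2 * np.pi * 0.2 * t)           # slow orientation drift
print(zero_crossing_rate(walk_like) > zero_crossing_rate(slow))  # True
```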
4. Power Spectral Density:
FFT (Fast Fourier Transform) converts time-domain signals to frequency domain, revealing dominant oscillation frequencies. Walking: 1-2 Hz (step rate), biking: 1.5-3 Hz (pedal cadence), vehicles: broadband noise below 5 Hz.
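Extracting the dominant frequency from a window can be sketched with a plain FFT (Python for illustration; the DC bin is skipped so a constant offset such as gravity does not mask the oscillation peak):

```python
import numpy as np

def dominant_frequency(axis_signal, fs=100):
    """Frequency (Hz) with the highest power, ignoring the DC bin."""
    spectrum = np.abs(np.fft.rfft(axis_signal)) ** 2
    freqs = np.fft.rfftfreq(len(axis_signal), d=1 / fs)
    return freqs[1 + np.argmax(spectrum[1:])]   # skip bin 0 (DC)

t = np.arange(0, 5, 1 / 100)                    # 5 s at 100 Hz
steps = np.sin(2 * np.pi * 1.8 * t)             # ~1.8 Hz step rhythm
print(dominant_frequency(steps))                # peak near 1.8 Hz
```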
Model Architecture: Ensemble Bagged Decision Trees
Given resource constraints, we eschewed deep learning in favor of ensemble decision trees—specifically, bootstrap aggregating (bagging) with CART (Classification and Regression Trees).
Why Decision Trees?
- Interpretability: Each split represents an intuitive decision rule (e.g., "if mean acceleration > 2 m/s², classify as running")
- Speed: Inference requires ~log(N) comparisons, achieving millisecond latency on mobile CPUs
- Non-linearity: Trees naturally model complex decision boundaries without manual feature interactions
- No Scaling Required: Tree splits are threshold-based, invariant to feature scaling
Bagging (Bootstrap Aggregating):
Train multiple trees, each on a bootstrap sample of the training data (a random draw with replacement, the same size as the original set). Each tree sees a slightly different data distribution and learns complementary patterns. At inference, predictions are aggregated via majority vote, reducing variance and improving robustness.
We trained 50 trees, each with maximum depth 20. Bagging mitigates overfitting inherent in deep trees while maintaining individual tree expressiveness.
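An equivalent configuration can be sketched with scikit-learn's `BaggingClassifier` (the original models were trained in MATLAB; the toy data below is a stand-in for the real feature matrix):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 bootstrapped CART trees, depth capped at 20, majority-vote prediction
ensemble = BaggingClassifier(
    DecisionTreeClassifier(max_depth=20),
    n_estimators=50,
    bootstrap=True,        # sample training rows with replacement
    random_state=0,
)

# toy stand-in for the 8-class feature matrix (30 features per window)
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 30))
y = rng.integers(0, 8, size=800)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]).shape)  # (3,)
```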
Handling Class Imbalance
Real-world activity distributions are skewed—users spend more time walking than running, more time in cars than subways. This imbalance biases classifiers toward majority classes.
Our Approach: Stratified Subsampling
Rather than complex resampling (SMOTE, class weights), we strategically subset training data to balance class representation. For minority classes (running, subway), include all available samples. For majority classes (walking, bus), randomly sample to match minority class size.
This "poor man's balancing" sacrificed some majority class examples but ensured the model learned discriminative features for rare activities—critical for real-world deployment where missing a subway ride is worse than occasionally misclassifying walking.
Results & Performance
Best Model Accuracy: 82.8% (on held-out test set)
Alternative configurations achieved 81-83% range, indicating stable performance across hyperparameter variations. Compared to competition benchmarks:
- Top-performing deep learning models: 85-88% (but 100-500MB model size, GPU-dependent)
- Our ensemble: 82.8% with <10MB model, CPU-friendly inference
Per-Class Breakdown:
- High accuracy (>90%): Still, biking, running (distinctive motion signatures)
- Moderate accuracy (75-85%): Walking, car (overlap with similar modes)
- Challenging (<70%): Bus vs. train, subway vs. train (similar vibration profiles)
Inference Performance:
- Classification latency: ~5ms per window (MATLAB on i7 CPU)
- Feature extraction: ~15ms per window
- Total pipeline: 20ms → capable of real-time processing at 50Hz
Key Insights & Challenges
Wavelet Features Dominate: Ablation studies showed wavelet-derived features contributed 60% of classification accuracy—confirming multi-scale time-frequency analysis captures richer patterns than raw statistics.
Gyroscope vs. Accelerometer: Gyroscope zero-crossing rate proved surprisingly effective for rotational activities (biking, vehicles turning). Accelerometer magnitude alone struggled with vehicle modes.
Train vs. Subway Confusion: Both involve low-frequency vibrations on rails—difficult to distinguish without auxiliary sensors (GPS, barometer). Future work: incorporate multi-modal sensing.
Temporal Context: Single-window classification ignores sequential patterns (e.g., walking → bus indicates bus boarding). Incorporating Markov models or RNNs could leverage transition probabilities.
Technical Resources
Full implementation, pre-trained models, and processed datasets available on GitHub:
Requirements: MATLAB R2016b or later, Signal Processing Toolbox, Statistics and Machine Learning Toolbox
Repository Contents:
- Feature extraction scripts (wavelet, FFT, statistical)
- Pre-trained ensemble models (.mat files)
- Processed gyroscope data and labels
- Evaluation scripts for reproducing results