Project Materials
Resources for paper reading, dataset access, and official implementation.
Dataset (Single + Multi)
Single download link:
projects.dfki.uni-kl.de/open-marcie
User: reviewer
Password: 1234
Multi-part download links:
Kaggle Part 1
Kaggle Part 2
Kaggle Part 3
Kaggle Part 4
Code
Summary
OpenMarcie is a multimodal dataset for human action monitoring in industrial environments. It addresses key limitations of existing datasets by providing synchronized multimodal signals, more realistic open-ended workflows, and a diverse participant pool. The dataset captures both wearable and camera-based observations under two experimental settings: bicycle assembly/disassembly and 3D-printer assembly guided by procedural knowledge. OpenMarcie is benchmarked on three tasks: activity classification, open-vocabulary captioning, and cross-modal alignment.
Dataset Overview
Setting A: Bicycle Task
Twelve participants perform assembly/disassembly under semi-realistic conditions without a prescribed protocol.
Setting B: 3D-Printer Task
Twenty-four participants (valid sessions) follow the manufacturer's instructions and perform collaborative correction steps.
Multimodal Sources
Wearables, egocentric/exocentric vision, and audio streams are synchronized end-to-end.
Industrial Relevance
Captures long-horizon, procedural, real-world workflows beyond short isolated actions.
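The end-to-end synchronization mentioned above can be illustrated with a minimal sketch: aligning a wearable stream to video frame timestamps by nearest-neighbor matching. This is a numpy toy on synthetic signals, not the dataset's actual tooling; sample rates and names are illustrative assumptions.

```python
import numpy as np

def align_to_frames(sensor_t, sensor_x, frame_t):
    """For each video frame timestamp, pick the nearest wearable sample."""
    idx = np.searchsorted(sensor_t, frame_t)       # insertion index per frame
    idx = np.clip(idx, 1, len(sensor_t) - 1)
    left, right = sensor_t[idx - 1], sensor_t[idx]
    idx -= (frame_t - left) < (right - frame_t)    # step back if left neighbor is closer
    return sensor_x[idx]

# Toy streams: IMU at 100 Hz, camera at 30 fps (timestamps in seconds)
imu_t = np.arange(200) / 100.0
imu_x = np.sin(2 * np.pi * imu_t)   # fake accelerometer channel
cam_t = np.arange(60) / 30.0

aligned = align_to_frames(imu_t, imu_x, cam_t)   # one IMU sample per frame
```

Nearest-timestamp matching is the simplest alignment strategy; interpolation or clock-drift correction would refine it, but the per-frame pairing idea is the same.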
Sensor Placement & Scenarios
Scenario Layouts (Ad-hoc + Procedural)
Wearable Sensor Placement
Two Scenarios with Labels
Dataset Statistics
Benchmarks & Tasks
Done (validated in this paper):
Human Activity Recognition (HAR)
Recognize worker actions from multimodal streams in manufacturing scenes.
Open-Vocabulary Captioning
Generate free-form textual descriptions of ongoing industrial activities.
Cross-Modal Alignment
Learn consistent representations across wearable and visual modalities.
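As a toy illustration of the HAR task on wearable streams, the standard pipeline is sliding-window featurization followed by classification. The sketch below uses synthetic accelerometer data and a nearest-centroid classifier; the activity names and window sizes are illustrative assumptions, not the OpenMarcie benchmark setup.

```python
import numpy as np

def window_features(signal, win=50, hop=25):
    """Slice a 1-D wearable signal into windows and extract mean/std features."""
    starts = range(0, len(signal) - win + 1, hop)
    windows = np.stack([signal[s:s + win] for s in starts])
    return np.stack([windows.mean(axis=1), windows.std(axis=1)], axis=1)

# Synthetic accelerometer: low-variance "idle" followed by oscillatory "active"
rng = np.random.default_rng(0)
idle = rng.normal(0.0, 0.05, 500)
active = np.sin(np.linspace(0, 60 * np.pi, 500)) + rng.normal(0.0, 0.05, 500)
feats = window_features(np.concatenate([idle, active]))

# Nearest-centroid classification against per-class reference features
centroids = {"idle": window_features(idle).mean(axis=0),
             "active": window_features(active).mean(axis=0)}
preds = [min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
         for f in feats]
```

A real multimodal HAR model would fuse IMU, video, and audio features, but the window-then-classify structure carries over.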
TODO (from Supplementary roadmap):
Procedural Planning & Task Decomposition
Model long-horizon workflows, task graphs, dependencies, and sequence decomposition.
Skill Assessment & Expertise Modeling
Estimate proficiency and learning progression; support adaptive training interventions.
Intent Prediction & Early Forecasting
Predict upcoming goals/actions from partial multimodal context for proactive assistance.
Fine-Grained Segmentation & Roles
Handle overlapping multi-label actions, temporal boundaries, and role dynamics.
Pose & Body-Language Reasoning
Advance pose estimation, pose forecasting, and non-verbal intent understanding.
Cross-Modal Knowledge Transfer
Use one modality to supervise another via distillation, dropout robustness, and unsupervised alignment.
Cross-Modal Generation & Simulation
Generate missing modalities and improve simulation-to-reality transfer with synthetic augmentation.
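Cross-modal alignment, listed among the validated tasks above, is commonly posed as a contrastive objective where matched (wearable, visual) pairs are positives. Below is a minimal numpy sketch of a symmetric InfoNCE loss on toy embeddings; the paper's actual loss and architecture may differ, and all names here are assumptions.

```python
import numpy as np

def log_softmax_rows(m):
    """Numerically stable row-wise log-softmax."""
    m = m - m.max(axis=1, keepdims=True)
    return m - np.log(np.exp(m).sum(axis=1, keepdims=True))

def info_nce(wear, vis, temp=0.1):
    """Symmetric InfoNCE: row i of each matrix is one matched pair."""
    wear = wear / np.linalg.norm(wear, axis=1, keepdims=True)
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    sim = wear @ vis.T / temp                         # cosine similarity matrix
    loss_w2v = -np.mean(np.diag(log_softmax_rows(sim)))    # wearable -> visual
    loss_v2w = -np.mean(np.diag(log_softmax_rows(sim.T)))  # visual -> wearable
    return 0.5 * (loss_w2v + loss_v2w)

rng = np.random.default_rng(0)
vis_emb = rng.normal(size=(8, 16))
# Nearly aligned embeddings give a low loss; unrelated embeddings a high one
aligned_loss = info_nce(vis_emb + 0.01 * rng.normal(size=(8, 16)), vis_emb)
random_loss = info_nce(rng.normal(size=(8, 16)), vis_emb)
```

The same loss structure underlies CLIP-style training; with a trainable encoder per modality, minimizing it pulls matched wearable/visual pairs together in the shared space.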
Citation
@inproceedings{bello2026openmarcie,
title = {OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments},
author = {Bello, Hymalai and Ray, Lala and Sorysz, Joanna and Suh, Sungho and Lukowicz, Paul},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}