OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments

Hymalai Bello, Lala Ray, Joanna Sorysz, Sungho Suh, Paul Lukowicz
DFKI Kaiserslautern · RPTU Kaiserslautern-Landau · Korea University

Teaser

OpenMarcie introduces a large-scale multimodal action dataset for manufacturing, combining egocentric and exocentric vision with wearable sensing and audio.

Project Materials

Resources for reading the paper, accessing the dataset, and the official implementation.

Summary

OpenMarcie is a multimodal dataset for human action monitoring in industrial environments. It targets key limitations of existing datasets by providing synchronized multimodal signals, more realistic open-ended workflows, and a diverse participant pool. The dataset captures both wearable and camera-based observations under two experimental settings: bicycle assembly/disassembly and 3D-printer assembly guided by procedural instructions. OpenMarcie is benchmarked on three tasks: human activity recognition, open-vocabulary captioning, and cross-modal alignment.

Recorded Data: 37+ hrs
Participants: 36
Data Types: 8
Channels: 200+

Dataset Overview

Setting A: Bicycle Task

12 participants perform bicycle assembly/disassembly under semi-realistic conditions, without a fixed protocol.

Setting B: 3D-Printer Task

24 participants (valid recordings) follow the manufacturer's instructions, including collaborative correction steps.

Multimodal Sources

Wearables, egocentric/exocentric vision, and audio streams are synchronized end-to-end (a minimal alignment sketch follows this overview).

Industrial Relevance

Captures long-horizon, procedural, real-world workflows beyond short isolated actions.
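
Because the modalities arrive at different native sampling rates, downstream models typically need them on a shared timeline first. Below is a minimal, illustrative resampling sketch in Python; the stream names, the 50 Hz/30 Hz rates, and the 100 Hz master clock are assumptions for illustration, not OpenMarcie's actual schema or tooling.

import numpy as np

def resample_stream(timestamps, values, target_times):
    """Linearly interpolate one sensor channel onto a shared timeline.

    timestamps:   (N,) monotonic sample times in seconds
    values:       (N,) channel readings
    target_times: (M,) common timeline to align every modality to
    """
    return np.interp(target_times, timestamps, values)

# Hypothetical streams: a 50 Hz IMU channel and a 30 Hz per-frame video feature.
imu_t = np.arange(0.0, 10.0, 1 / 50)
imu_x = np.sin(imu_t)                 # stand-in for an accelerometer axis
vid_t = np.arange(0.0, 10.0, 1 / 30)
vid_f = np.cos(vid_t)                 # stand-in for a per-frame feature

# Align both streams onto a 100 Hz master clock.
master = np.arange(0.0, 10.0, 1 / 100)
imu_aligned = resample_stream(imu_t, imu_x, master)
vid_aligned = resample_stream(vid_t, vid_f, master)
assert imu_aligned.shape == vid_aligned.shape == master.shape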

Sensor Placement & Scenarios

Scenario Layouts (Ad-hoc + Procedural)

Experiment room layout and exocentric camera views for both scenarios.

Wearable Sensor Placement

Participant wearable setup and multimodal sensor signals in Scenarios (a) and (b).

Two Scenarios with Labels

Egocentric/exocentric activity examples with soft and hard annotations for both scenarios.

Dataset Statistics

Participant Statistics

Height, age, experience level, and academic/professional distribution.

Ownership & Usage Habits

Participant ownership and usage patterns for bicycles versus 3D printers.

Activity Distribution

OpenMarcie action frequency for ad-hoc and procedural scenarios.

Benchmarks & Tasks

Done (validated in this paper):

Human Activity Recognition (HAR)

Recognize worker actions from multimodal streams in manufacturing scenes.
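
For a sense of what a HAR baseline on windowed wearable data can look like, here is a toy sketch; the window length (2 s at 100 Hz), channel count, and class count are placeholder values, not the benchmark configuration.

import torch
import torch.nn as nn

class TinyIMUClassifier(nn.Module):
    """Toy 1D-CNN over fixed-length IMU windows (channels x time)."""
    def __init__(self, in_channels=6, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # global average pooling over time
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):              # x: (batch, channels, time)
        return self.net(x)

# Hypothetical batch: 8 windows of 2 s at 100 Hz, 6 IMU channels.
model = TinyIMUClassifier(in_channels=6, num_classes=10)
logits = model(torch.randn(8, 6, 200))
print(logits.shape)                    # torch.Size([8, 10])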

Open-Vocabulary Captioning

Generate free-form textual descriptions of ongoing industrial activities.

Cross-Modal Alignment

Learn consistent representations across wearable and visual modalities.
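
One common recipe for this kind of alignment is a symmetric InfoNCE objective between per-window wearable embeddings and the corresponding visual embeddings, in the style of CLIP. The sketch below is illustrative only: the embedding dimension and temperature are assumed values, and the encoders are stubbed out with random features.

import torch
import torch.nn.functional as F

def clip_style_loss(wearable_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    wearable_emb, visual_emb: (batch, dim); row i of each comes from
    the same moment in time, so matching pairs sit on the diagonal.
    """
    w = F.normalize(wearable_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = w @ v.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(len(w))            # diagonal = positive pairs
    loss_w2v = F.cross_entropy(logits, targets)
    loss_v2w = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_w2v + loss_v2w)

# Stub encoders: 16 paired windows embedded into a 128-d space.
loss = clip_style_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())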

TODO (from the supplementary material's roadmap):

Procedural Planning & Task Decomposition

Model long-horizon workflows, task graphs, dependencies, and sequence decomposition.

Skill Assessment & Expertise Modeling

Estimate proficiency and learning progression; support adaptive training interventions.

Intent Prediction & Early Forecasting

Predict upcoming goals/actions from partial multimodal context for proactive assistance.

Fine-Grained Segmentation & Roles

Handle overlapping multi-label actions, temporal boundaries, and role dynamics.

Pose & Body-Language Reasoning

Advance pose estimation, pose forecasting, and non-verbal intent understanding.

Cross-Modal Knowledge Transfer

Use one modality to supervise another via distillation, dropout robustness, and unsupervised alignment.
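
As a rough illustration of the distillation direction, a frozen visual encoder could supervise a wearable encoder by matching temperature-softened predictions. The sketch below uses the standard KL-based objective from Hinton et al. (2015); the encoders, class count, and temperature are hypothetical stand-ins.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Hypothetical: a frozen video teacher supervises an IMU student (10 classes).
teacher_logits = torch.randn(32, 10)                       # no grad needed
student_logits = torch.randn(32, 10, requires_grad=True)   # student output
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()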

Cross-Modal Generation & Simulation

Generate missing modalities and improve simulation-to-reality transfer with synthetic augmentation.

Citation

@inproceedings{bello2026openmarcie,
	title     = {OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments},
	author    = {Bello, Hymalai and Ray, Lala and Sorysz, Joanna and Suh, Sungho and Lukowicz, Paul},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year      = {2026}
}