
Provenance Tracking

MedTWIN tracks the complete lineage of every computation.

What is Provenance?

Provenance answers: "Where did this result come from?"

For every statistic, we track:

  • Input data: Which dataset, which version
  • Configuration: Analysis parameters
  • Code: Exact computation executed
  • Output: The result and metadata
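As a rough illustration, the four tracked items could be held in a single record. This is a minimal sketch, not MedTWIN's actual schema: the `ProvenanceRecord` class, its field names, the `code_fingerprint` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record tying one statistic to its origins."""
    dataset: str        # input data: which dataset
    data_version: str   # input data: which version, e.g. "v1.0"
    config: dict        # configuration: analysis parameters
    code_sha256: str    # code: hash of the exact source that was executed
    output: dict        # output: the result and its metadata


def code_fingerprint(source: str) -> str:
    """Identify code by its content rather than by file name."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Placeholder values for illustration only.
record = ProvenanceRecord(
    dataset="example_dataset",
    data_version="v1.0",
    config={"outcome": "example_outcome"},
    code_sha256=code_fingerprint("rate = events / n"),
    output={"value": 0.032, "unit": "proportion"},
)

# A sorted-key JSON dump gives a stable, human-readable form of the record.
record_json = json.dumps(asdict(record), sort_keys=True)
```

Hashing the code instead of storing a file path means that two runs share a fingerprint only if they executed identical source text.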

Provenance Chain

Data Upload (v1.0)
  ↓
Data Mapping (config_abc)
  ↓
Analysis Config (spec_xyz)
  ↓
Run Execution (RUN-00234)
  ↓
Statistic in Paper (3.2%)
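One way to picture the chain is as a set of links that each point back to the link they were derived from. The sketch below walks from the statistic back to the data upload. The `LINKS` dictionary, its keys, and the `trace` function are assumptions made for illustration and do not describe how MedTWIN stores provenance internally.

```python
# Hypothetical store: each link records the ID of the link it was derived from.
LINKS = {
    "stat": {"label": "Statistic in Paper (3.2%)", "parent": "RUN-00234"},
    "RUN-00234": {"label": "Run Execution (RUN-00234)", "parent": "spec_xyz"},
    "spec_xyz": {"label": "Analysis Config (spec_xyz)", "parent": "config_abc"},
    "config_abc": {"label": "Data Mapping (config_abc)", "parent": "v1.0"},
    "v1.0": {"label": "Data Upload (v1.0)", "parent": None},
}


def trace(link_id, links):
    """Walk from a result back to its source data, returning labels in order."""
    path = []
    seen = set()
    while link_id is not None:
        if link_id in seen:
            # A well-formed chain never revisits a link.
            raise ValueError(f"cycle detected at {link_id}")
        seen.add(link_id)
        link = links[link_id]
        path.append(link["label"])
        link_id = link["parent"]
    return path


lineage = trace("stat", LINKS)
for step in lineage:
    print(step)
```

The walk starts at the published statistic and ends at the original upload, which is the direction a reviewer would follow.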

Versioning

Data Versioning

Every data upload creates a version:

  • v1.0: Original upload
  • v1.1: Added 50 patients
  • v1.2: Corrected 3 records

Config Versioning

Analysis configurations are versioned:

  • Each change creates a new version
  • Previous configs are preserved
  • Any version can be used for a re-run
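The versioning behavior described above resembles an append-only history. The following sketch shows the idea under that assumption. `VersionedConfig` and its methods are hypothetical names, and the integer version numbers are a simplification of whatever scheme MedTWIN uses.

```python
import copy


class VersionedConfig:
    """Append-only history: saving never overwrites an earlier version."""

    def __init__(self):
        self._versions = []

    def save(self, config: dict) -> int:
        """Store a copy of config and return its 1-based version number."""
        self._versions.append(copy.deepcopy(config))
        return len(self._versions)

    def get(self, version: int) -> dict:
        """Return a copy of the config as it was at the given version."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no such version: {version}")
        return copy.deepcopy(self._versions[version - 1])


store = VersionedConfig()
v1 = store.save({"alpha": 0.05})
v2 = store.save({"alpha": 0.01})

# The earlier configuration is still available for a re-run.
old_config = store.get(v1)
```

Copies are taken on both save and get so that later edits by the caller cannot alter a stored version.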

Run Immutability

Once executed, runs are immutable:

  • Results never change
  • The audit trail is preserved
  • The run stays reproducible from its recorded inputs
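In Python terms, this kind of immutability can be approximated with a frozen dataclass and a read-only mapping. This is only an analogy for the guarantee described above. The `Run` class, `make_run`, and `try_mutations` are hypothetical and say nothing about how MedTWIN enforces immutability in storage.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType


@dataclass(frozen=True)
class Run:
    run_id: str
    results: MappingProxyType  # read-only view of the results


def make_run(run_id: str, results: dict) -> Run:
    """Copy the results so later changes to the caller's dict have no effect."""
    return Run(run_id, MappingProxyType(dict(results)))


def try_mutations(run: Run) -> list:
    """Attempt to modify a run and return the names of the rejected changes."""
    rejected = []
    try:
        run.run_id = "RUN-00235"
    except FrozenInstanceError:
        rejected.append("run_id")
    try:
        run.results["rate"] = 1.0
    except TypeError:
        rejected.append("results")
    return rejected


run = make_run("RUN-00234", {"rate": 0.032})
print(try_mutations(run))
```

Both attempted changes are rejected, so the run's identifier and results stay as they were at execution time.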

Reproducibility

Re-running Analysis

To reproduce any result:

  1. Select the run
  2. Click "Re-run"
  3. MedTWIN re-executes the run with the same inputs, which should produce the same outputs

Verification

MedTWIN verifies reproducibility:

  • Checks that results are bit-for-bit identical
  • Warns if the environment changed
  • Documents any differences
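A check of this kind can be sketched by comparing digests of the original and re-run results and then diffing the recorded environments. The function names and the report layout below are assumptions. The sketch hashes a canonical JSON encoding of the result, which stands in for whatever byte-level comparison MedTWIN performs.

```python
import hashlib
import json
import platform
import sys


def digest(result: dict) -> str:
    """Hash a canonical JSON encoding so equal results give equal digests."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def environment() -> dict:
    """Capture a small, illustrative description of the current environment."""
    return {"python": platform.python_version(), "platform": sys.platform}


def verify(original: dict, rerun: dict, env_then: dict, env_now: dict) -> dict:
    """Compare a re-run against the original and report any differences."""
    keys = env_then.keys() | env_now.keys()
    changed = sorted(k for k in keys if env_then.get(k) != env_now.get(k))
    return {
        "identical": digest(original) == digest(rerun),
        "environment_changed": changed,
    }


report = verify(
    {"rate": 0.032},
    {"rate": 0.032},
    {"python": "3.11.4"},
    {"python": "3.12.1"},
)
```

Here the results match but the recorded Python version differs, so the report flags `python` under `environment_changed` while still marking the results as identical.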

Audit Export

Export complete provenance:

audit_RUN-00234/
├── data_snapshot.csv
├── config.json
├── code.py
├── output.json
├── log.txt
└── manifest.md
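An exported folder is easier to trust if each file can be checked against a recorded checksum. The sketch below writes SHA-256 checksums for every file in an export directory into `manifest.md`. This is an assumption about what a manifest might contain; the actual contents and format of MedTWIN's `manifest.md` are not specified here, and `checksums` and `write_manifest` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def checksums(audit_dir: Path) -> dict:
    """Map each file name in the export (except the manifest) to its SHA-256."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(audit_dir.iterdir())
        if p.is_file() and p.name != "manifest.md"
    }


def write_manifest(audit_dir: Path) -> Path:
    """Write one '<sha256>  <file name>' line per exported file."""
    lines = [f"{digest}  {name}" for name, digest in checksums(audit_dir).items()]
    manifest = audit_dir / "manifest.md"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `write_manifest(Path("audit_RUN-00234"))` on an existing export directory would list one line per file. Recomputing `checksums` later and comparing against the manifest shows whether any file was altered after export.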

Use Cases

Peer Review

Reviewers can verify any statistic by following its provenance chain to source data.

Regulatory Compliance

The FDA and other regulators require complete audit trails. MedTWIN provides them automatically.

Team Handoffs

New team members can understand exactly how results were produced.