Provenance Tracking

MedTWIN tracks the complete lineage of every computation.

What is Provenance?

Provenance answers: "Where did this result come from?"

For every statistic, we track:

Input data: Which dataset, which version
Configuration: Analysis parameters
Code: Exact computation executed
Output: The result and metadata
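As an illustration, the four items above can be held together in one record. The sketch below is hypothetical: the `ProvenanceRecord` class, its field names, and the `code_fingerprint` helper are not MedTWIN's actual schema. Hashing the source text is one possible way to pin down the exact computation that was executed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Hypothetical record layout for illustration; not MedTWIN's actual schema.
@dataclass(frozen=True)
class ProvenanceRecord:
    dataset_id: str       # input data: which dataset
    dataset_version: str  # input data: which version
    config: dict          # configuration: analysis parameters
    code_sha256: str      # code: digest of the exact source that was executed
    output: dict          # output: the result and its metadata

def code_fingerprint(source: str) -> str:
    """Identify a computation by hashing its source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

# Placeholder analysis code and values, for illustration only.
analysis_source = "def rate_pct(events, n):\n    return round(100 * events / n, 1)\n"

record = ProvenanceRecord(
    dataset_id="example_cohort",
    dataset_version="v1.2",
    config={"window_days": 30},
    code_sha256=code_fingerprint(analysis_source),
    output={"statistic": "rate_pct", "value": 3.2},
)
print(json.dumps(asdict(record), indent=2))
```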

Provenance Chain

Data Upload (v1.0)
    │
    ▼
Data Mapping (config_abc)
    │
    ▼
Analysis Config (spec_xyz)
    │
    ▼
Run Execution (RUN-00234)
    │
    ▼
Statistic in Paper (3.2%)
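The chain can be modeled as records that each point back to the step they were derived from, so tracing a statistic means following those links until you reach the upload. This is a minimal sketch with hypothetical names (`Node`, `trace`). It assumes each step has exactly one parent, whereas a real lineage graph may combine several inputs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    kind: str                        # e.g. "data", "mapping", "config", "run", "statistic"
    ref: str                         # identifier such as "v1.0" or "RUN-00234"
    parent: Optional["Node"] = None  # the step this node was derived from

def trace(node: Node) -> list[str]:
    """Walk parent links from a result back to the source data."""
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(f"{current.kind}:{current.ref}")
        current = current.parent
    return chain

upload = Node("data", "v1.0")
mapping = Node("mapping", "config_abc", upload)
spec = Node("config", "spec_xyz", mapping)
run = Node("run", "RUN-00234", spec)
stat = Node("statistic", "3.2%", run)

print(" <- ".join(trace(stat)))
# statistic:3.2% <- run:RUN-00234 <- config:spec_xyz <- mapping:config_abc <- data:v1.0
```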

Versioning

Data Versioning

Every data upload creates a version:

v1.0: Original upload
v1.1: Added 50 patients
v1.2: Corrected 3 records
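One way to implement this is an append-only history that stores a content hash with every upload, so two versions can never be silently confused. The `DatasetHistory` class and its minor-version numbering rule are assumptions for illustration, and the toy CSV bytes stand in for real uploads.

```python
import hashlib

class DatasetHistory:
    """Hypothetical append-only version history for one dataset."""

    def __init__(self) -> None:
        # Each entry is (version label, SHA-256 of the uploaded bytes, note).
        self.versions: list[tuple[str, str, str]] = []

    def upload(self, content: bytes, note: str) -> str:
        version = f"v1.{len(self.versions)}"  # assumed scheme: v1.0, v1.1, v1.2, ...
        digest = hashlib.sha256(content).hexdigest()
        self.versions.append((version, digest, note))
        return version

history = DatasetHistory()
history.upload(b"id,age\n1,54\n", "Original upload")
history.upload(b"id,age\n1,54\n2,61\n", "Added 1 patient")
history.upload(b"id,age\n1,55\n2,61\n", "Corrected 1 record")
for version, digest, note in history.versions:
    print(version, digest[:12], note)
```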

Config Versioning

Analysis configurations are versioned:

Each change creates a new version
Previous configurations are preserved
Any version can be used to re-run an analysis
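A content-addressed store is one way to get these properties: each distinct configuration gets an ID derived from its contents, nothing is ever deleted, and any stored ID can be loaded again. `ConfigStore`, the `spec_` prefix, the 8-character ID length, and the example parameters are hypothetical choices for this sketch.

```python
import hashlib
import json

class ConfigStore:
    """Hypothetical content-addressed store; no saved configuration is ever deleted."""

    def __init__(self) -> None:
        self._by_id: dict[str, str] = {}  # config ID -> canonical JSON text
        self.history: list[str] = []      # config IDs in the order first saved

    def save(self, config: dict) -> str:
        # Sorted keys give a canonical form, so identical settings map to one ID.
        canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
        config_id = "spec_" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
        if config_id not in self._by_id:
            self._by_id[config_id] = canonical
            self.history.append(config_id)
        return config_id

    def load(self, config_id: str) -> dict:
        # Parsing the stored text returns a fresh copy, so callers cannot mutate history.
        return json.loads(self._by_id[config_id])

store = ConfigStore()
first = store.save({"alpha": 0.05, "window_days": 30})
second = store.save({"alpha": 0.05, "window_days": 90})
assert store.load(first)["window_days"] == 30  # the earlier version is still available
assert store.save({"window_days": 30, "alpha": 0.05}) == first  # key order ignored
print(store.history)
```

Because IDs come from canonical JSON, resubmitting an unchanged configuration returns the existing ID instead of creating a duplicate version.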

Run Immutability

Once executed, runs are immutable:

Results never change
Audit trail preserved
Reproducible at any later time from the recorded inputs
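In Python, a frozen dataclass gives a simple in-memory picture of immutability, and a fingerprint stored at execution time makes any later tampering detectable. This sketch is illustrative only: `Run`, `fingerprint`, and the follow-up run ID are hypothetical, and in a real system immutability would be enforced by write-once storage rather than by the language.

```python
import dataclasses
import hashlib
import json

@dataclasses.dataclass(frozen=True)
class Run:
    """Hypothetical run record; frozen so fields cannot be reassigned after creation."""
    run_id: str
    dataset_sha256: str
    config_id: str
    result: float

    def fingerprint(self) -> str:
        """SHA-256 over all fields, so any altered copy is detectable."""
        payload = json.dumps(dataclasses.asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

run = Run(
    run_id="RUN-00234",
    dataset_sha256=hashlib.sha256(b"example data").hexdigest(),
    config_id="spec_xyz",
    result=3.2,
)
recorded = run.fingerprint()  # stored alongside the run at execution time

try:
    run.result = 4.0  # attempt to overwrite a finished run
except dataclasses.FrozenInstanceError:
    print("run is immutable")

# In this sketch, a different result can only exist as a new run with its own fingerprint.
new_run = dataclasses.replace(run, run_id="RUN-00235", result=4.0)
assert new_run.fingerprint() != recorded
assert run.fingerprint() == recorded
```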

Reproducibility

Re-running Analysis

To reproduce any result:

Select the run
Click "Re-run"
Same inputs → same outputs
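The guarantee that identical inputs give identical outputs only holds if every source of randomness is pinned to something stored with the run. The sketch below assumes a toy bootstrap analysis (`analyze`, hypothetical) whose random seed is part of the saved configuration, so re-running it with the stored inputs reproduces the original value exactly.

```python
import random

def analyze(rows: list[float], config: dict) -> float:
    """Toy analysis: bootstrap estimate of the mean using an explicitly seeded RNG."""
    rng = random.Random(config["seed"])  # the seed is part of the stored configuration
    samples = [
        sum(rng.choice(rows) for _ in rows) / len(rows)
        for _ in range(config["n_boot"])
    ]
    return sum(samples) / len(samples)

# Hypothetical stored inputs for one run.
stored_inputs = {"rows": [1.0, 2.0, 4.0, 8.0], "config": {"seed": 42, "n_boot": 200}}

original = analyze(stored_inputs["rows"], stored_inputs["config"])
rerun = analyze(stored_inputs["rows"], stored_inputs["config"])
assert rerun == original  # same data and same seed give the same value
print(original)
```

Using a private `random.Random` instance, rather than the module-level generator, keeps other code from disturbing the sequence.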

Verification

MedTWIN verifies reproducibility:

Checks that results are bit-for-bit identical
Warns if the environment has changed
Documents any differences
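A check along these lines could compare a digest of the serialized outputs and diff the recorded environment against the current one. The environment fields captured here (Python version and platform string), the `verify` function, and its report format are assumptions for this sketch; a real snapshot would likely also record package versions.

```python
import hashlib
import json
import platform
import sys

def environment() -> dict:
    """Hypothetical snapshot of the runtime, stored with each run."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}

def digest(output: dict) -> str:
    """SHA-256 of the canonical JSON form of an output."""
    return hashlib.sha256(json.dumps(output, sort_keys=True).encode("utf-8")).hexdigest()

def verify(original_output: dict, original_env: dict, new_output: dict) -> dict:
    """Compare a re-run against the original and report what differs."""
    report = {
        "identical": digest(original_output) == digest(new_output),
        "warnings": [],
        "differences": {},
    }
    current_env = environment()
    for key, old_value in original_env.items():
        if current_env.get(key) != old_value:
            report["warnings"].append(
                f"environment changed: {key} was {old_value}, now {current_env.get(key)}"
            )
    if not report["identical"]:
        for key in sorted(original_output.keys() | new_output.keys()):
            if original_output.get(key) != new_output.get(key):
                report["differences"][key] = (original_output.get(key), new_output.get(key))
    return report

env = environment()
same = verify({"value": 3.2}, env, {"value": 3.2})
changed = verify({"value": 3.2}, {"python": "3.0.0"}, {"value": 3.3})
print(same)
print(changed)
```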

Audit Export

Export the complete provenance of a run:

audit_RUN-00234/
├── data_snapshot.csv
├── config.json
├── code.py
├── output.json
├── log.txt
└── manifest.md

Use Cases

Peer Review

Reviewers can verify any statistic by following its provenance chain to source data.

Regulatory Compliance

The FDA and other regulators require complete audit trails. MedTWIN generates these automatically.

Team Handoffs

New team members can understand exactly how results were produced.