Provenance Tracking
MedTWIN tracks the complete lineage of every computation.
What is Provenance?
Provenance answers: "Where did this result come from?"
For every statistic, we track:
- Input data: Which dataset, which version
- Configuration: Analysis parameters
- Code: Exact computation executed
- Output: The result and metadata
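To make the four items above concrete, here is a minimal Python sketch of a provenance record. It is not MedTWIN's data model. The `ProvenanceRecord` class, its field names, and storing a SHA-256 hash of the executed code are assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_text(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    # Input data: which dataset, which version (hypothetical field names)
    dataset_id: str
    dataset_version: str
    # Configuration: analysis parameters as canonical (key-sorted) JSON text
    config_json: str
    # Code: SHA-256 of the exact source that was executed
    code_sha256: str
    # Output: the result and its metadata
    result: float
    metadata: dict

    def fingerprint(self) -> str:
        """Hash all fields together; changing any input, parameter, code, or output changes it."""
        return sha256_text(json.dumps(asdict(self), sort_keys=True))


record = ProvenanceRecord(
    dataset_id="cohort",
    dataset_version="v1.0",
    config_json=json.dumps({"outcome": "readmission", "window_days": 30}, sort_keys=True),
    code_sha256=sha256_text("rate = events / n\n"),
    result=0.032,
    metadata={"run_id": "RUN-00234"},
)
print(record.fingerprint())
```

A single fingerprint over all four parts gives a compact value that can be compared later to check whether anything about a result has changed.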
Provenance Chain
Data Upload (v1.0)
│
▼
Data Mapping (config_abc)
│
▼
Analysis Config (spec_xyz)
│
▼
Run Execution (RUN-00234)
│
▼
Statistic in Paper (3.2%)
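The chain can be modeled as a linked list in which each stage points to the stage it was derived from. The sketch below is hypothetical (the `Node` class and `lineage` function are not part of MedTWIN); it walks parent links from the published statistic back to the original data upload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str              # e.g. "RUN-00234" or "spec_xyz"
    kind: str                 # stage name, e.g. "data_upload" or "run_execution"
    parent: Optional["Node"]  # stage this one was derived from; None for raw data


def lineage(node: Node) -> list[str]:
    """Follow parent links from a result back to the original data upload."""
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(f"{current.kind}:{current.node_id}")
        current = current.parent
    return chain


upload = Node("v1.0", "data_upload", None)
mapping = Node("config_abc", "data_mapping", upload)
spec = Node("spec_xyz", "analysis_config", mapping)
run = Node("RUN-00234", "run_execution", spec)
stat = Node("3.2%", "statistic", run)

for step in lineage(stat):
    print(step)
```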
Versioning
Data Versioning
Every data upload creates a version:
- v1.0: Original upload
- v1.1: Added 50 patients
- v1.2: Corrected 3 records
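One way to keep every upload addressable is an append-only version list with a content hash per version. In this sketch the `Dataset` and `DataVersion` classes, and the rule that each new upload bumps the minor version, are assumptions rather than documented MedTWIN behavior.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataVersion:
    label: str           # e.g. "v1.1"
    note: str            # human-readable description of the change
    content_sha256: str  # hash of the uploaded bytes


@dataclass
class Dataset:
    name: str
    versions: list = field(default_factory=list)

    def add_version(self, content: bytes, note: str) -> DataVersion:
        """Record an upload as the next minor version; earlier versions are never removed."""
        if not self.versions:
            label = "v1.0"
        else:
            major, minor = self.versions[-1].label[1:].split(".")
            label = f"v{major}.{int(minor) + 1}"
        version = DataVersion(label, note, hashlib.sha256(content).hexdigest())
        self.versions.append(version)
        return version

    def get(self, label: str) -> DataVersion:
        """Look up any earlier version by its label."""
        for version in self.versions:
            if version.label == label:
                return version
        raise KeyError(label)


# Tiny placeholder byte strings stand in for real uploads.
ds = Dataset("cohort")
ds.add_version(b"id,age\n1,54\n", "Original upload")
ds.add_version(b"id,age\n1,54\n2,61\n", "Added 50 patients")
ds.add_version(b"id,age\n1,54\n2,62\n", "Corrected 3 records")
print([v.label for v in ds.versions])
```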
Config Versioning
Analysis configurations are versioned:
- Each change creates a new version
- Previous configs are preserved
- Any version can be used to re-run an analysis
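A versioned configuration store can be sketched as an append-only list in which saving never overwrites. `ConfigStore` is a hypothetical name, and storing each version as canonical JSON text is an assumed design choice: callers always receive a fresh copy, so they cannot mutate stored history.

```python
import json


class ConfigStore:
    """Append-only store: every save adds a version; nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: list[str] = []  # canonical JSON text, one entry per version

    def save(self, config: dict) -> int:
        """Store a new version and return its 1-based version number."""
        self._versions.append(json.dumps(config, sort_keys=True))
        return len(self._versions)

    def load(self, version: int) -> dict:
        """Return a fresh copy of any stored version, e.g. to re-run with it."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(version)
        return json.loads(self._versions[version - 1])

    def latest(self) -> int:
        return len(self._versions)


store = ConfigStore()
v1 = store.save({"outcome": "readmission", "window_days": 30})
v2 = store.save({"outcome": "readmission", "window_days": 90})
old = store.load(v1)
old["window_days"] = 7  # edits the copy only; version 1 is unchanged
print(store.load(v1), store.latest())
```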
Run Immutability
Once executed, runs are immutable:
- Results never change
- The audit trail is preserved
- Results remain reproducible from the stored inputs
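Immutability is commonly enforced with read-only records plus a hash chain, where each run's hash also covers the previous run's hash so that any change to history is detectable. This is a generic tamper-evident log technique, shown as an assumption about how such a guarantee could be built. `RunRecord`, `make_run`, and `verify_chain` are hypothetical, and the earlier run `RUN-00233` is invented for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType

GENESIS = "0" * 64  # placeholder "previous hash" for the first record in the log


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class RunRecord:
    run_id: str
    outputs: MappingProxyType  # read-only view of the results
    prev_hash: str             # hash of the previous record in the audit log
    record_hash: str           # hash over this record's contents and prev_hash


def make_run(run_id: str, outputs: dict, prev_hash: str) -> RunRecord:
    """Seal a run: copy outputs into a read-only view and hash them with the previous hash."""
    frozen_outputs = MappingProxyType(dict(outputs))
    body = json.dumps({"run_id": run_id, "outputs": dict(outputs), "prev": prev_hash}, sort_keys=True)
    return RunRecord(run_id, frozen_outputs, prev_hash, hashlib.sha256(body.encode("utf-8")).hexdigest())


def verify_chain(log: list) -> bool:
    """Recompute every hash; altering or replacing any earlier record breaks the chain."""
    prev = GENESIS
    for rec in log:
        expected = make_run(rec.run_id, dict(rec.outputs), prev).record_hash
        if rec.prev_hash != prev or rec.record_hash != expected:
            return False
        prev = rec.record_hash
    return True


r1 = make_run("RUN-00233", {"rate": 0.031}, GENESIS)
r2 = make_run("RUN-00234", {"rate": 0.032}, r1.record_hash)
log = [r1, r2]
print(verify_chain(log))  # True

# A forged replacement for RUN-00233 no longer matches what RUN-00234 points back to.
forged = make_run("RUN-00233", {"rate": 0.029}, GENESIS)
print(verify_chain([forged, r2]))  # False
```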
Reproducibility
Re-running Analysis
To reproduce any result:
- Select the run
- Click "Re-run"
- MedTWIN executes the run again with the same inputs, so it produces the same outputs
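"Same inputs, same outputs" only holds if everything that affects the result is recorded, including random seeds. The sketch below assumes a stored run keeps its data, configuration, and seed. The bootstrap `analysis` function, `rerun`, and the `stored_run` layout are illustrative stand-ins, not MedTWIN's implementation.

```python
import random
import statistics


def analysis(values: list[float], config: dict) -> dict:
    """Stand-in analysis: a bootstrap mean driven by a seeded RNG, so results are deterministic."""
    rng = random.Random(config["seed"])  # the seed is part of the recorded configuration
    means = []
    for _ in range(config["n_boot"]):
        sample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(sample))
    return {"mean": statistics.fmean(values), "boot_mean": statistics.fmean(means)}


# A stored run keeps everything needed to execute it again.
stored_run = {
    "run_id": "RUN-00234",
    "data": [3.1, 2.9, 3.4, 3.0, 3.3],
    "config": {"seed": 42, "n_boot": 200},
}
stored_run["output"] = analysis(stored_run["data"], stored_run["config"])


def rerun(run: dict) -> dict:
    """Re-execute a run from its recorded inputs and configuration."""
    return analysis(run["data"], run["config"])


print(rerun(stored_run) == stored_run["output"])  # True
```

Without the recorded seed, the bootstrap step would draw different samples on each run and the outputs would drift.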
Verification
When a run is repeated, MedTWIN verifies reproducibility:
- Checks that results are bit-for-bit identical
- Warns if the computing environment has changed
- Documents any differences
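A verification step might compare output digests and environment fingerprints, then list whatever differs. In this sketch, hashing canonical JSON as a stand-in for a bit-for-bit comparison and the particular environment fields are assumptions. `verify`, `output_digest`, and `environment` are hypothetical helpers.

```python
import hashlib
import json
import platform
import sys


def output_digest(output: dict) -> str:
    """Hash the canonical JSON bytes of an output; equal digests mean byte-identical results."""
    return hashlib.sha256(json.dumps(output, sort_keys=True).encode("utf-8")).hexdigest()


def environment() -> dict:
    """Minimal environment fingerprint; a fuller one would also record package versions."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }


def verify(original: dict, rerun: dict, original_env: dict) -> dict:
    """Compare a re-run with the original, warn on environment drift, and list differences."""
    report = {"identical": output_digest(original) == output_digest(rerun), "warnings": [], "differences": {}}
    current_env = environment()
    for key in sorted(set(original_env) | set(current_env)):
        if original_env.get(key) != current_env.get(key):
            report["warnings"].append(f"environment changed: {key}")
    for key in sorted(set(original) | set(rerun)):
        if original.get(key) != rerun.get(key):
            report["differences"][key] = (original.get(key), rerun.get(key))
    return report


env_then = environment()
report = verify({"rate": 0.032}, {"rate": 0.032}, env_then)
print(report["identical"], report["warnings"])  # True []

# "0.0.0" is a placeholder for a recorded environment that differs from the current one.
drift = verify({"rate": 0.032}, {"rate": 0.0321}, {**env_then, "python": "0.0.0"})
print(drift)
```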
Audit Export
Export the complete provenance of a run:
audit_RUN-00234/
├── data_snapshot.csv
├── config.json
├── code.py
├── output.json
├── log.txt
└── manifest.md
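An export can be made self-checking by recording a digest for each file in the manifest. The manifest format below (one SHA-256 line per file) is an assumption, since the contents of MedTWIN's `manifest.md` are not specified here. `export_audit` and `check_bundle` are hypothetical, and the file contents are placeholders.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def export_audit(run_id: str, files: dict[str, bytes], out_root: Path) -> Path:
    """Write an audit bundle plus a manifest listing each file's SHA-256 digest."""
    bundle = out_root / f"audit_{run_id}"
    bundle.mkdir(parents=True, exist_ok=True)
    lines = [f"# Audit manifest for {run_id}", ""]
    for name in sorted(files):
        data = files[name]
        (bundle / name).write_bytes(data)
        lines.append(f"- {name}: sha256 {hashlib.sha256(data).hexdigest()}")
    (bundle / "manifest.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bundle


def check_bundle(bundle: Path) -> bool:
    """Recompute digests to confirm nothing in the bundle changed after export."""
    for line in (bundle / "manifest.md").read_text(encoding="utf-8").splitlines():
        if not line.startswith("- "):
            continue
        name, digest = line[2:].split(": sha256 ")
        if hashlib.sha256((bundle / name).read_bytes()).hexdigest() != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bundle = export_audit(
        "RUN-00234",
        {
            "data_snapshot.csv": b"id,age\n1,54\n",
            "config.json": json.dumps({"seed": 42}).encode("utf-8"),
            "code.py": b"rate = events / n\n",
            "output.json": json.dumps({"rate": 0.032}).encode("utf-8"),
            "log.txt": b"run started\nrun finished\n",
        },
        Path(tmp),
    )
    ok = check_bundle(bundle)
    (bundle / "output.json").write_bytes(b'{"rate": 0.05}')  # simulate tampering
    tampered = check_bundle(bundle)
print(ok, tampered)  # True False
```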
Use Cases
Peer Review
Reviewers can verify any statistic by following its provenance chain to source data.
Regulatory Compliance
The FDA and other regulators require complete audit trails. MedTWIN provides them automatically.
Team Handoffs
New team members can understand exactly how results were produced.