Running Statistical Analysis
Configure and run reproducible statistical analyses with full traceability.
Analysis Types
MedTWIN supports several analysis types. All of them are deterministic: the same input always produces the same output.
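Determinism matters most where a step involves randomness, such as assigning patients to cross-validation folds. A minimal sketch of how that can be made reproducible, assuming a fixed seed. `assign_folds` and the seed value 42 are hypothetical; the documentation does not describe how MedTWIN seeds randomized steps.

```python
import random

def assign_folds(row_ids, k=5, seed=42):
    """Assign each row ID to one of k cross-validation folds, reproducibly.

    Sorting the IDs first makes the result independent of input order, and a
    dedicated Random instance with a fixed seed avoids touching global state.
    """
    ordered = sorted(row_ids)
    rng = random.Random(seed)
    rng.shuffle(ordered)
    # Deal shuffled rows round-robin so fold sizes differ by at most one.
    return {row_id: i % k for i, row_id in enumerate(ordered)}

ids = [f"patient_{n:04d}" for n in range(1, 21)]
first = assign_folds(ids, k=5, seed=42)
second = assign_folds(list(reversed(ids)), k=5, seed=42)
print(first == second)  # True: same IDs, k and seed give the same folds
```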
Logistic Regression
Use when: Binary outcome (yes/no, 0/1)
Output:
- Odds ratios with 95% CI
- P-values for each predictor
- Model fit statistics (AUC for discrimination, Hosmer-Lemeshow test for calibration)
- Classification metrics (sensitivity, specificity)
Example: Predicting 30-day mortality based on age, comorbidities, and lab values.
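Sensitivity and specificity depend on the probability cutoff used to turn predicted probabilities into yes/no classifications. A minimal sketch, assuming a 0.5 cutoff (the documentation does not state which threshold MedTWIN uses) and made-up data; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity of predicted probabilities at a cutoff.

    y_true: observed outcomes coded 0/1
    y_prob: model-predicted probabilities that the outcome is 1
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn),  # share of true events that were flagged
        "specificity": tn / (tn + fp),  # share of non-events correctly cleared
    }

# Toy data: 4 deaths and 6 survivors at 30 days (illustrative, not real).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [0.9, 0.7, 0.6, 0.3, 0.4, 0.2, 0.1, 0.55, 0.05, 0.15]
print(classification_metrics(observed, predicted))  # sensitivity 3/4, specificity 5/6
```

Moving the cutoff trades sensitivity against specificity; AUC summarizes that trade-off across all cutoffs.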
Cox Proportional Hazards
Use when: Time-to-event with censoring
Output:
- Hazard ratios with 95% CI
- Kaplan-Meier curves
- Log-rank test
- Concordance index
Example: Survival analysis comparing treatment groups.
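Kaplan-Meier curves use the product-limit estimator: at each event time, the running survival probability is multiplied by the fraction of at-risk patients who did not have the event, and censored patients leave the risk set without lowering the curve. A minimal pure-Python sketch for illustration only; MedTWIN's own implementation is not documented here (lifelines, referenced under Traceability, provides `KaplanMeierFitter` for real use).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient (e.g. followup_days)
    events: 1 if the event occurred at that time, 0 if censored
    Returns (time, survival probability) at each time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; those censored at t still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve

followup_days = [5, 8, 8, 12, 15, 20]
mortality = [1, 1, 0, 1, 0, 1]
for day, surv in kaplan_meier(followup_days, mortality):
    print(f"day {day}: S(t) = {surv:.3f}")
```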
Linear Regression
Use when: Continuous outcome
Output:
- Coefficients with 95% CI
- R² and adjusted R²
- Residual diagnostics
- Predicted vs actual plots
Example: Predicting length of stay based on patient characteristics.
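R² is the share of outcome variance explained by the model; adjusted R² penalizes it for the number of predictors p, as adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch with a single predictor and made-up length-of-stay data; `fit_simple_ols` and `r_squared` are hypothetical helpers, not MedTWIN functions.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y, y_hat, n_predictors):
    """R² and adjusted R² from observed and fitted values.

    n_predictors: number of predictors in the model, excluding the intercept.
    """
    n = len(y)
    y_bar = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adjusted = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adjusted

comorbidities = [1, 2, 3, 4, 5]   # made-up predictor
los_days = [2, 4, 5, 4, 5]        # made-up length of stay
b0, b1 = fit_simple_ols(comorbidities, los_days)
fitted = [b0 + b1 * x for x in comorbidities]
print(r_squared(los_days, fitted, n_predictors=1))
```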
Descriptive Statistics
Use when: Summarizing baseline characteristics
Output:
- Table 1 (demographics by group)
- Mean/SD for continuous variables
- N (%) for categorical variables
- Standardized differences
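A standardized difference expresses a between-group gap in units of pooled standard deviation, so unlike a p-value it does not grow with sample size. An absolute value above 0.1 is a commonly used flag for meaningful imbalance. A minimal sketch of the usual formulas for a continuous variable and a proportion, with made-up data; which exact variant MedTWIN reports is an assumption.

```python
from math import sqrt
from statistics import fmean, variance

def std_diff_continuous(group_a, group_b):
    """Standardized mean difference using the pooled SD of the two groups."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (fmean(group_a) - fmean(group_b)) / pooled_sd

def std_diff_binary(p_a, p_b):
    """Standardized difference between two proportions (e.g. share of males)."""
    return (p_a - p_b) / sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)

age_treated = [60, 62, 64, 66, 68]   # made-up ages
age_control = [62, 64, 66, 68, 70]
print(round(std_diff_continuous(age_treated, age_control), 3))  # -0.632
print(round(std_diff_binary(0.50, 0.40), 3))                    # 0.202
```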
Configuration
Step 1: Select Analysis Type
┌─────────────────────────────────────────┐
│ Select Analysis Type                    │
├─────────────────────────────────────────┤
│ ○ Logistic Regression                   │
│ ● Cox Proportional Hazards ← selected   │
│ ○ Linear Regression                     │
│ ○ Descriptive Statistics                │
└─────────────────────────────────────────┘
Step 2: Define Variables
For Cox Regression:
| Field | Description | Example |
|---|---|---|
| Time Variable | Time to event or censoring | followup_days |
| Event Indicator | 1 = event occurred, 0 = censored | mortality |
| Primary Predictor | Main exposure of interest | treatment_group |
| Covariates | Confounders to adjust for | age, sex, comorbidity_score |
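Before fitting, it helps to check that the Time Variable and Event Indicator hold what the table describes. A minimal sketch, assuming rows arrive as Python dicts with missing values stored as `None`; `validate_survival_columns` is a hypothetical helper, and the default column names are taken from the Example column.

```python
def validate_survival_columns(rows, time_var="followup_days", event_var="mortality"):
    """Return messages for rows whose time or event value is invalid.

    Missing values (None) are skipped here; the missing-data step handles them.
    """
    errors = []
    for i, row in enumerate(rows):
        follow_up, event = row.get(time_var), row.get(event_var)
        if follow_up is None or event is None:
            continue
        if not isinstance(follow_up, (int, float)) or follow_up < 0:
            errors.append(f"row {i}: {time_var}={follow_up!r} is not a non-negative number")
        if event not in (0, 1):
            errors.append(f"row {i}: {event_var}={event!r} must be 1 (event) or 0 (censored)")
    return errors

rows = [
    {"followup_days": 365, "mortality": 0},
    {"followup_days": -4, "mortality": 1},   # negative follow-up time
    {"followup_days": 120, "mortality": 2},  # invalid event code
]
for message in validate_survival_columns(rows):
    print(message)
```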
Step 3: Configure Options
Analysis Configuration:
# Variable selection
feature_selection: true
selection_method: "backward" # forward, backward, stepwise
selection_criterion: "AIC" # AIC, BIC, p-value
# Validation
cross_validation: true
cv_folds: 5
# Missing data
missing_strategy: "complete_case" # complete_case, imputation
# Output
confidence_level: 0.95
decimal_places: 3
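The comments in the configuration list the accepted values for each option. A minimal sketch of checking a configuration against them, written as a plain Python dict; `validate_config` is hypothetical, and the extra rules (for example `cv_folds` of at least 2, `confidence_level` strictly between 0 and 1) are assumptions, not documented MedTWIN behavior.

```python
ALLOWED_VALUES = {
    "selection_method": {"forward", "backward", "stepwise"},
    "selection_criterion": {"AIC", "BIC", "p-value"},
    "missing_strategy": {"complete_case", "imputation"},
}

def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    keys = ["missing_strategy"]
    if cfg.get("feature_selection"):
        # Selection settings only matter when feature selection is on.
        keys += ["selection_method", "selection_criterion"]
    for key in keys:
        if cfg.get(key) not in ALLOWED_VALUES[key]:
            problems.append(f"{key}={cfg.get(key)!r}, expected one of {sorted(ALLOWED_VALUES[key])}")
    if cfg.get("cross_validation"):
        folds = cfg.get("cv_folds")
        if not isinstance(folds, int) or folds < 2:
            problems.append("cv_folds must be an integer >= 2")
    level = cfg.get("confidence_level")
    if not isinstance(level, float) or not 0 < level < 1:
        problems.append("confidence_level must be a number between 0 and 1")
    decimals = cfg.get("decimal_places")
    if not isinstance(decimals, int) or decimals < 0:
        problems.append("decimal_places must be a non-negative integer")
    return problems

config = {
    "feature_selection": True,
    "selection_method": "backward",
    "selection_criterion": "AIC",
    "cross_validation": True,
    "cv_folds": 5,
    "missing_strategy": "complete_case",
    "confidence_level": 0.95,
    "decimal_places": 3,
}
print(validate_config(config))  # []
```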
Running the Analysis
Start Analysis
Click Run Analysis. You'll see:
┌─────────────────────────────────────────┐
│ Analysis Running...                     │
│                                         │
│ ████████████░░░░░░░░ 60%                │
│                                         │
│ ✓ Data validation                       │
│ ✓ Missing data handling                 │
│ ◉ Fitting model...                      │
│ ○ Generating results                    │
│ ○ Creating visualizations               │
└─────────────────────────────────────────┘
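With `missing_strategy: "complete_case"`, the missing-data step keeps only patients who have a value for every variable in the model. A minimal sketch of that rule, assuming missing values are stored as `None`; `complete_cases` is a hypothetical helper.

```python
def complete_cases(rows, analysis_vars):
    """Keep rows with a value for every analysis variable.

    Only the variables used in the model are checked, so a gap in an
    unrelated column does not exclude a patient.
    """
    kept = [row for row in rows if all(row.get(v) is not None for v in analysis_vars)]
    return kept, len(rows) - len(kept)

rows = [
    {"followup_days": 365, "mortality": 0, "age": 64, "bmi": None},
    {"followup_days": 120, "mortality": 1, "age": None, "bmi": 31.2},
    {"followup_days": 400, "mortality": 0, "age": 58, "bmi": 27.5},
]
kept, dropped = complete_cases(rows, ["followup_days", "mortality", "age"])
print(f"{len(kept)} rows kept, {dropped} excluded")  # 2 rows kept, 1 excluded
```

Reporting the excluded count alongside N makes it easy to judge whether complete-case analysis discarded too much data.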
Typical Runtime
| Data Size | Expected Time |
|---|---|
| < 1,000 rows | 5-15 seconds |
| 1,000 - 10,000 rows | 15-45 seconds |
| 10,000 - 100,000 rows | 1-3 minutes |
| > 100,000 rows | 3-10 minutes |
Results
Results Table
┌───────────────────────────────────────────────────────────┐
│ Cox Regression Results                                    │
│ Outcome: mortality | N = 1,247 | Events = 312             │
├──────────────────┬─────────┬───────────────┬──────────────┤
│ Variable         │ HR      │ 95% CI        │ P-value      │
├──────────────────┼─────────┼───────────────┼──────────────┤
│ age (per year)   │ 1.04    │ 1.02 - 1.06   │ <0.001 **    │
│ sex (male)       │ 1.23    │ 0.98 - 1.54   │ 0.072        │
│ hba1c (per %)    │ 1.42    │ 1.18 - 1.71   │ <0.001 **    │
│ egfr (per 10)    │ 0.91    │ 0.85 - 0.97   │ 0.004 *      │
│ treatment        │ 0.67    │ 0.52 - 0.86   │ 0.002 *      │
└──────────────────┴─────────┴───────────────┴──────────────┘
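Each row follows the standard Wald construction for a Cox coefficient β with standard error SE: HR = exp(β), 95% CI = exp(β ± 1.96 × SE), and a two-sided p-value from z = β / SE. A minimal sketch that recovers the SE from a reported interval and recomputes a row. It is an assumption that MedTWIN uses Wald intervals, and because the table shows rounded values, the recovered numbers are approximate; the helper names are hypothetical.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile

def hazard_ratio_summary(beta, se, z_crit=Z_95):
    """HR, Wald confidence interval and two-sided p-value from a Cox coefficient."""
    hr = exp(beta)
    lower, upper = exp(beta - z_crit * se), exp(beta + z_crit * se)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return hr, (lower, upper), p

def se_from_ci(lower, upper, z_crit=Z_95):
    """Recover the coefficient's standard error from a reported 95% CI."""
    return (log(upper) - log(lower)) / (2 * z_crit)

se = se_from_ci(1.18, 1.71)  # hba1c row from the table
hr, (lower, upper), p = hazard_ratio_summary(log(1.42), se)
print(f"HR {hr:.2f} ({lower:.2f} - {upper:.2f}), p = {p:.4f}")
```

The same check is a quick way to spot a transcription error when copying results into a manuscript: a p-value that does not match its interval signals a mistake.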
Model Diagnostics
- Concordance Index: 0.72 (95% CI: 0.68 - 0.76)
- Proportional Hazards Test: Global p = 0.34 (no evidence of violation)
- Sample Size: 1,247 (312 events, 935 censored)
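The concordance index is the fraction of comparable patient pairs in which the patient who had the event first was assigned the higher predicted risk: 0.5 is chance and 1.0 is perfect ordering. A minimal sketch of Harrell's C, simplified by ignoring ties in follow-up time; `harrell_c` is hypothetical, and MedTWIN's exact tie handling is not documented here.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter time had an event.
    It is concordant when that patient also has the higher predicted risk;
    ties in predicted risk count as half. Tied times are not compared.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

followup_days = [5, 8, 12, 20]
mortality = [1, 1, 0, 1]
predicted_risk = [0.9, 0.2, 0.4, 0.6]
print(harrell_c(followup_days, mortality, predicted_risk))
```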
Visualizations
Generated automatically:
- Kaplan-Meier Curves - Survival by group
- Forest Plot - Hazard ratios with CI
- Schoenfeld Residuals - PH assumption check
- Predicted Risk Distribution - Model calibration
Traceability
Every result links back to its source.
Click Any Statistic
HR = 1.42 (95% CI: 1.18 - 1.71)
│
├── Variable: hba1c_baseline
├── Analysis: Cox Regression
├── Run ID: run_20240115_143022
├── N: 1,247 (312 events)
│
├── Data Source:
│   └── master_dataset v2
│       └── Mapping: mapping_v2_20240110
│           └── Original file: diabetes_cohort.xlsx
│
├── Computation:
│   ├── Python: lifelines.CoxPHFitter
│   └── [View full code]
│
└── Audit:
    ├── Created: 2024-01-15 14:30:22
    ├── User: dr.smith@hospital.org
    └── [View audit log]
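The tree can be thought of as one read-only provenance record per statistic. A minimal sketch of such a record as a frozen dataclass; the class, its field names, and the run ID format (inferred from the example run_20240115_143022 created at 2024-01-15 14:30:22) are assumptions, not MedTWIN's data model.

```python
from dataclasses import dataclass
from datetime import datetime

def make_run_id(created: datetime) -> str:
    """Build a run ID in the run_YYYYMMDD_HHMMSS pattern seen in the example."""
    return created.strftime("run_%Y%m%d_%H%M%S")

@dataclass(frozen=True)
class ResultProvenance:
    """Hypothetical record behind one clickable statistic (fields cannot be reassigned)."""
    variable: str
    analysis: str
    run_id: str
    n: int
    events: int
    dataset: str
    mapping: str
    source_file: str
    created: datetime
    user: str

created = datetime(2024, 1, 15, 14, 30, 22)
record = ResultProvenance(
    variable="hba1c_baseline", analysis="Cox Regression",
    run_id=make_run_id(created), n=1247, events=312,
    dataset="master_dataset v2", mapping="mapping_v2_20240110",
    source_file="diabetes_cohort.xlsx", created=created,
    user="dr.smith@hospital.org",
)
print(record.run_id)  # run_20240115_143022
```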
Chain of Custody
Original Data → Mapping → Master Dataset → Analysis → Result
      │            │            │             │         │
      ▼            ▼            ▼             ▼         ▼
   SHA-256     Version 2    Snapshot       Run #3    Locked
  verified     committed     frozen       complete  immutable
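The "SHA-256 verified" step amounts to hashing the original file and comparing the digest with the one recorded when the file was first loaded. A minimal sketch using Python's standard `hashlib`; the helper names and the stand-in file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path, recorded_hex):
    """True if the file's current bytes match the digest recorded earlier."""
    return sha256_of_file(path) == recorded_hex.lower()

with tempfile.TemporaryDirectory() as tmp:
    upload = Path(tmp) / "upload.bin"  # stand-in for the uploaded source file
    upload.write_bytes(b"patient_id,age\n1,64\n")
    recorded = sha256_of_file(upload)  # digest stored at upload time
    print(verify_file(upload, recorded))  # True: file unchanged
    upload.write_bytes(b"patient_id,age\n1,65\n")
    print(verify_file(upload, recorded))  # False: any edit changes the digest
```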
Re-running Analysis
When Data Changes
If you update your mapping:
- Previous analysis results are preserved.
- Click Re-run with Current Data to repeat the analysis.
- The new results are saved as a separate run.
- Compare the old and new runs side by side.
Sensitivity Analysis
Run variations to test robustness:
| Run | Configuration | Key Result |
|---|---|---|
| Run 1 | Complete case | HR = 1.42 |
| Run 2 | Multiple imputation | HR = 1.38 |
| Run 3 | Excluding outliers | HR = 1.45 |
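One simple way to summarize sensitivity runs is the change in the key estimate relative to the primary analysis. A minimal sketch using the hazard ratios from the table; `compare_runs` is hypothetical, and percent change is only one reasonable comparison (comparing log hazard ratios or overlapping confidence intervals is also common).

```python
def compare_runs(hazard_ratios, reference="Run 1"):
    """Percent change of each run's HR relative to the reference run."""
    base = hazard_ratios[reference]
    return {run: 100 * (hr - base) / base for run, hr in hazard_ratios.items()}

runs = {"Run 1": 1.42, "Run 2": 1.38, "Run 3": 1.45}
for run, change in compare_runs(runs).items():
    print(f"{run}: HR = {runs[run]:.2f} ({change:+.1f}% vs Run 1)")
```

Here all three runs stay within about 3% of the primary estimate and keep the same direction of effect.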
Exporting Results
Export Options
| Format | Contents |
|---|---|
| Tables (Word/Excel) | Results tables, formatted |
| Figures (PNG/SVG) | High-res visualizations |
| Code (Python/R) | Reproducible analysis script |
| Full Bundle | Data + code + results + audit log |
For Journal Submission
Click Export for Publication:
- Tables formatted per journal guidelines
- Figures at 300 DPI
- Methods text describing the analysis
- Supplementary materials
Best Practices
Before Running
- ✅ Verify variable mappings are correct
- ✅ Check for sufficient sample size
- ✅ Review missing data patterns
- ✅ Pre-specify your analysis plan
Interpreting Results
- ✅ Check model diagnostics before interpreting coefficients
- ✅ Consider clinical significance, not just statistical significance
- ✅ Review outliers and influential observations
- ✅ Run sensitivity analyses
For Reproducibility
- ✅ Lock your dataset version before analysis
- ✅ Document any exclusion criteria
- ✅ Export the analysis code
- ✅ Keep the audit trail
Troubleshooting
"Model failed to converge"
Causes:
- Perfect separation (outcome perfectly predicted)
- Multicollinearity (highly correlated predictors)
- Too many predictors for the sample size

Solutions:
1. Remove highly correlated variables
2. Reduce the number of predictors
3. Use regularization (coming soon)
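Highly correlated predictors can be spotted before fitting by checking pairwise correlations. A minimal sketch, assuming numeric columns with no missing values; the 0.8 cutoff is an illustrative assumption rather than a MedTWIN setting, the helper names are hypothetical, and the lab values are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

labs = {
    "egfr": [90, 75, 60, 45, 30],
    "creatinine": [0.8, 1.0, 1.3, 1.7, 2.4],
    "age": [50, 72, 58, 66, 61],
}
print(correlated_pairs(labs))  # egfr and creatinine are strongly (negatively) correlated
```

When a pair is flagged, keeping only the more clinically meaningful variable usually resolves the convergence problem.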
"Not enough events"
Cox regression needs sufficient events per variable. Rule of thumb: ≥10 events per predictor.
| Predictors | Minimum Events Needed |
|---|---|
| 3 | 30 |
| 5 | 50 |
| 10 | 100 |
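The table applies the 10-events-per-predictor rule. A minimal sketch of the same check; it also reflects that a categorical predictor with k levels uses k - 1 model parameters, so it counts as k - 1 predictors. The helper names and the input format (number of levels, or `None` for continuous) are hypothetical.

```python
def count_parameters(predictors):
    """Number of model parameters the predictors use.

    predictors maps a name to its number of categories, or None if continuous.
    A categorical predictor with k levels needs k - 1 parameters.
    """
    return sum(1 if levels is None else levels - 1 for levels in predictors.values())

def check_events(n_events, predictors, events_per_predictor=10):
    """Return (enough_events, minimum_needed) under the rule of thumb."""
    needed = events_per_predictor * count_parameters(predictors)
    return n_events >= needed, needed

model = {"age": None, "sex": 2, "hba1c": None, "egfr": None, "treatment": 2}
print(check_events(312, model))  # (True, 50)
```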
"Proportional hazards violated"
The PH assumption may not hold. Options:
- Stratify by the violating variable
- Use time-varying coefficients
- Consider parametric survival models