Running Statistical Analysis

Configure and run reproducible statistical analyses with full traceability.


Analysis Types

MedTWIN supports several analysis types. All are deterministic: the same input always produces the same output.

Logistic Regression

Use when: Binary outcome (yes/no, 0/1)

Output:

  • Odds ratios with 95% CI
  • P-values for each predictor
  • Model fit statistics (AUC, Hosmer-Lemeshow)
  • Classification metrics (sensitivity, specificity)

Example: Predicting 30-day mortality based on age, comorbidities, and lab values.
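
The classification metrics listed above follow directly from a confusion matrix. The sketch below is plain Python and assumes 0/1 labels with predictions that have already been thresholded. The function name `classification_metrics` and the toy labels are illustrative, not part of MedTWIN.

```python
def classification_metrics(y_true, y_pred):
    """Return sensitivity and specificity for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for illustration only (not real patient data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Both metrics depend on the probability threshold used to turn model scores into 0/1 predictions, so it helps to report them together with that threshold.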


Cox Proportional Hazards

Use when: Time-to-event with censoring

Output:

  • Hazard ratios with 95% CI
  • Kaplan-Meier curves
  • Log-rank test
  • Concordance index

Example: Survival analysis comparing treatment groups.
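
To make the role of censoring concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not MedTWIN's implementation (the traceability view names lifelines for the Cox model). The function name `kaplan_meier` and the toy data are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up durations
    events: 1 = event occurred, 0 = censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            # Subjects censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data for illustration only.
followup_days = [5, 8, 8, 12, 15, 20]
mortality = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(followup_days, mortality)
for day, prob in curve:
    print(day, round(prob, 3))
```

Censored subjects never lower the curve directly. They only shrink the at-risk denominator for later event times.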


Linear Regression

Use when: Continuous outcome

Output:

  • Coefficients with 95% CI
  • R² and adjusted R²
  • Residual diagnostics
  • Predicted vs actual plots

Example: Predicting length of stay based on patient characteristics.
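
The relationship between R² and adjusted R² can be sketched in a few lines. This assumes the usual definitions, R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The function names and toy values are illustrative only.

```python
def r_squared(actual, predicted):
    """Proportion of outcome variance explained by the predictions."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R² for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Toy length-of-stay values in days, for illustration only.
actual = [3, 5, 4, 7, 6]
predicted = [3.5, 4.5, 4.5, 6.5, 6.0]
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_predictors=1)
print(round(r2, 3), round(adj_r2, 3))
```

Adjusted R² is never larger than R², and the gap widens as predictors are added relative to the sample size.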


Descriptive Statistics

Use when: Summarizing baseline characteristics

Output:

  • Table 1 (demographics by group)
  • Mean/SD for continuous variables
  • N (%) for categorical variables
  • Standardized differences
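
For a continuous variable, a standardized difference is commonly computed as the difference in group means divided by a pooled standard deviation. The sketch below assumes that definition. MedTWIN's exact formula is not stated here, and the function name and toy ages are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(group_a, group_b):
    """Difference in means scaled by the pooled SD of the two groups."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Toy ages for two groups, for illustration only.
treated_age = [60, 62, 64]
control_age = [58, 60, 62]
smd = standardized_difference(treated_age, control_age)
print(smd)
```

Unlike a p-value, this measure does not shrink or grow with sample size, which is why it is often preferred for judging baseline balance. Absolute values below about 0.1 are commonly treated as negligible.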


Configuration

Step 1: Select Analysis Type

┌─────────────────────────────────────────┐
│  Select Analysis Type                   │
├─────────────────────────────────────────┤
│  ○ Logistic Regression                  │
│  ● Cox Proportional Hazards  ← selected │
│  ○ Linear Regression                    │
│  ○ Descriptive Statistics               │
└─────────────────────────────────────────┘

Step 2: Define Variables

For Cox Regression:

Field              Description                        Example
Time Variable      Time to event or censoring         followup_days
Event Indicator    1 = event occurred, 0 = censored   mortality
Primary Predictor  Main exposure of interest          treatment_group
Covariates         Confounders to adjust for          age, sex, comorbidity_score

Step 3: Configure Options

Analysis Configuration:

  # Variable selection
  feature_selection: true
  selection_method: "backward"  # forward, backward, stepwise
  selection_criterion: "AIC"    # AIC, BIC, p-value

  # Validation
  cross_validation: true
  cv_folds: 5

  # Missing data
  missing_strategy: "complete_case"  # complete_case, imputation

  # Output
  confidence_level: 0.95
  decimal_places: 3
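
If you keep a copy of this configuration alongside your analysis plan, a small check can catch typos before a run. The sketch below takes the allowed values from the comments above. The function `validate_config`, the rule that `cv_folds` must be at least 2, and the range check on `confidence_level` are assumptions for illustration, not MedTWIN's own validation.

```python
# Allowed values, taken from the inline comments in the configuration.
ALLOWED = {
    "selection_method": {"forward", "backward", "stepwise"},
    "selection_criterion": {"AIC", "BIC", "p-value"},
    "missing_strategy": {"complete_case", "imputation"},
}


def validate_config(config):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} is not one of {sorted(allowed)}")
    # Assumed rule: cross-validation needs at least two folds.
    if config.get("cross_validation") and config.get("cv_folds", 0) < 2:
        problems.append("cv_folds must be at least 2 when cross_validation is true")
    # Assumed rule: a confidence level is a proportion strictly between 0 and 1.
    if not 0 < config.get("confidence_level", 0.95) < 1:
        problems.append("confidence_level must be between 0 and 1")
    return problems


config = {
    "feature_selection": True,
    "selection_method": "backward",
    "selection_criterion": "AIC",
    "cross_validation": True,
    "cv_folds": 5,
    "missing_strategy": "complete_case",
    "confidence_level": 0.95,
    "decimal_places": 3,
}
problems = validate_config(config)
print(problems)
```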

Running the Analysis

Start Analysis

Click Run Analysis. You'll see:

┌─────────────────────────────────────────┐
│  Analysis Running...                    │
│                                         │
│  ████████████░░░░░░░░  60%             │
│                                         │
│  ✓ Data validation                      │
│  ✓ Missing data handling                │
│  ◉ Fitting model...                     │
│  ○ Generating results                   │
│  ○ Creating visualizations              │
└─────────────────────────────────────────┘

Typical Runtime

Data Size              Expected Time
< 1,000 rows           5-15 seconds
1,000 - 10,000 rows    15-45 seconds
10,000 - 100,000 rows  1-3 minutes
> 100,000 rows         3-10 minutes

Results

Results Table

┌────────────────────────────────────────────────────────────┐
│  Cox Regression Results                                    │
│  Outcome: mortality | N = 1,247 | Events = 312            │
├──────────────────┬─────────┬───────────────┬──────────────┤
│  Variable        │   HR    │    95% CI     │   P-value    │
├──────────────────┼─────────┼───────────────┼──────────────┤
│  age (per year)  │   1.04  │  1.02 - 1.06  │   <0.001 **  │
│  sex (male)      │   1.23  │  0.98 - 1.54  │    0.072     │
│  hba1c (per %)   │   1.42  │  1.18 - 1.71  │   <0.001 **  │
│  egfr (per 10)   │   0.91  │  0.85 - 0.97  │    0.004 *   │
│  treatment       │   0.67  │  0.52 - 0.86  │    0.002 *   │
└──────────────────┴─────────┴───────────────┴──────────────┘
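
The columns of a row are tied together: the hazard ratio is the exponentiated coefficient, and the interval and p-value derive from its standard error. The sketch below recovers the standard error and p-value from a reported HR and CI, assuming a Wald interval on the log scale. The function name `wald_summary` is hypothetical, and small discrepancies are expected because the table values are rounded.

```python
from math import erfc, log, sqrt


def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover log-HR, standard error, z statistic, and two-sided p-value."""
    beta = log(hr)                                    # coefficient on the log scale
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # CI width covers 2 * z_crit * SE
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                        # two-sided normal p-value
    return beta, se, z, p


# Rows taken from the example results table above.
_, _, _, p_hba1c = wald_summary(1.42, 1.18, 1.71)
_, _, _, p_treatment = wald_summary(0.67, 0.52, 0.86)
print(f"hba1c p = {p_hba1c:.4f}, treatment p = {p_treatment:.4f}")
```

A check like this is useful when transcribing results into a manuscript: a p-value that disagrees sharply with its confidence interval usually points to a copy error.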

Model Diagnostics

  • Concordance Index: 0.72 (95% CI: 0.68 - 0.76)
  • Proportional Hazards Test: Global p = 0.34 (no evidence of violation)
  • Sample Size: 1,247 (312 events, 935 censored)

Visualizations

Generated automatically:

  1. Kaplan-Meier Curves - Survival by group
  2. Forest Plot - Hazard ratios with CI
  3. Schoenfeld Residuals - PH assumption check
  4. Predicted Risk Distribution - Model calibration

Traceability

Every result links back to its source.

Click Any Statistic

HR = 1.42 (95% CI: 1.18 - 1.71)
├── Variable: hba1c_baseline
├── Analysis: Cox Regression
├── Run ID: run_20240115_143022
├── N: 1,247 (312 events)
├── Data Source:
│   ├── master_dataset v2
│   ├── Mapping: mapping_v2_20240110
│   └── Original file: diabetes_cohort.xlsx
├── Computation:
│   ├── Python: lifelines.CoxPHFitter
│   └── [View full code]
└── Audit:
    ├── Created: 2024-01-15 14:30:22
    ├── User: dr.smith@hospital.org
    └── [View audit log]

Chain of Custody

Original Data → Mapping → Master Dataset → Analysis → Result
     │              │            │             │         │
     ▼              ▼            ▼             ▼         ▼
  SHA-256       Version 2    Snapshot      Run #3    Locked
  verified      committed    frozen        complete  immutable
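
The first link in the chain, the SHA-256 check on the original file, can be reproduced independently with Python's standard library. This sketch assumes you have the hex digest recorded for the file at upload time. How MedTWIN stores or exposes that digest is not described here, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """True if the file on disk still matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_hex.lower()


# Usage, with recorded_hex standing in for the digest stored at upload:
# matches_recorded_hash("diabetes_cohort.xlsx", recorded_hex)
```

Reading in chunks keeps memory use flat for large exports. Any change to the file, even a single byte, produces a different digest.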

Re-running Analysis

When Data Changes

If you update your mapping:

  1. Previous analysis results are preserved
  2. Click Re-run with Current Data
  3. New results are saved as a new run
  4. Compare results side-by-side

Sensitivity Analysis

Run variations to test robustness:

Run    Configuration        Key Result
Run 1  Complete case        HR = 1.42
Run 2  Multiple imputation  HR = 1.38
Run 3  Excluding outliers   HR = 1.45
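
One simple way to summarize robustness is the relative change of each run's estimate against the primary analysis. The sketch below uses the hazard ratios from the table above and treats the complete-case run as the reference. That choice and the function name `relative_changes` are assumptions for illustration.

```python
def relative_changes(runs, reference):
    """Relative change of each run's HR versus the reference run."""
    ref = runs[reference]
    return {
        name: (hr - ref) / ref
        for name, hr in runs.items()
        if name != reference
    }


# Hazard ratios from the sensitivity table above.
runs = {
    "Complete case": 1.42,
    "Multiple imputation": 1.38,
    "Excluding outliers": 1.45,
}
changes = relative_changes(runs, reference="Complete case")
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

Whether a given shift is small enough to call the result robust is a judgment to pre-specify in the analysis plan, not something this calculation decides.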

Exporting Results

Export Options

Format               Contents
Tables (Word/Excel)  Results tables, formatted
Figures (PNG/SVG)    High-res visualizations
Code (Python/R)      Reproducible analysis script
Full Bundle          Data + code + results + audit log

For Journal Submission

Click Export for Publication:

  • Tables formatted per journal guidelines
  • Figures at 300 DPI
  • Methods text describing analysis
  • Supplementary materials

Best Practices

Before Running

  • ✅ Verify variable mappings are correct
  • ✅ Check for sufficient sample size
  • ✅ Review missing data patterns
  • ✅ Pre-specify your analysis plan

Interpreting Results

  • ✅ Check model diagnostics before interpreting coefficients
  • ✅ Consider clinical significance, not just statistical significance
  • ✅ Review outliers and influential observations
  • ✅ Run sensitivity analyses

For Reproducibility

  • ✅ Lock your dataset version before analysis
  • ✅ Document any exclusion criteria
  • ✅ Export the analysis code
  • ✅ Keep the audit trail

Troubleshooting

"Model failed to converge"

Causes:

  • Perfect separation (outcome perfectly predicted)
  • Multicollinearity (highly correlated predictors)
  • Too many predictors for sample size

Solutions:

  1. Remove highly correlated variables
  2. Reduce number of predictors
  3. Use regularization (coming soon)

"Not enough events"

Cox regression needs sufficient events per variable. Rule of thumb: ≥10 events per predictor.

Predictors  Minimum Events Needed
3           30
5           50
10          100
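
The rule of thumb is simple arithmetic and can be checked before configuring a model. The sketch below encodes the 10-events-per-predictor rule from this section. The function names are hypothetical, and the threshold is a parameter because other values are sometimes used.

```python
def min_events_needed(n_predictors, events_per_variable=10):
    """Minimum number of events for a given number of predictors."""
    return n_predictors * events_per_variable


def max_predictors(n_events, events_per_variable=10):
    """Largest number of predictors the observed events can support."""
    return n_events // events_per_variable


print(min_events_needed(5))   # matches the table row for 5 predictors
print(max_predictors(312))    # event count from the example results table
```

Count model terms, not variables: a categorical covariate with several levels contributes one predictor per non-reference level.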

"Proportional hazards violated"

The PH assumption may not hold. Options:

  1. Stratify by the violating variable
  2. Use time-varying coefficients
  3. Consider parametric survival models

Next Steps