OHDSI WebAPI Schema Compatibility: A Deep Dive into External Database Integration
The Hidden Requirements Nobody Tells You About
You’ve built your OMOP CDM database. You’ve loaded Synthea data. You’ve set up the standardized vocabularies. Everything looks perfect. Then you connect ATLAS to your external PostgreSQL database and… nothing works quite right.
Cohort generation shows “…” instead of patient counts. Characterization fails with cryptic schema errors. The UI displays “COMPLETE” but the data seems invisible.
This is the story of what we discovered connecting ATLAS to our own database—and the exact fixes that made everything work.
The Setup: What We Had Built
After extensive preparation, our PostgreSQL database contained a complete OHDSI environment:
Database: ohdsi_learning
├── cdm (OMOP CDM 5.4)
│ ├── person (2,411 synthetic patients)
│ ├── condition_occurrence
│ ├── drug_exposure
│ ├── procedure_occurrence
│ └── measurement
├── vocabulary (~6 million concepts)
│ ├── concept
│ ├── concept_relationship
│ └── concept_ancestor
├── results (Achilles outputs)
└── webapi (Source configuration)
We deployed OHDSI Broadsea via Docker Compose, configured our external database as a data source, and launched ATLAS.
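For context, registering an external CDM as a data source comes down to two tables in WebAPI's own schema. The sketch below shows the general shape only; exact columns and daimon type codes vary between WebAPI versions, and the IDs, source key, and JDBC URL here are placeholders for our setup, not the values we actually used.
-- Sketch: registering an external CDM as a WebAPI source
-- (IDs, source key, and JDBC URL are placeholders; verify columns against your WebAPI version)
INSERT INTO webapi.source (source_id, source_name, source_key, source_connection, source_dialect)
VALUES (2, 'OHDSI Learning (Synthea)', 'OHDSI_LEARNING',
        'jdbc:postgresql://host.docker.internal:5432/ohdsi_learning', 'postgresql');
-- Daimon types: 0 = CDM, 1 = Vocabulary, 2 = Results, 5 = Temp
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
VALUES (10, 2, 0, 'cdm',        1),
       (11, 2, 1, 'vocabulary', 1),
       (12, 2, 2, 'results',    1),
       (13, 2, 5, 'temp',       0);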
Initial Success: The Data Sources dashboard showed our 2,411 patients with demographic breakdowns. Achilles results displayed correctly. Vocabulary search worked perfectly.
Then we tried to use the cohorts…
Phase 1: The Cohort Generation Mystery
The Symptom
After creating a Type 2 Diabetes cohort definition and clicking “Generate”, the status showed COMPLETE—but the People and Records columns displayed “…” instead of actual counts.
After fixes: cohorts now show actual patient counts (219 for T2DM)
Direct Database Check
Our first troubleshooting step was to query the database directly:
SELECT cohort_definition_id, COUNT(DISTINCT subject_id) AS persons
FROM results.cohort
GROUP BY cohort_definition_id;
-- Result: 219 patients in T2DM cohort!
The data was correct—the problem was in the UI/API layer.
The First Errors
Checking WebAPI logs revealed the real issues:
ERROR: schema "temp" does not exist
ERROR: column "design_hash" of relation "cohort_inclusion" does not exist
ERROR: relation "results.cohort_cache" does not exist
ERROR: column "mode_id" of relation "cohort_inclusion_result" does not exist
Root Cause: WebAPI ≠ OMOP CDM
WebAPI 2.x has evolved significantly beyond the base OMOP CDM specification.
The OMOP Common Data Model defines the clinical data tables. But WebAPI requires additional infrastructure:
| Component | OMOP CDM | WebAPI 2.x | Purpose |
|---|---|---|---|
| temp schema | No | Required | Working space for analytics |
| design_hash column | No | Required | Cache invalidation |
| mode_id column | No | Required | Generation mode tracking |
| *_cache tables | No | Required | Performance optimization |
When WebAPI starts with its built-in database, Flyway migrations create these automatically. With an external database, they don’t exist.
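Before (or instead of) patching blindly, you can ask PostgreSQL which of these pieces your database already has. A minimal check against information_schema, using the same names discussed above:
-- Which WebAPI-required pieces already exist in this database?
SELECT 'temp schema' AS item,
       EXISTS (SELECT 1 FROM information_schema.schemata
               WHERE schema_name = 'temp') AS present
UNION ALL
SELECT 'results.cohort_inclusion.design_hash',
       EXISTS (SELECT 1 FROM information_schema.columns
               WHERE table_schema = 'results'
                 AND table_name = 'cohort_inclusion'
                 AND column_name = 'design_hash')
UNION ALL
SELECT 'results.cohort_inclusion_result.mode_id',
       EXISTS (SELECT 1 FROM information_schema.columns
               WHERE table_schema = 'results'
                 AND table_name = 'cohort_inclusion_result'
                 AND column_name = 'mode_id')
UNION ALL
SELECT 'results.cohort_cache table',
       EXISTS (SELECT 1 FROM information_schema.tables
               WHERE table_schema = 'results'
                 AND table_name = 'cohort_cache');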
Phase 1 Fix: Core Schema Patches
Here’s what we applied to enable cohort generation:
-- 1. Create temp schema
CREATE SCHEMA IF NOT EXISTS temp;
GRANT ALL ON SCHEMA temp TO PUBLIC;
-- 2. Add design_hash column
ALTER TABLE results.cohort_inclusion
ADD COLUMN IF NOT EXISTS design_hash INT;
-- 3. Add mode_id columns
ALTER TABLE results.cohort_inclusion_result
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_inclusion_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_summary_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_censor_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
-- 4. Create cache tables
CREATE TABLE IF NOT EXISTS results.cohort_cache (
design_hash INT NOT NULL,
subject_id BIGINT NOT NULL,
cohort_start_date DATE NOT NULL,
cohort_end_date DATE NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cohort_cache_hash
ON results.cohort_cache(design_hash);
CREATE TABLE IF NOT EXISTS results.cohort_censor_stats_cache (
design_hash INT NOT NULL,
lost_count BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_inclusion_result_cache (
design_hash INT NOT NULL,
mode_id INT NOT NULL,
inclusion_rule_mask BIGINT NOT NULL,
person_count BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_inclusion_stats_cache (
design_hash INT NOT NULL,
rule_sequence INT NOT NULL,
mode_id INT NOT NULL,
person_count BIGINT NOT NULL,
gain_count BIGINT NOT NULL,
person_total BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_summary_stats_cache (
design_hash INT NOT NULL,
mode_id INT NOT NULL,
base_count BIGINT NOT NULL,
final_count BIGINT NOT NULL
);
After restart: Cohort generation worked! All 5 cohorts showed correct patient counts:
| Cohort | Patients |
|---|---|
| Type 2 Diabetes | 219 |
| Multiple Chronic Conditions | 486 |
| Metformin New Users | 0 |
| New Statin Users | 0 |
| Elevated HbA1c | 69 |
Inclusion reports now displayed match rates and population visualizations correctly.
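As a cross-check, the numbers behind the inclusion report can also be read straight from the results schema. This assumes the standard WebAPI results layout for cohort_inclusion_result (cohort_definition_id, inclusion_rule_mask, person_count, plus the mode_id column added above):
-- Inclusion-rule statistics backing the report
-- (by convention, mode_id 0 counts events and mode_id 1 counts persons)
SELECT cohort_definition_id, mode_id, inclusion_rule_mask, person_count
FROM results.cohort_inclusion_result
ORDER BY cohort_definition_id, mode_id, inclusion_rule_mask;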
Phase 2: The Characterization Challenge
With cohort generation working, we tried Cohort Characterization—comparing demographics, conditions, and medications across cohorts.
New Errors Appeared
ERROR: column "type" of relation "cc_results" does not exist
ERROR: column "concept_id" of relation "cc_results" does not exist
Characterization uses different result tables than cohort generation.
Discovery: The WebAPI DDL Endpoint
WebAPI provides an endpoint that returns the expected schema:
curl http://localhost:8080/WebAPI/ddl/results
This revealed the complete cc_results table structure including columns our database was missing:
- type - Classification of the result
- concept_id - Linked vocabulary concept
- aggregate_id - Aggregation identifier
- aggregate_name - Aggregation label
- missing_means_zero - Zero handling flag
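To see how your actual table compares with what the DDL endpoint expects, list its current columns:
-- Current columns of results.cc_results, for comparison with the DDL output
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'results'
  AND table_name = 'cc_results'
ORDER BY ordinal_position;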
Phase 2 Fix: Characterization Schema
-- 1. Add missing cc_results columns
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS type VARCHAR(255);
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS concept_id INTEGER DEFAULT 0;
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS aggregate_id INTEGER;
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS aggregate_name VARCHAR(1000);
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS missing_means_zero INTEGER;
-- 2. Create temporal results tables
CREATE TABLE IF NOT EXISTS results.cc_temporal_results (
type VARCHAR(255),
fa_type VARCHAR(255),
cc_generation_id BIGINT,
analysis_id INTEGER,
analysis_name VARCHAR(1000),
covariate_id BIGINT,
covariate_name VARCHAR(1000),
strata_id BIGINT,
strata_name VARCHAR(1000),
concept_id INTEGER,
count_value BIGINT,
avg_value DOUBLE PRECISION,
cohort_definition_id BIGINT,
time_id INTEGER,
start_day INTEGER,
end_day INTEGER
);
CREATE TABLE IF NOT EXISTS results.cc_temporal_annual_results (
type VARCHAR(255),
fa_type VARCHAR(255),
cc_generation_id BIGINT,
analysis_id INTEGER,
analysis_name VARCHAR(1000),
covariate_id BIGINT,
covariate_name VARCHAR(1000),
strata_id BIGINT,
strata_name VARCHAR(1000),
concept_id INTEGER,
count_value BIGINT,
avg_value DOUBLE PRECISION,
cohort_definition_id BIGINT,
event_year INTEGER
);
-- 3. Grant permissions
GRANT ALL ON ALL TABLES IN SCHEMA results TO PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA temp TO PUBLIC;
After restart: Characterization completed successfully!
- 10 reports generated
- 866 total records
- Demographics analysis with age group distributions
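As with the cohort counts, a direct query confirms the records are really there. This assumes cc_results shares the cc_generation_id and analysis_name columns shown in the temporal table definitions above:
-- Characterization records per generation and analysis
SELECT cc_generation_id, analysis_name, COUNT(*) AS records
FROM results.cc_results
GROUP BY cc_generation_id, analysis_name
ORDER BY cc_generation_id, analysis_name;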
Complete Test Results
Functionality Verification
| Feature | Before Fix | After Fix | Status |
|---|---|---|---|
| Cohort Generation | Shows “…” | Shows counts | ✅ |
| Inclusion Reports | Fails | Works with statistics | ✅ |
| Population Visualization | Fails | Displays correctly | ✅ |
| Characterization | ERROR | 866 records displayed | ✅ |
| Demographics Analysis | Fails | Age groups with counts | ✅ |
Jobs Execution Summary
| Job | Name | Status | Duration |
|---|---|---|---|
| 7 | Type 2 Diabetes cohort | COMPLETED | 1 min |
| 8 | Multiple Chronic Conditions | COMPLETED | < 1 min |
| 9-11 | Other cohorts | COMPLETED | < 1 min each |
| 14 | Characterization | COMPLETED | 1 min |
Understanding the Architecture
What Broadsea Looks Like
┌─────────────────────────────────────────────────────────────┐
│ OHDSI Broadsea Stack │
├─────────────────────────────────────────────────────────────┤
│ │
│ Browser → Traefik → ATLAS (nginx) → WebAPI (Tomcat) │
│ ↓ │
│ PostgreSQL (ohdsi_learning) │
│ ┌─────────┬───────────┬─────────┬──────────┐ │
│ │ cdm │ vocabulary│ results │ webapi │ │
│ │ (data) │ (concepts)│ (output)│ (config) │ │
│ └─────────┴───────────┴─────────┴──────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Where Schema Gaps Occur
┌──────────────────────────────────────────────────────┐
│ OMOP CDM (What You Built) │
│ ✓ person, condition_occurrence, drug_exposure... │
│ ✓ vocabulary.concept, concept_relationship... │
│ ✓ results.cohort, cohort_definition │
└──────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────┐
│ WebAPI 2.x Requirements (Gap) │
│ ✗ temp schema │
│ ✗ design_hash, mode_id columns │
│ ✗ cohort_cache, cohort_*_cache tables │
│ ✗ cc_results additional columns │
│ ✗ cc_temporal_results, cc_temporal_annual_results │
└──────────────────────────────────────────────────────┘
Three Paths Forward
Option 1: Manual Patching (What We Did)
Apply SQL fixes reactively as errors appear.
Pros:
- Learn exactly what WebAPI needs
- Good for understanding the system
- Works for core features
Cons:
- May miss edge cases
- Tedious
- May break with WebAPI updates
Option 2: Run Flyway Migrations
Apply official DDL scripts from WebAPI repository:
git clone https://github.com/OHDSI/WebAPI.git
cd WebAPI/src/main/resources/db/migration/postgresql
# Apply migrations manually or configure Flyway
Pros:
- Complete and version-matched
- Official approach
Cons:
- Complex setup
- Requires Flyway knowledge
Option 3: Use Broadsea’s Built-in Database (Recommended)
Migrate your data into Broadsea’s pre-configured database:
# Export from external database
pg_dump -h localhost -U postgres -d ohdsi_learning \
--schema=cdm --schema=vocabulary --data-only -f export.sql
# Import into Broadsea's database
docker exec -i broadsea-atlasdb psql -U postgres < export.sql
Pros:
- Zero schema errors
- All features work immediately
- Easiest long-term maintenance
Cons:
- Requires data migration
- Additional container
Key Lessons Learned
1. WebAPI ≠ OMOP CDM
The OMOP Common Data Model specification and WebAPI’s operational requirements have diverged. WebAPI needs additional infrastructure beyond the CDM.
2. The DDL Endpoint is Your Friend
When troubleshooting, fetch the expected schema:
GET /WebAPI/ddl/results
This tells you exactly what WebAPI expects to find.
3. Check the Database First
When the UI shows unexpected results, query the database directly. The data may be correct even when the display is broken.
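For example, joining the results back to WebAPI's definition table gives a per-cohort summary that is independent of the UI. This assumes the default webapi schema, where cohort_definition carries id and name columns:
-- Per-cohort counts straight from the database, independent of ATLAS
SELECT cd.id, cd.name, COUNT(DISTINCT c.subject_id) AS persons
FROM webapi.cohort_definition cd
LEFT JOIN results.cohort c ON c.cohort_definition_id = cd.id
GROUP BY cd.id, cd.name
ORDER BY cd.id;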
4. Flyway Manages WebAPI Schema
WebAPI uses Flyway for database migrations. These run automatically on built-in databases but not on external ones.
5. Cache Tables Enable UI Features
The *_cache tables aren’t just for performance—they’re required for inclusion reports and statistics to display in the UI.
6. Restart After Schema Changes
After modifying the database schema, restart the WebAPI container for changes to take effect.
Quick Reference: Complete Fix Script
For anyone facing similar issues, here’s the complete script:
-- ============================================
-- WebAPI 2.x Schema Compatibility Fixes
-- For external PostgreSQL databases
-- ============================================
-- PHASE 1: Core Infrastructure
CREATE SCHEMA IF NOT EXISTS temp;
GRANT ALL ON SCHEMA temp TO PUBLIC;
-- PHASE 2: Cohort Generation Support
ALTER TABLE results.cohort_inclusion
ADD COLUMN IF NOT EXISTS design_hash INT;
ALTER TABLE results.cohort_inclusion_result
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_inclusion_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_summary_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_censor_stats
ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
-- Cache tables
CREATE TABLE IF NOT EXISTS results.cohort_cache (
design_hash INT NOT NULL,
subject_id BIGINT NOT NULL,
cohort_start_date DATE NOT NULL,
cohort_end_date DATE NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_cohort_cache_hash
ON results.cohort_cache(design_hash);
CREATE TABLE IF NOT EXISTS results.cohort_censor_stats_cache (
design_hash INT NOT NULL,
lost_count BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_inclusion_result_cache (
design_hash INT NOT NULL,
mode_id INT NOT NULL,
inclusion_rule_mask BIGINT NOT NULL,
person_count BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_inclusion_stats_cache (
design_hash INT NOT NULL,
rule_sequence INT NOT NULL,
mode_id INT NOT NULL,
person_count BIGINT NOT NULL,
gain_count BIGINT NOT NULL,
person_total BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS results.cohort_summary_stats_cache (
design_hash INT NOT NULL,
mode_id INT NOT NULL,
base_count BIGINT NOT NULL,
final_count BIGINT NOT NULL
);
-- PHASE 3: Characterization Support
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS type VARCHAR(255);
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS concept_id INTEGER DEFAULT 0;
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS aggregate_id INTEGER;
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS aggregate_name VARCHAR(1000);
ALTER TABLE results.cc_results
ADD COLUMN IF NOT EXISTS missing_means_zero INTEGER;
CREATE TABLE IF NOT EXISTS results.cc_temporal_results (
type VARCHAR(255),
fa_type VARCHAR(255),
cc_generation_id BIGINT,
analysis_id INTEGER,
analysis_name VARCHAR(1000),
covariate_id BIGINT,
covariate_name VARCHAR(1000),
strata_id BIGINT,
strata_name VARCHAR(1000),
concept_id INTEGER,
count_value BIGINT,
avg_value DOUBLE PRECISION,
cohort_definition_id BIGINT,
time_id INTEGER,
start_day INTEGER,
end_day INTEGER
);
CREATE TABLE IF NOT EXISTS results.cc_temporal_annual_results (
type VARCHAR(255),
fa_type VARCHAR(255),
cc_generation_id BIGINT,
analysis_id INTEGER,
analysis_name VARCHAR(1000),
covariate_id BIGINT,
covariate_name VARCHAR(1000),
strata_id BIGINT,
strata_name VARCHAR(1000),
concept_id INTEGER,
count_value BIGINT,
avg_value DOUBLE PRECISION,
cohort_definition_id BIGINT,
event_year INTEGER
);
-- PHASE 4: Permissions
GRANT ALL ON ALL TABLES IN SCHEMA results TO PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA temp TO PUBLIC;
End-to-End Case Study: Dr. Sarah Chen’s Research Workflow
With all schema fixes in place, let’s walk through a complete clinical research workflow—the kind of study that ATLAS enables across the global OHDSI network.
The Research Question
Dr. Sarah Chen, an endocrinologist at an academic medical center, wants to characterize patients with Type 2 Diabetes in her institution’s data to understand:
- What is the demographic profile of T2DM patients?
- How do different age groups compare?
- What comorbidities are most common?
Step 1: Define the Cohort
Dr. Chen creates a cohort definition in ATLAS:
Entry Event: First diagnosis of Type 2 Diabetes Mellitus (SNOMED concept, OMOP concept ID 201826)
- Requires 365 days prior observation
- Limited to first occurrence
Inclusion Criteria: Age ≥ 18 at index date
Exit: End of continuous observation
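Under the hood, ATLAS compiles this definition (via Circe) into dialect-specific SQL. The real generated query is far more elaborate, but as a rough, hand-written approximation only, and assuming a populated observation_period table and a concept set that includes descendants of concept 201826, the logic amounts to:
-- Rough approximation of the cohort logic (not the actual Circe-generated SQL)
WITH first_dx AS (
  SELECT co.person_id, MIN(co.condition_start_date) AS index_date
  FROM cdm.condition_occurrence co
  JOIN vocabulary.concept_ancestor ca
    ON ca.descendant_concept_id = co.condition_concept_id
   AND ca.ancestor_concept_id = 201826  -- Type 2 diabetes mellitus, with descendants
  GROUP BY co.person_id
)
SELECT COUNT(DISTINCT f.person_id) AS persons
FROM first_dx f
JOIN cdm.observation_period op
  ON op.person_id = f.person_id
 AND op.observation_period_start_date <= f.index_date - 365  -- 365 days prior observation
 AND op.observation_period_end_date >= f.index_date
JOIN cdm.person p
  ON p.person_id = f.person_id
WHERE EXTRACT(YEAR FROM f.index_date) - p.year_of_birth >= 18;  -- age >= 18 at index date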
Step 2: Generate the Cohort
After clicking “Generate”, ATLAS executes the cohort SQL against our Synthea database:
Type 2 Diabetes cohort: 219 patients identified, generation completed in 1 second
Results:
- 219 patients meet all criteria
- 99.55% match rate (219 of 220 initial events passed inclusion rules)
- Generation duration: 00:00:01
Step 3: Review Inclusion Report
The inclusion report shows how patients flow through each criterion:
Inclusion report showing 99.55% match rate with population visualization
| Criterion | N | % Satisfied |
|---|---|---|
| Initial Events (T2DM diagnosis) | 220 | 100% |
| Age ≥ 18 at index | 219 | 99.55% |
| Final Cohort | 219 | 99.55% |
The population visualization (green area) shows the overlap between criteria—a useful tool for understanding attrition.
Step 4: Run Characterization
Dr. Chen creates a characterization analysis to understand patient demographics. She selects:
- Cohorts: Type 2 Diabetes Cohort
- Features: Demographics (Age Group, Gender, Race, Ethnicity)
- Domain: Demographics
After clicking “Generate”, the characterization job completes successfully:
Characterization results: 10 reports with 866 records showing age group distribution
Demographics Analysis Results:
| Age Group | Count | Percentage |
|---|---|---|
| 40-44 | 43 | 19.63% |
| 55-59 | 31 | 14.16% |
| 30-34 | 23 | 10.50% |
| 50-54 | 23 | 10.50% |
| 45-49 | 22 | 10.05% |
| 60-64 | 22 | 10.05% |
Step 5: Verify Job Completion
The Jobs page shows the complete execution history:
Job 14 (Characterization) COMPLETED - all cohort generation and characterization jobs successful
| Job | Type | Status | Duration |
|---|---|---|---|
| 7-11 | Cohort Generation | COMPLETED | < 1 min each |
| 14 | Characterization | COMPLETED | 1 min |
What Dr. Chen Learned
From this analysis, Dr. Chen discovers:
- Her T2DM population skews toward middle age (40-59)
- Nearly all patients (99.55%) meet the adult age criterion
- The cohort is reproducible—running the same definition anywhere in the OHDSI network would apply identical logic
The Power of Reproducibility
The JSON cohort definition Dr. Chen created can be:
- Exported and shared with collaborators worldwide
- Imported into any ATLAS instance connected to an OMOP CDM database
- Executed against 450+ databases in the OHDSI network
- Versioned for regulatory submissions and publications
This is the promise of OHDSI: write once, run anywhere.
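The expression itself lives in WebAPI's schema, so it can also be pulled out with plain SQL for archiving or diffing. This assumes the default webapi layout, where cohort_definition_details stores the JSON in an expression column:
-- Retrieve the Circe JSON expression for a saved cohort definition
SELECT cd.id, cd.name, cdd.expression
FROM webapi.cohort_definition cd
JOIN webapi.cohort_definition_details cdd ON cdd.id = cd.id
WHERE cd.name = 'Type 2 Diabetes';  -- adjust to the name shown in ATLAS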
Verified Test Results Summary
Our Playwright-based validation confirmed all features working after schema fixes:
| Feature | Status | Evidence |
|---|---|---|
| Cohort Generation | ✅ Working | 219 patients for T2DM |
| Inclusion Reports | ✅ Working | 99.55% match rate displayed |
| Population Visualization | ✅ Working | Green area renders correctly |
| Characterization | ✅ Working | 866 records, 10 reports |
| Demographics Analysis | ✅ Working | Age groups with counts |
| Jobs Tracking | ✅ Working | All jobs show COMPLETED |
What’s Next
Having resolved the schema compatibility issues and validated the complete workflow, our next step is migrating to Broadsea’s built-in database for long-term stability. This will ensure full feature compatibility as WebAPI evolves.
We’re also ready to explore population-level estimation—using our validated cohorts to ask comparative effectiveness questions like:
“Among T2DM patients initiating metformin, does adding an SGLT2 inhibitor reduce cardiovascular events compared to adding a sulfonylurea?”
That’s the power of reproducible cohort definitions—once you’ve solved “which patients?”, you can ask any clinical question.
Resources
OHDSI Documentation
- WebAPI GitHub Repository - Source code and migrations
- Broadsea Documentation - Docker deployment
- OHDSI Forums - Community support
- The Book of OHDSI - Comprehensive guide
Related Posts
- Mastering ATLAS Cohort Definitions - Complete guide to building cohorts
Have you encountered WebAPI schema issues? What solutions worked for your deployment? Share your experience on the OHDSI Forums.
Tags: OHDSI, WebAPI, PostgreSQL, Schema Compatibility, OMOP CDM, Broadsea, Database Integration, Cohort Generation, Characterization, Troubleshooting