Mastering ATLAS Cohort Definitions: A Clinical Researcher’s Complete Guide

The Journey So Far

Building on our exploration of the OMOP Common Data Model, standardized vocabularies like SNOMED CT, RxNorm, and LOINC, ETL fundamentals, and data quality assessment with Achilles, we’ve reached a pivotal milestone: defining patient cohorts in OHDSI ATLAS.

Everything we’ve built so far—the PostgreSQL databases, the vocabulary mappings, the synthetic patient data from Synthea, the data quality reports—exists for one purpose: to answer clinical questions about real-world patient populations. And in observational health research, every question begins with a cohort.

As the Book of OHDSI puts it:

“A cohort is not just a list of patients—it’s the operationalization of a clinical question.”

This post documents our experience connecting ATLAS to our own database, building a Type 2 Diabetes cohort from scratch, understanding what happens in the backend tables, and navigating the WebAPI schema compatibility challenges we encountered along the way.

Why ATLAS and OHDSI Matter: Global Impact at Scale

Before diving into cohort building, let’s understand why OHDSI’s tooling has become essential for healthcare research worldwide.

The OHDSI Network by the Numbers

OHDSI (Observational Health Data Sciences and Informatics) has grown into one of the largest international collaboratives in healthcare data science:

Metric	Scale
Collaborators	4,200+ across 83 countries
Patient Records	~810 million (~12% of world population)
Data Partners	453+ organizations (EHRs, claims, registries)
ATLAS GitHub	294 stars, 50+ contributors
EHDEN Network	100+ data partners across Europe
Publications	150+ peer-reviewed papers

Major regulatory bodies including the FDA (United States), EMA via DARWIN EU (European Union), and Korea HIRA use OHDSI tools for drug safety surveillance and real-world evidence studies. The EHDEN project alone has standardized 500+ million European patient records to the OMOP CDM format.

What is ATLAS?

ATLAS is the flagship web application of the OHDSI ecosystem. It provides a graphical interface for:

Vocabulary exploration - Search millions of standardized medical concepts
Cohort definition - Define patient populations using clinical criteria
Characterization - Compare baseline characteristics across cohorts
Population-level estimation - Causal inference studies
Patient-level prediction - Machine learning models for outcomes
Pathway analysis - Treatment sequences and care patterns

The ATLAS home screen provides navigation to all OHDSI analytical tools

When researchers say they’re “using OHDSI,” they typically mean they’re using ATLAS to design studies that can be executed across the global network of OMOP-standardized databases.

Why Cohorts Are the Foundation of Clinical Research

What is a Cohort?

In clinical research, a cohort is a defined group of patients who share specific characteristics. Unlike ad-hoc database queries, cohorts are:

Reproducible - The same definition produces the same patients every time
Shareable - Definitions can be exchanged between institutions
Standardized - Built on common data models and vocabularies
Versioned - Changes are tracked over time
Portable - Write once, run anywhere across 450+ OMOP databases

Everything in observational research starts with the question: “Which patients?” The validity of any study depends on precisely defining the population. A poorly specified cohort undermines all downstream analyses, no matter how sophisticated the statistical methods.

The Clinical Question Behind Every Cohort

Consider this scenario:

Dr. Sarah Chen, an endocrinologist at a large academic medical center, wants to study cardiovascular outcomes in Type 2 Diabetes patients initiating metformin therapy. She needs to identify patients who:

Have a confirmed T2DM diagnosis

Are starting metformin for the first time

Have sufficient medical history to assess baseline characteristics

Don’t have Type 1 diabetes or gestational diabetes (different populations)

Without standardized cohort definitions, every institution would interpret “Type 2 Diabetes patients on metformin” differently. Some might include anyone who ever took metformin. Others might miss patients coded with specific T2DM subtypes. ATLAS solves this by providing a visual cohort builder that generates portable JSON definitions.

Mapping Clinical Criteria to OMOP Concepts

Dr. Chen’s clinical requirements translate to specific OMOP vocabulary concepts:

Clinical Criterion	OMOP Domain	Vocabulary	Concept ID	Concept Name
T2DM Diagnosis	Condition	SNOMED CT	201826	Type 2 diabetes mellitus
Metformin	Drug	RxNorm	1503297 (RxNorm code 6809)	metformin
T1DM (exclude)	Condition	SNOMED CT	201254	Type 1 diabetes mellitus
Gestational DM (exclude)	Condition	SNOMED CT	4024659	Gestational diabetes mellitus

This mapping table is the bridge between clinical thinking and database queries. Every cohort starts here.

Our Database Setup

What We Had Built

After extensive preparation across previous work, our PostgreSQL database contained:

Database: ohdsi_learning
├── cdm (OMOP CDM 5.4)
│   ├── person (2,411 synthetic patients)
│   ├── condition_occurrence (diagnoses)
│   ├── drug_exposure (medications)
│   ├── procedure_occurrence (procedures)
│   ├── measurement (lab results, vitals)
│   └── observation (other clinical findings)
├── vocabulary (OMOP Standardized Vocabularies)
│   ├── concept (~6 million concepts)
│   ├── concept_relationship (hierarchies)
│   └── concept_ancestor (ancestry)
├── results (Analysis outputs)
│   ├── achilles_results (data characterization)
│   └── cohort (generated patient lists)
└── webapi (ATLAS configuration)
    ├── source (data source registration)
    └── source_daimon (schema mappings)

We deployed OHDSI Broadsea using Docker Compose, configured our external database as a data source, and launched ATLAS.

First Success: Data Sources Dashboard

The Data Sources dashboard confirmed our setup was working. ATLAS correctly displayed our 2,411 synthetic patients with demographic breakdowns:

Data Sources Dashboard: the CDM Summary shows 2,411 patients with gender distribution from our Synthea data

This visualization is powered by Achilles, OHDSI’s data characterization package. When you click “Data Sources” in ATLAS, you’re seeing pre-computed statistics stored in the achilles_results table.

Building the Type 2 Diabetes Cohort: Step by Step

Let’s walk through creating Dr. Chen’s cohort for Type 2 Diabetes Mellitus—one of the most common chronic conditions affecting over 400 million people worldwide.

Step 1: Vocabulary Search

Every cohort starts with finding the right clinical concepts. In ATLAS, navigate to Search and enter “type 2 diabetes”:

Searching for “type 2 diabetes” returns 299 concepts across multiple vocabularies

The search returns concepts from multiple standardized vocabularies:

SNOMED CT - The primary clinical terminology (concept 201826 for T2DM)
ICD-10-CM - Billing/administrative codes (E11.x family)
Read - UK primary care codes
MedDRA - Adverse event terminology

Notice the “RC” (Record Count) and “DRC” (Descendant Record Count) columns. RC shows how many records in your database use the concept itself; DRC also counts records that use any of its descendants.

Step 2: Understanding the Concept Hierarchy

OMOP concept 201826 (“Type 2 diabetes mellitus”, SNOMED code 44054006) is a clinical finding in the Condition domain. When you build a cohort using this concept, ATLAS can automatically include all descendant concepts, that is, more specific subtypes such as:

Type 2 diabetes mellitus with diabetic nephropathy
Type 2 diabetes mellitus with peripheral neuropathy
Type 2 diabetes mellitus with retinopathy
Type 2 diabetes mellitus without complication

This hierarchical inclusion is crucial for capturing all relevant patients. Without descendants, you might miss patients whose diagnosis was coded more specifically.
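To make the effect of descendants concrete, here is a minimal sketch that assumes a concept_ancestor-style list of (ancestor, descendant) pairs. Apart from 201826, the concept IDs are made-up placeholders rather than real vocabulary entries, and `expand_concept_set` and `matching_persons` are hypothetical helpers, not ATLAS functions.

```python
# Toy stand-in for vocabulary.concept_ancestor: (ancestor_concept_id, descendant_concept_id).
# 201826 is Type 2 diabetes mellitus; the 9000001-9000003 IDs are made up for illustration.
CONCEPT_ANCESTOR = [
    (201826, 201826),    # every concept is listed as its own descendant
    (201826, 9000001),   # hypothetical subtype: T2DM with nephropathy
    (201826, 9000002),   # hypothetical subtype: T2DM with retinopathy
    (9000001, 9000001),
    (9000002, 9000002),
    (9000003, 9000003),  # hypothetical unrelated concept
]

# Condition records as (person_id, condition_concept_id).
RECORDS = [(1, 201826), (2, 9000001), (3, 9000002), (4, 9000003)]


def expand_concept_set(concept_id, include_descendants=True):
    """Return the set of concept IDs a concept set entry would match."""
    if not include_descendants:
        return {concept_id}
    return {desc for anc, desc in CONCEPT_ANCESTOR if anc == concept_id}


def matching_persons(concept_ids):
    """Return the sorted person IDs with at least one matching record."""
    return sorted({person for person, concept in RECORDS if concept in concept_ids})


print(matching_persons(expand_concept_set(201826, include_descendants=False)))  # [1]
print(matching_persons(expand_concept_set(201826)))  # [1, 2, 3]
```

Without descendants only the patient coded with the parent concept is found; with descendants the two patients coded with subtypes are picked up as well.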

Step 3: Building the Cohort Definition

Navigate to Cohort Definitions → New Cohort Definition. Here’s what a Type 2 Diabetes cohort looks like in ATLAS:

Cohort Builder for Type 2 Diabetes: the cohort builder showing entry events, inclusion criteria, and exit settings

The definition has three key sections:

Cohort Entry Events

a condition occurrence of [Type 2 Diabetes Mellitus]
for the first time in the person's history
with continuous observation of at least 365 days before

This ensures we’re capturing incident (new) cases, not prevalent cases. The 365-day lookback confirms patients were observable before their first diagnosis—essential for establishing baseline characteristics.

Inclusion Criteria

Age at least 18 at index
No prior diagnosis of Type 1 diabetes mellitus
No prior diagnosis of gestational diabetes

These rules refine the population. Each criterion is evaluated against the initial event population, and ATLAS tracks how many patients pass each rule.

Cohort Exit

Event will persist until: fixed duration relative to initial event
Event date to offset from: end date
Number of days offset: 0

This determines how long each patient “stays” in the cohort. Different research questions require different exit strategies—some studies follow patients until they leave the database, others for fixed time windows.
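Because ATLAS stores a definition as JSON, it can help to see the entry event, inclusion rules, and exit settings above written out as data. The sketch below uses a simplified, hypothetical structure: the key names are ours and are not CIRCE's real schema, and `to_portable_json` is an illustrative helper.

```python
import json

# Simplified, hypothetical structure: these keys are NOT the real CIRCE expression schema.
cohort_definition = {
    "name": "Type 2 Diabetes Cohort",
    "entry_event": {
        "domain": "condition_occurrence",
        "concept_id": 201826,  # Type 2 diabetes mellitus
        "include_descendants": True,
        "first_occurrence_only": True,
        "prior_observation_days": 365,
    },
    "inclusion_rules": [
        {"name": "Age at least 18 at index", "min_age": 18},
        {"name": "No prior diagnosis of Type 1 diabetes mellitus", "exclude_concept_id": 201254},
        {"name": "No prior diagnosis of gestational diabetes", "exclude_concept_id": 4024659},
    ],
    "exit": {"strategy": "fixed_duration", "offset_from": "end_date", "offset_days": 0},
}


def to_portable_json(definition):
    """Serialize deterministically so the same definition always yields the same text."""
    return json.dumps(definition, indent=2, sort_keys=True)


text = to_portable_json(cohort_definition)
restored = json.loads(text)
print(len(restored["inclusion_rules"]))  # 3
```

The round trip through text is the point: a definition that survives serialization unchanged can be shared with another site and regenerated against their data.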

Step 4: Generating the Cohort

After saving the definition, navigate to the Generation tab and click “Generate” for your data source:

Cohort Generation Status: generation completed successfully, but notice the “…” in the People and Records columns

Here we encounter our first sign of trouble. The generation status shows COMPLETE, but the People and Records columns display “…” instead of actual counts.

What’s Really Happening: The Backend Cohort Tables

To understand ATLAS fully, you need to know what’s happening in the database when cohorts are generated.

The Data Model for Cohorts

erDiagram
    COHORT_DEFINITION ||--o{ COHORT : generates
    COHORT_DEFINITION ||--o{ COHORT_INCLUSION : has
    COHORT }o--|| PERSON : identifies
    PERSON ||--o{ DRUG_EXPOSURE : has
    PERSON ||--o{ CONDITION_OCCURRENCE : has
    PERSON ||--o{ OBSERVATION_PERIOD : has

    COHORT_DEFINITION {
        int id PK
        varchar name
        text expression
        timestamp created_date
    }

    COHORT {
        int cohort_definition_id FK
        bigint subject_id FK
        date cohort_start_date
        date cohort_end_date
    }

    COHORT_INCLUSION {
        int cohort_definition_id FK
        int rule_sequence
        varchar name
        int design_hash
    }

    PERSON {
        bigint person_id PK
        int gender_concept_id
        int year_of_birth
    }

The Cohort Generation Pipeline

┌──────────────────────────────────────────────────────────────────┐
│                    Cohort Generation Pipeline                     │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌─────────────────┐    ┌─────────────────┐    ┌──────────────┐ │
│  │ ATLAS UI        │───▶│ WebAPI          │───▶│ PostgreSQL   │ │
│  │ (Browser)       │    │ (Java/Tomcat)   │    │ (Database)   │ │
│  └─────────────────┘    └─────────────────┘    └──────────────┘ │
│          │                      │                      │         │
│          ▼                      ▼                      ▼         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌──────────────┐ │
│  │ JSON Definition │───▶│ CIRCE Compiler  │───▶│ SQL Execution│ │
│  │ {expression}    │    │ (SQL Generator) │    │ INSERT INTO  │ │
│  └─────────────────┘    └─────────────────┘    │ results.cohort│ │
│                                                 └──────────────┘ │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

When you click “Generate” in ATLAS:

ATLAS UI sends the cohort definition JSON to WebAPI
CIRCE (the cohort SQL compiler) translates the JSON into parameterized OHDSI SQL, a SQL Server-style template
SqlRender fills in the parameters and translates the template into PostgreSQL syntax
The SQL executes against your CDM data tables
Results are inserted into the results.cohort table
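The parameter-filling part of that pipeline can be illustrated with a toy. The sketch below is not SqlRender: `render` is a hypothetical stand-in that only substitutes `@name` tokens, and the template, the `@results_schema` and `@cohort_id` parameter names, and the `qualified_events` table are assumptions made for the example. The real tool also translates between SQL dialects, which this does not attempt.

```python
import re


def render(template, **params):
    """Replace each @name token with the matching parameter value (toy version)."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for @{name}")
        return str(params[name])

    return re.sub(r"@(\w+)", substitute, template)


# Hypothetical template; the table and parameter names are illustrative only.
template = (
    "INSERT INTO @results_schema.cohort "
    "SELECT @cohort_id, person_id, start_date, end_date FROM qualified_events"
)

sql = render(template, results_schema="results", cohort_id=2)
print(sql)
```

Keeping schema names and the cohort ID as parameters is what lets one definition run unchanged against databases that lay out their schemas differently.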

Simplified Generated SQL

Here’s a simplified version of what CIRCE generates for our T2DM cohort:

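-- Simplified illustration only: it omits the first-occurrence limit and the
-- gestational diabetes exclusion, and ends the cohort at the end of observation.
INSERT INTO results.cohort (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
SELECT
    2 AS cohort_definition_id,  -- supplied as a template parameter (@cohort_id) before execution
    p.person_id AS subject_id,
    co.condition_start_date AS cohort_start_date,
    op.observation_period_end_date AS cohort_end_date
FROM cdm.condition_occurrence co
JOIN cdm.person p ON co.person_id = p.person_id
JOIN cdm.observation_period op ON p.person_id = op.person_id
WHERE co.condition_concept_id IN (
    SELECT descendant_concept_id
    FROM vocabulary.concept_ancestor
    WHERE ancestor_concept_id = 201826  -- T2DM and all descendants
)
-- Index date must fall inside the observation period, at least 365 days after it starts
AND co.condition_start_date >= op.observation_period_start_date + INTERVAL '365 days'
AND co.condition_start_date <= op.observation_period_end_date
-- Approximate age at index from year of birth
AND EXTRACT(YEAR FROM co.condition_start_date) - p.year_of_birth >= 18
-- Exclude anyone with a Type 1 diabetes record before the index date
AND NOT EXISTS (
    SELECT 1 FROM cdm.condition_occurrence co2
    WHERE co2.person_id = p.person_id
    AND co2.condition_concept_id IN (
        SELECT descendant_concept_id
        FROM vocabulary.concept_ancestor
        WHERE ancestor_concept_id = 201254  -- T1DM and all descendants
    )
    AND co2.condition_start_date < co.condition_start_date
);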
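To check that the entry and inclusion logic behaves as intended, it can be exercised on a handful of toy rows. This is a sketch under stated assumptions: it uses Python's built-in SQLite rather than PostgreSQL, the table and column names are shortened, descendants are ignored, and the four patients are invented so that each one tests a different rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER);
CREATE TABLE observation_period (person_id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE condition_occurrence (person_id INTEGER, concept_id INTEGER, start_date TEXT);

INSERT INTO person VALUES (1, 1960), (2, 1970), (3, 2005), (4, 1965);
INSERT INTO observation_period VALUES
    (1, '2010-01-01', '2023-12-31'),
    (2, '2019-06-01', '2023-12-31'),
    (3, '2010-01-01', '2023-12-31'),
    (4, '2010-01-01', '2023-12-31');
INSERT INTO condition_occurrence VALUES
    (1, 201826, '2015-03-15'),  -- qualifies
    (1, 201826, '2018-01-01'),  -- later T2DM record, not the first
    (2, 201826, '2019-09-01'),  -- only 92 days of prior observation
    (3, 201826, '2020-05-01'),  -- about 15 years old at index
    (4, 201254, '2012-01-01'),  -- prior Type 1 diabetes
    (4, 201826, '2016-02-01');
""")

QUERY = """
WITH first_t2dm AS (
    -- First T2DM record per person is the candidate index date
    SELECT person_id, MIN(start_date) AS index_date
    FROM condition_occurrence
    WHERE concept_id = 201826
    GROUP BY person_id
)
SELECT f.person_id, f.index_date
FROM first_t2dm f
JOIN person p ON p.person_id = f.person_id
JOIN observation_period op ON op.person_id = f.person_id
WHERE f.index_date BETWEEN date(op.start_date, '+365 days') AND op.end_date
  AND CAST(strftime('%Y', f.index_date) AS INTEGER) - p.year_of_birth >= 18
  AND NOT EXISTS (
      SELECT 1 FROM condition_occurrence t1
      WHERE t1.person_id = f.person_id
        AND t1.concept_id = 201254
        AND t1.start_date < f.index_date
  )
ORDER BY f.person_id
"""

cohort = conn.execute(QUERY).fetchall()
print(cohort)  # [(1, '2015-03-15')]
```

Only patient 1 survives: patient 2 lacks the lookback, patient 3 is under 18, and patient 4 has an earlier Type 1 diagnosis.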

The Key Backend Tables

`results.cohort` - The Patient List

SELECT cohort_definition_id, subject_id, cohort_start_date, cohort_end_date
FROM results.cohort
WHERE cohort_definition_id = 2;  -- Type 2 Diabetes cohort

-- Result:
-- cohort_definition_id | subject_id | cohort_start_date | cohort_end_date
-- 2                    | 1001       | 2015-03-15        | 2023-06-30
-- 2                    | 1042       | 2018-07-22        | 2023-12-01
-- 2                    | 1156       | 2020-01-10        | 2023-08-15
-- ... (220 rows total)

`results.cohort_inclusion` - The Rules

SELECT cohort_definition_id, rule_sequence, name
FROM results.cohort_inclusion
WHERE cohort_definition_id = 2;

-- Result:
-- cohort_definition_id | rule_sequence | name
-- 2                    | 0             | Age at least 18 at index
-- 2                    | 1             | No prior Type 1 diabetes

Direct Database Verification

When ATLAS shows unexpected results, always check the database directly:

-- Count patients in each cohort
SELECT cd.name, COUNT(DISTINCT c.subject_id) AS persons
FROM results.cohort c
JOIN webapi.cohort_definition cd ON c.cohort_definition_id = cd.id
GROUP BY cd.name
ORDER BY persons DESC;

-- Result:
-- name                   | persons
-- Type 2 Diabetes Cohort | 220
-- New Statin Users       | 162
-- Multiple Chronic       | 123
-- Metformin Users        | 76

This query confirmed our cohorts were generated correctly: the data is right, even when the UI shows “…”.

The Schema Compatibility Problem

What Went Wrong

After successfully generating cohorts, we attempted to run Cohort Characterization—a feature that compares baseline demographics, conditions, and medications across cohorts.

That’s when the errors started:

ERROR: schema "temp" does not exist
ERROR: column "design_hash" of relation "cohort_inclusion" does not exist
ERROR: relation "results.cohort_cache" does not exist
ERROR: relation "results.cohort_censor_stats_cache" does not exist
ERROR: column "mode_id" does not exist

The Root Cause

WebAPI 2.x has evolved significantly beyond the base OMOP CDM specification.

The OMOP CDM defines clinical tables (person, condition_occurrence, etc.) and basic results tables. But WebAPI has added:

Component	Purpose	OMOP CDM	WebAPI 2.x
`temp` schema	Working space for analytics	No	Required
`design_hash` column	Cache invalidation	No	Required
`*_cache` tables	Performance optimization	No	Required
`mode_id` column	Generation mode tracking	No	Required

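WebAPI manages its own application schema (webapi in our setup) with Flyway migrations, which run automatically at startup. The results schema is different: its tables come from DDL that WebAPI generates (exposed at its /WebAPI/ddl/results endpoint), and nothing applies that DDL to an external database automatically. A results schema created from an older script therefore lacks the newer tables and columns.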
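Before patching anything, it is useful to list what is actually missing. The sketch below assumes you have already pulled (table_name, column_name) pairs for the results schema, for example from PostgreSQL's information_schema.columns; here a hypothetical snapshot stands in for that query. `find_missing` is an illustrative helper, and the required set covers only the objects that the errors above name explicitly (the mode_id error does not say which table it belongs to, so it is left out).

```python
# Objects named in the errors above, as (table, column); column None means the table itself.
REQUIRED = {
    ("cohort_inclusion", "design_hash"),
    ("cohort_cache", None),
    ("cohort_censor_stats_cache", None),
}


def find_missing(existing_columns, required=REQUIRED):
    """Return descriptions of required tables or columns absent from existing_columns."""
    existing = set(existing_columns)
    tables = {table for table, _ in existing}
    missing = []
    for table, column in sorted(required, key=lambda item: (item[0], item[1] or "")):
        if column is None:
            if table not in tables:
                missing.append(f"table {table}")
        elif (table, column) not in existing:
            missing.append(f"column {table}.{column}")
    return missing


# Hypothetical snapshot of a results schema created from older DDL.
snapshot = [
    ("cohort", "cohort_definition_id"),
    ("cohort", "subject_id"),
    ("cohort_inclusion", "cohort_definition_id"),
    ("cohort_inclusion", "rule_sequence"),
    ("cohort_inclusion", "name"),
]

for item in find_missing(snapshot):
    print(item)
```

Running a check like this once gives a complete to-do list, instead of discovering each gap from a separate failed job.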

Common Pitfalls in Cohort Design

Based on our experience and insights from the OHDSI community (particularly the Phenotype Phebruary discussions on the forums), here are the most common mistakes:

Pitfall	Why It Happens	Solution
Overly specific concept sets	Using only one concept code, missing subtypes	Include descendants in concept sets
No prior observation requirement	Assuming all patients have complete history	Require minimum 365 days prior observation
Ignoring drug eras	Fragmented prescription records	Use persistence windows or drug eras
Including prevalent users	Not specifying first occurrence	Require “for the first time in history”
Missing exclusion criteria	Not filtering out related conditions	Add explicit exclusion rules (T1DM, gestational DM)
Wrong exit strategy	Patients exiting cohort too early or never	Match exit criteria to research question
Not validating with characterization	Trusting counts without inspection	Always run characterization before analysis
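The drug era pitfall is easier to see with dates in hand. Here is a minimal sketch of era building, assuming exposures are (start, end) date pairs for one patient and one drug. `build_eras` is a hypothetical helper, the fills are invented, and the 30-day gap is the value conventionally used for OMOP drug eras; treat it as a parameter to tune for your study.

```python
from datetime import date, timedelta


def build_eras(exposures, gap_days=30):
    """Merge (start, end) exposure intervals whose gaps are at most gap_days."""
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Close enough to the previous era: extend it.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras


# Three metformin fills for one hypothetical patient.
fills = [
    (date(2020, 1, 1), date(2020, 1, 30)),
    (date(2020, 2, 10), date(2020, 3, 10)),  # 11-day gap: same era
    (date(2020, 6, 1), date(2020, 6, 30)),   # 83-day gap: new era
]

for era_start, era_end in build_eras(fills):
    print(era_start, era_end)
# 2020-01-01 2020-03-10
# 2020-06-01 2020-06-30
```

Treating each fill as its own exposure would make this patient look like three separate treatment episodes; with a persistence window there are two.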

Community Insights from OHDSI Forums

The OHDSI Forums are invaluable for cohort design. Key threads to review:

Phenotype Phebruary: Type 2 Diabetes Mellitus - Community-validated T2DM phenotype with discussion of edge cases
Best Practices for Cohort Design - Consensus recommendations from experienced implementers
Troubleshooting WebAPI Issues - Solutions to common deployment problems

The community has learned that concept set design accounts for most of cohort quality; we think of it as roughly 80%, a rule of thumb rather than a measured figure. Invest time in building comprehensive, validated concept sets before worrying about complex inclusion logic.

Three Paths Forward for External Databases

Option 1: Manual Schema Patching

Create missing tables and columns reactively as errors appear:

-- Create temp schema
CREATE SCHEMA IF NOT EXISTS temp;
GRANT ALL ON SCHEMA temp TO PUBLIC;

-- Add missing columns
ALTER TABLE results.cohort_inclusion
  ADD COLUMN IF NOT EXISTS design_hash INT;

-- Create cache tables
CREATE TABLE IF NOT EXISTS results.cohort_cache (...);

Pros: Works for core features; good for learning
Cons: Incomplete; may break with WebAPI updates; tedious

Option 2: Run WebAPI Flyway Migrations

Apply the official DDL scripts from the WebAPI repository:

git clone https://github.com/OHDSI/WebAPI.git
cd WebAPI/src/main/resources/db/migration/postgresql
# Apply migrations manually or configure Flyway

Pros: Complete and version-matched
Cons: Complex setup; requires Flyway knowledge

Option 3: Use Broadsea’s Built-in Database (Recommended)

Start Broadsea with its default configuration, then migrate your data into the pre-configured database:

# Start Broadsea (includes broadsea-atlasdb container)
docker-compose up -d

# Export your CDM data from external database.
# --data-only dumps rows without CREATE statements, so the cdm and vocabulary
# tables must already exist in the target; drop the flag to export the DDL too.
pg_dump -h localhost -U postgres -d ohdsi_learning \
  --schema=cdm --schema=vocabulary --data-only -f cdm_export.sql

# Import into Broadsea's database
docker exec -i broadsea-atlasdb psql -U postgres < cdm_export.sql

Pros: Zero schema errors; all features work; easiest maintenance
Cons: Requires data migration; additional container

Resolving the Schema Issues: Dr. Sarah Chen’s Success

After understanding the root cause, Dr. Chen chose Option 1: Manual Schema Patching to continue her research quickly. She applied the minimal fixes needed to enable cohort generation and characterization.

The Fifteen-Minute Fix

The complete schema patch took just 15 minutes to apply:

-- Phase 1: Core Schema Fixes (for cohort generation)
CREATE SCHEMA IF NOT EXISTS temp;
GRANT ALL ON SCHEMA temp TO PUBLIC;

ALTER TABLE results.cohort_inclusion ADD COLUMN IF NOT EXISTS design_hash INT;
ALTER TABLE results.cohort_inclusion_result ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_inclusion_stats ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_summary_stats ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;
ALTER TABLE results.cohort_censor_stats ADD COLUMN IF NOT EXISTS mode_id INT DEFAULT 0;

-- Create cache tables for performance
CREATE TABLE IF NOT EXISTS results.cohort_cache (
    design_hash INT NOT NULL,
    subject_id BIGINT NOT NULL,
    cohort_start_date DATE NOT NULL,
    cohort_end_date DATE NOT NULL
);

-- Phase 2: Characterization Fixes
ALTER TABLE results.cc_results ADD COLUMN IF NOT EXISTS type VARCHAR(255);
ALTER TABLE results.cc_results ADD COLUMN IF NOT EXISTS concept_id INTEGER DEFAULT 0;
ALTER TABLE results.cc_results ADD COLUMN IF NOT EXISTS aggregate_id INTEGER;
ALTER TABLE results.cc_results ADD COLUMN IF NOT EXISTS aggregate_name VARCHAR(1000);
ALTER TABLE results.cc_results ADD COLUMN IF NOT EXISTS missing_means_zero INTEGER;

-- Grant permissions
GRANT ALL ON ALL TABLES IN SCHEMA results TO PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA temp TO PUBLIC;

After applying these patches and restarting WebAPI, Dr. Chen regenerated her cohorts.

Cohort Generation: Success!

This time, the results displayed correctly in ATLAS:

Cohort Generation Success: the Type 2 Diabetes cohort now shows 219 patients with generation status COMPLETE

The cohort generation completed in just 1 second, returning 219 patients who met all inclusion criteria:

First diagnosis of Type 2 Diabetes Mellitus
At least 365 days of prior observation
Age 18 or older at index date
No prior Type 1 or gestational diabetes diagnosis

Inclusion Reports: Understanding Population Attrition

Clicking “View Reports” revealed the inclusion report—a powerful visualization showing how patients flow through each inclusion criterion:

Inclusion Report Success: the inclusion report shows a 99.55% match rate with a detailed population visualization

The report shows:

220 initial qualifying events from patients with T2DM diagnosis
99.55% match rate (219 of 220 passed all inclusion criteria)
1 patient excluded by the age requirement or prior diabetes exclusions

The green population visualization (sometimes called an “attrition diagram”) helps researchers understand exactly where patients are filtered out. This is crucial for validating that inclusion criteria aren’t overly restrictive.
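The match rate in the report is simple arithmetic, and it is worth reproducing by hand when validating a cohort. A minimal sketch, using the 220 and 219 figures from the report above; `match_rate` is an illustrative helper, not an ATLAS function.

```python
def match_rate(initial_events, final_matches):
    """Percentage of initial events that passed every inclusion rule."""
    if initial_events <= 0:
        raise ValueError("initial_events must be positive")
    return round(100 * final_matches / initial_events, 2)


initial, final = 220, 219
print(match_rate(initial, final))  # 99.55
print(initial - final)             # 1 excluded
```

A rate this close to 100% says the inclusion rules barely change the population; a much lower rate would be a cue to check each rule for an overly strict criterion.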

Running Characterization: The Complete Picture

With cohorts generating correctly, Dr. Chen moved to Characterization—the feature that compares baseline demographics, conditions, and medications across cohorts.

She configured a characterization analysis with:

Target Cohort: Type 2 Diabetes Mellitus (219 patients)
Feature Analyses: Demographics, conditions, drug exposures, procedures
Stratification: None (baseline analysis)

After clicking “Generate,” she monitored the Jobs page:

Jobs Completed: the Jobs page shows successful completion of cohort generation and characterization

All jobs completed successfully:

Jobs 7-11: Cohort generation for all 5 cohorts ✅
Job 14: Characterization generation ✅

Characterization Results: Demographics Analysis

The characterization results revealed the clinical profile of her T2DM cohort:

Characterization results showing 866 records across 10 reports with demographics breakdown

Key findings from the demographics analysis:

10 reports generated covering different clinical domains
866 total records of baseline characteristics
Age distribution showing patients primarily in 40-70 age range
Condition prevalence showing common comorbidities
Drug exposure patterns at baseline

The demographic breakdown matched clinical expectations for a Type 2 Diabetes population:

Higher prevalence in middle-aged and older adults
Common comorbidities including hypertension, hyperlipidemia
Baseline medication patterns consistent with diabetes management

The Complete End-to-End Workflow

Dr. Chen’s journey from clinical question to characterized cohort demonstrates the complete ATLAS workflow:

┌──────────────────────────────────────────────────────────────────────────┐
│                Dr. Sarah Chen's Research Workflow                         │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐ │
│  │ 1. Clinical │───▶│ 2. Concept  │───▶│ 3. Build    │───▶│ 4. Add    │ │
│  │    Question │    │    Mapping  │    │    Entry    │    │  Inclusion│ │
│  │ "T2DM pts"  │    │  SNOMED     │    │    Events   │    │   Rules   │ │
│  └─────────────┘    │  201826     │    │             │    │           │ │
│                     └─────────────┘    └─────────────┘    └───────────┘ │
│                                                                   │      │
│                                                                   ▼      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐ │
│  │ 8. Clinical │◀───│ 7. Analyze  │◀───│ 6. View     │◀───│ 5. Generate│ │
│  │    Insights │    │  Character- │    │  Inclusion  │    │   Cohort  │ │
│  │ "Ready for  │    │    ization  │    │   Reports   │    │  219 pts  │ │
│  │  estimation"│    │ 866 records │    │  99.55%     │    │           │ │
│  └─────────────┘    └─────────────┘    └─────────────┘    └───────────┘ │
│                                                                           │
│  ✅ Result: Validated T2DM cohort ready for population-level estimation  │
└──────────────────────────────────────────────────────────────────────────┘

What Dr. Chen Verified

Step	Feature	Result	Validated
1	Vocabulary Search	Found SNOMED 201826	✅
2	Cohort Definition	Created with 3 inclusion rules	✅
3	Cohort Generation	219 patients in < 1 second	✅
4	Inclusion Reports	99.55% match rate displayed	✅
5	Population Visualization	Attrition diagram renders	✅
6	Characterization	866 records across 10 reports	✅
7	Demographics Analysis	Age groups with counts	✅
8	Job Tracking	All 14 jobs COMPLETED	✅

Key Takeaways

Here’s what we learned from building cohorts in ATLAS:

✅ Cohorts are algorithms, not just patient lists — They’re reproducible across any OMOP CDM database worldwide

✅ Three components define every cohort — Entry event, inclusion criteria, exit criteria

✅ ATLAS generates SQL — Understanding backend tables helps debugging when the UI shows unexpected results

✅ Concept sets are building blocks — Invest time in creating comprehensive, validated concept sets with descendants

✅ Always validate with characterization — Never trust cohort counts alone; inspect demographics and clinical features

✅ WebAPI ≠ OMOP CDM — The application has evolved beyond the base data model specification

✅ Check the database directly — When UI shows “…”, the data may be correct in results.cohort

What’s Next: From Cohorts to Evidence

With the ability to precisely define patient populations, we’re ready to explore what comes next: using these cohorts to generate real-world evidence.

We’ll take our T2DM cohort and ask a comparative effectiveness question:

“Among patients who initiate metformin for Type 2 Diabetes, does adding an SGLT2 inhibitor reduce cardiovascular events compared to adding a sulfonylurea?”

This is the domain of population-level estimation—causal inference studies that use propensity score matching, stratification, and outcome modeling to estimate treatment effects from observational data.

Stay tuned as we move from defining who to asking what if.

Resources

OHDSI Community

The Book of OHDSI - Chapter 10: Cohorts - Definitive cohort design guide
ATLAS User Guide - Official documentation
OHDSI Forums - Community Q&A and phenotype discussions
EHDEN Academy - Free online OHDSI courses
Phenotype Phebruary T2DM Thread - Community T2DM phenotype

Technical Resources

WebAPI GitHub - Source code and migrations
Broadsea GitHub - Docker deployment
CommonDataModel - OMOP CDM DDL
OHDSI Cohort Tutorial 2022 - Step-by-step PDF guide

Have you encountered similar challenges building cohorts in ATLAS? What phenotypes are you working on? Join the discussion on the OHDSI Forums or share your experience in the comments below.

Tags: OHDSI, ATLAS, Cohort Definitions, OMOP CDM, Type 2 Diabetes, Clinical Research, Real World Evidence, WebAPI, PostgreSQL, Phenotyping