OntoGen · How meaning becomes machine-usable

From a stack of PDFs
to a safe answer at the bedside

Healthcare knows how to treat diabetes. The problem is that the knowledge is trapped in documents and scattered across systems that don't speak the same language. This is the story of how we turn that mess into one governed map of meaning, told through a single patient.

jagat.ai · strategic foresight & applied AI architecture

The problem, in plain words

Healthcare runs on four languages that don't talk to each other

A single diabetic patient is described, simultaneously, in four incompatible vocabularies, plus a pile of guideline PDFs no software can read. A fact obvious to a doctor becomes invisible to the systems that bill, report, and assist.

SNOMED CT · what the doctor's EHR records

44054006 = "Diabetes mellitus type 2"

ICD-10-CM · what the biller must submit

E11.22 = "T2DM w/ diabetic CKD"

LOINC · what the lab reports

4548-4 = "HbA1c" · eGFR result

RxNorm · what the pharmacy dispenses

Metformin (caution if eGFR<30)

And the rules that connect them, like "if Type 2 diabetes has damaged the kidneys, code it as E11.22 and add the CKD stage; don't give full-dose Metformin below eGFR 30," live only in PDF guidelines. Humans read them. Machines can't.

What that costs, the problem catalog

Four kinds of failure, one root cause

When meaning isn't connected, the same gap shows up in four places. These are the problems OntoGen is built to close.

● Care

Unsafe & missed care

A drug-safety rule (don't dose Metformin at low kidney function) never fires because the lab result and the prescription live in unconnected systems.

● Coding

Under-specified codes

The EHR says "type 2 diabetes." The biller submits the vague E11.9 instead of the precise E11.22 plus CKD stage, because nothing maps one to the other.

● Billing

Denials & lost revenue

Vague codes get claims denied and forfeit risk-adjustment revenue the patient's true severity justifies. Money and accuracy both leak.

● Quality

Unreportable measures

Quality programs ask "what % of diabetics have poor HbA1c control?" You can't compute it if "diabetic," "HbA1c," and "poor control" aren't linked by shared meaning.

One patient, all four problems

Meet Maria

🩺

Maria, 61. Type 2 diabetes for 12 years. Her latest labs show HbA1c 9.4% (poorly controlled) and a falling eGFR of 28. Her kidneys are now damaged. Her chart still carries the diagnosis from years ago: plain "type 2 diabetes." Her current Metformin prescription was never re-checked against her kidneys.

care: Metformin unsafe at eGFR<30
coding: chart says E11.9, truth is E11.22
billing: claim under-coded
quality: HbA1c 9.4% > 9% poor-control

Every layer that follows moves Maria's case one step closer to a single, safe, correctly-coded, reportable answer. Watch the notes that begin "Maria:" at the end of each layer.

The spine of the whole system

Seven layers: raw documents to real value

Each layer takes the one below it and adds structure a machine can use. The bottom is a folder of files; the top is the right answer in a clinician's hands. The labels on the right show where this happens in the OntoGen app.

Application / use-case

The safe, coded, reportable answer, at the point of care, in the biller's tool, in the quality report.

App stage · Operate

Semantic layer

One governed "ask-anything" surface every app and AI queries, so the answer is the same everywhere.

Operate · Surface & Playground

Knowledge graph

The map plus real patient data, loaded as a network you can walk: patient → diagnosis → lab → drug.

Operate

Governed golden standard

The map, verified by human experts and frozen as a versioned, trustworthy source of truth.

Ground & Localize · Gold Builder + SME review

Relationships to ontology

How the concepts connect, plus rules and cross-language bridges, the actual map of meaning.

Ground · 5-pass extraction

Extracted concepts

The important "things" pulled out of the documents, each with a source citation and a confidence score.

Ground · 5-pass extraction

Source documents

The raw, human-written truth: coding guidelines, data-exchange specs, clinical pages.

Setup · Standards Picker

Source documents

Everything starts with the documents humans already trust. We don't invent knowledge; we read the authoritative sources.

Goes in

Public, authoritative diabetes sources, chosen in the app's Standards Picker.

Comes out

A frozen, licensed, checksum-verified corpus the engine can read.

# the real diabetes corpus we used
ICD-10-CM Official Guidelines FY2026 → the coding rules
FHIR R4: Condition, Observation, MedicationRequest, CarePlan, Coverage → how systems exchange data
NIDDK "Managing Diabetes" (NIH) → clinical management
CDC "Manage Blood Sugar" → targets & monitoring
SNOMED CT · LOINC · RxNorm (licensed, kept local, the vocabularies)
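To make "frozen, checksum-verified" concrete, here is a minimal sketch of one way such a check could work. It is an illustration under assumptions, not OntoGen's actual freezing code: the function names are hypothetical, the document bodies are placeholder bytes, and SHA-256 is our choice of hash for the example.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of one document's bytes."""
    return hashlib.sha256(data).hexdigest()

def freeze(corpus: dict[str, bytes]) -> dict[str, str]:
    """Record one checksum per document at freeze time (the manifest)."""
    return {name: sha256_of(body) for name, body in corpus.items()}

def verify(corpus: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names of documents that are missing, added, or changed."""
    problems = []
    for name in sorted(set(corpus) | set(manifest)):
        if name not in corpus or name not in manifest:
            problems.append(name)
        elif sha256_of(corpus[name]) != manifest[name]:
            problems.append(name)
    return problems

# Placeholder bodies; only the source ids come from the text.
corpus = {
    "01_icd10cm_guidelines": b"placeholder guideline text",
    "08_cdc_diabetes_treatment": b"placeholder CDC page text",
}
manifest = freeze(corpus)

# Simulate one document changing after the freeze.
tampered = dict(corpus)
tampered["01_icd10cm_guidelines"] = b"edited after the freeze"

print(verify(corpus, manifest))    # unchanged corpus: no problems
print(verify(tampered, manifest))  # the edited document is reported
```

The point of the design is that every later layer can name the exact bytes it was derived from, so a silent edit to a source is caught before extraction runs again.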

Why standards, not just any text?

Because a map is only as trustworthy as its sources. Grounding in ICD-10, FHIR and SNOMED means the result speaks the languages the rest of healthcare already uses, instead of inventing a private one.

Maria: the very rules her case needs, "T2DM with CKD → E11.22, add the stage" and "Metformin caution below eGFR 30," enter the system here, still locked inside prose.

Extracted concepts

The first pass of the engine reads the corpus and pulls out the things that matter, each one tagged with where it came from and how confident the model is.

Goes in

The corpus from Layer 1.

Comes out

A list of concepts, each with a definition, a source citation, and a confidence score.

# real extracted concepts (from our run, lightly trimmed)
Blood Glucose
  source: 08_cdc_diabetes_treatment
  confidence: high
Continuous Glucose Monitor (CGM)
  source: 07_niddk_diabetes_mgmt
  confidence: high
Diabetes Care Plan
  source: 05_fhir_r4_careplan
  confidence: medium
Allergy
  source: 03_fhir_r4_observation
  rationale: "explicit seed concept mention"

Why a confidence score and a citation on every concept?

So nothing is a black box. Every claim the system makes can be traced back to a real document, and low-confidence items can be sent to a human instead of trusted blindly. Honesty is built into the data, not bolted on.
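A small sketch of what "send low-confidence items to a human" could look like in code. The `Concept` class, the `route` function, and the policy of auto-accepting only "high" are assumptions made for illustration; the three concepts and their scores are the ones shown in the extraction sample.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    label: str
    source: str      # citation: the corpus document the concept came from
    confidence: str  # "high", "medium", or "low"

def route(concepts, auto_accept=("high",)):
    """Split concepts into auto-accepted and human-review lists.

    A concept with no citation is refused outright, since an untraceable
    claim cannot be checked by anyone.
    """
    accepted, review = [], []
    for concept in concepts:
        if not concept.source:
            raise ValueError(f"{concept.label!r} has no source citation")
        if concept.confidence in auto_accept:
            accepted.append(concept)
        else:
            review.append(concept)
    return accepted, review

extracted = [
    Concept("Blood Glucose", "08_cdc_diabetes_treatment", "high"),
    Concept("Continuous Glucose Monitor (CGM)", "07_niddk_diabetes_mgmt", "high"),
    Concept("Diabetes Care Plan", "05_fhir_r4_careplan", "medium"),
]
accepted, review = route(extracted)
print([c.label for c in accepted])
print([c.label for c in review])
```

Under this assumed policy the two high-confidence concepts pass through and "Diabetes Care Plan" waits for a reviewer; tightening or loosening `auto_accept` moves that line.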

Maria: "Blood Glucose," "HbA1c," "kidney function," and "Metformin" now exist as first-class concepts the machine can reason about, no longer just words on a CDC page.

Relationships to the ontology

Concepts alone are a glossary. The next passes connect them, into a hierarchy, into rules about what's required, and into cross-language bridges. That connected whole is the ontology: the actual map of meaning.

Goes in

The concepts from Layer 2.

Comes out

Standards-format files: OWL (the map), SKOS (the hierarchy), SHACL (the rules), and crosswalks (the bridges).

# a rule the engine wrote (SHACL): every Artificial Pancreas MUST treat something
ArtificialPancreasShape
  sh:targetClass :Artificial_Pancreas ;
  sh:property [ sh:path :treats ; sh:minCount 1 ] .

# a cross-language bridge (the crosswalk): same idea, two vocabularies
ICD-10 E11.9 ⇄ SNOMED 44054006 "Diabetes mellitus type 2"   match: exact
ICD-10 E11.9 ⇄ SNOMED 73211009 "Diabetes mellitus"          match: narrower   # SNOMED term is broader than the ICD code

Why bother turning prose into OWL/SHACL files?

Because these are W3C standards, the universal formats for machine-readable meaning. Write the map once in these formats and it loads into any graph database, any reasoning tool, any AI, instead of being locked to one vendor.
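For readers who have not met SHACL, the rule "every Artificial Pancreas must treat something" boils down to a count check. The sketch below mimics only that one constraint (a target class plus a minimum count on a property) in plain Python over toy triples. It is not a SHACL engine, and the device ids and `:Type_1_Diabetes` are hypothetical data invented for the example.

```python
# Triples as (subject, predicate, object): a toy stand-in for an RDF graph.
TRIPLES = [
    ("device:ap1", "rdf:type", ":Artificial_Pancreas"),
    ("device:ap1", ":treats", ":Type_1_Diabetes"),
    ("device:ap2", "rdf:type", ":Artificial_Pancreas"),  # treats nothing
]

def min_count_violations(triples, target_class, path, min_count=1):
    """Return instances of target_class with fewer than min_count values for path."""
    instances = [s for s, p, o in triples if p == "rdf:type" and o == target_class]
    violations = []
    for node in instances:
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violations.append(node)
    return violations

violations = min_count_violations(TRIPLES, ":Artificial_Pancreas", ":treats")
print(violations)
```

Because the real rule is written in a standard format rather than in application code like this, any conforming validator can enforce it, which is the portability argument made above.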

Maria: the system now knows diabetes can have a kidney complication, that the complication changes the code, and that her EHR's SNOMED term maps to an ICD-10 code. The rules are finally machine-readable, but not yet trusted.

The governed golden standard · where humans decide

An AI-drafted map is a strong start, not a source of truth. Layer 4 is where a draft becomes something an organization will stake decisions on, through expert review and a frozen, versioned release.

Goes in

The AI-drafted ontology plus crosswalks from Layer 3, assembled in the Gold Builder.

Comes out

A signed-off, versioned standard (e.g. v1.0), the trusted source of truth.

The Subject-Matter Expert's role

A diabetes coding specialist or clinician reviews the draft one change at a time in the app's diff-review queue. They confirm, correct, or reject each proposal:

draft pair: E11.9 ⇄ SNOMED 73211009 "Diabetes mellitus"   match: narrower
SME ruling: ✔ confirmed   # "correct, SNOMED is broader; ICD defaults unspecified DM to E11.9"

draft pair: E10.x mirror proposed by the agent (type-1 variant)
SME ruling: ✎ needs review   # 4 agent-proposed mirrors held for the expert to resolve

Why is the human step non-negotiable?

Because in healthcare a wrong mapping can mean a denied claim or an unsafe order. The SME is the accountable authority who turns a plausible AI draft into a defensible standard. OntoGen's job is to do 90% of the labor and bring the expert only the decisions that need judgment, and to record who decided what, when. The autonomy dial automates construction, never governance.
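One way to picture "record who decided what, when" is a ruling function that never changes a proposal without appending to its history. This is a sketch under assumptions: the `Proposal` class, the ruling vocabulary, and the reviewer id `sme-1` are hypothetical, and the SNOMED side of the E10.x mirror is left empty because the text does not give it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

ALLOWED_RULINGS = {"confirmed", "corrected", "rejected", "needs-review"}

@dataclass
class Proposal:
    icd10: str
    snomed: Optional[str]
    match: Optional[str]
    status: str = "draft"
    history: list = field(default_factory=list)

def rule(proposal, reviewer, ruling, note=""):
    """Apply an SME ruling and append an audit entry (who, what, when)."""
    if ruling not in ALLOWED_RULINGS:
        raise ValueError(f"unknown ruling: {ruling}")
    proposal.status = ruling
    proposal.history.append({
        "who": reviewer,
        "ruling": ruling,
        "note": note,
        "when": datetime.now(timezone.utc).isoformat(),
    })
    return proposal

def releasable(proposals):
    """A version can be frozen only when nothing is still draft or held."""
    return all(p.status not in {"draft", "needs-review"} for p in proposals)

pair = Proposal("E11.9", "73211009", "narrower")
mirror = Proposal("E10.x", None, None)  # details elided in the source

rule(pair, "sme-1", "confirmed", "SNOMED is broader")
rule(mirror, "sme-1", "needs-review")
print(releasable([pair]), releasable([pair, mirror]))
```

The release gate is the part that matters: automation can draft as many proposals as it likes, but a held item blocks the freeze until a named person resolves it.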

Honest status today

Our diabetes golden crosswalk has 50 expert-curated pairs: 46 confirmed directly against the official NLM SNOMED→ICD map and 4 held for SME sign-off. Until those 4 are resolved, the release is labeled draft-pending-sme-signoff. We never present a draft as final.

Maria: a human expert has now certified that her chart's SNOMED diabetes term maps to the right ICD-10 family, and that the CKD-complication rule is correct. Her case now rests on verified meaning.

The knowledge graph

The governed standard is the map. Now we lay real patient data onto it. The result is a network you can walk, where Maria, her diagnosis, her labs and her meds are all connected nodes.

Goes in

The golden standard (Layer 4) plus actual records (a patient's diagnoses, labs, prescriptions).

Comes out

A queryable graph (loaded via Neo4j Cypher), meaning plus data, walkable as relationships.

Maria's record laid onto the governed map. The dashed edge is the drug-lab collision that was invisible while the lab and the prescription lived in separate systems. Now they sit one hop apart.

Why a graph instead of rows in a table?

Because the dangerous questions are about connections: does this drug collide with that lab result, given this diagnosis? Tables make you hunt across systems; a graph lets you walk straight from patient to risk in a few hops. The unsafe-Metformin link becomes a path you can literally trace.
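The walk from patient to risk can be sketched with nothing more than a list of edges. This is a toy in-memory graph, not the Neo4j load or the app's real schema: the node ids, the edge names `HAS_LAB` and `PRESCRIBED`, and the `CAUTIONS` table are assumptions, while Maria's values and the eGFR 30 threshold come from the text.

```python
# Nodes carry properties; edges are (source, relationship, target).
NODES = {
    "maria": {"kind": "Patient"},
    "dx_t2dm": {"kind": "Diagnosis", "snomed": "44054006"},
    "lab_egfr": {"kind": "Lab", "test": "eGFR", "value": 28},
    "lab_hba1c": {"kind": "Lab", "test": "HbA1c", "value": 9.4},
    "rx_metformin": {"kind": "Drug", "name": "Metformin"},
}
EDGES = [
    ("maria", "HAS_DIAGNOSIS", "dx_t2dm"),
    ("maria", "HAS_LAB", "lab_egfr"),
    ("maria", "HAS_LAB", "lab_hba1c"),
    ("maria", "PRESCRIBED", "rx_metformin"),
]
# Governed rule from the map: drug -> (lab test, caution below this value).
CAUTIONS = {"Metformin": ("eGFR", 30)}

def neighbors(node, relationship):
    """Follow one kind of edge out of a node."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relationship]

def collisions(patient):
    """Walk patient -> drug and patient -> lab, and report rule hits."""
    found = []
    for drug_id in neighbors(patient, "PRESCRIBED"):
        drug = NODES[drug_id]["name"]
        if drug not in CAUTIONS:
            continue
        test, threshold = CAUTIONS[drug]
        for lab_id in neighbors(patient, "HAS_LAB"):
            lab = NODES[lab_id]
            if lab["test"] == test and lab["value"] < threshold:
                found.append((drug, test, lab["value"]))
    return found

print(collisions("maria"))
```

Nothing in the check is clever. It only works because the lab and the prescription are reachable from the same patient node, which is exactly what the separate systems could not offer.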

Maria: for the first time, her low eGFR and her Metformin sit one hop apart in the same structure. The collision is now visible to software, not just to a careful human.

The semantic layer

The graph is powerful but technical. The semantic layer is the single governed surface on top of it, one place every application and AI assistant asks questions and gets the same, source-cited answer.

Goes in

The knowledge graph from Layer 5.

Comes out

A published query surface (the app's Surface / Playground), natural-language and structured Q&A, every answer cited.

# a real cited Q&A from our diabetes run (GraphRAG)
Q: "Which diabetes medications need caution when kidney function is low (eGFR < 30)?"
A: Metformin, use with caution / avoid below eGFR 30.
confidence: medium
cited: 07_niddk_diabetes_management · 03_fhir_r4_observation

One layer, four consumers, one cited answer each

Clinician, coder, billing, and quality each ask the same governed surface. The graph below shows Maria's case; every answer traces back to a source.

Is Maria's Metformin safe to continue?

No. Her eGFR is 28, below the 30 threshold. The layer flags Metformin for reassessment before it can cause harm.

cited: 07_niddk_diabetes_management · 03_fhir_r4_observation

What is the correct diagnosis code?

E11.22, Type 2 diabetes with diabetic chronic kidney disease, plus the CKD stage. Not the vague E11.9.

cited: 01_icd10cm_guidelines · ICD-10 ⇄ SNOMED crosswalk

Why might Maria's claim be denied?

Coded as the vague E11.9, the claim omits her CKD, under-stating severity and forfeiting risk-adjustment revenue. E11.22 plus stage is defensible.

cited: 01_icd10cm_guidelines · ICD-10 ⇄ SNOMED crosswalk

Does Maria count in the poor-control measure?

Yes. Her HbA1c is 9.4%, above the 9% poor-control threshold, so she is included automatically, no manual chart review.

cited: 08_cdc_diabetes_treatment · HbA1c poor-control measure

Why one shared layer instead of letting each app figure it out?

Because "what counts as poor control" or "which code is right" must mean the same thing in the EHR, the billing tool, the quality report, and the AI copilot. Define it once, govern it once, and every consumer inherits the same trusted meaning, with a citation, so no one has to take the answer on faith.
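"Define it once, govern it once" can be shown in miniature: a single registry of definitions, each carrying its citations, and every consumer calling the same lookup. The registry shape, the definition names, and the two consumer functions are hypothetical; the thresholds and source ids are the ones quoted in the answers above.

```python
# One governed registry: each definition pairs a test with its citations.
DEFINITIONS = {
    "poor_hba1c_control": {
        "test": lambda hba1c_percent: hba1c_percent > 9.0,
        "cited": ("08_cdc_diabetes_treatment",),
    },
    "metformin_needs_reassessment": {
        "test": lambda egfr: egfr < 30,
        "cited": ("07_niddk_diabetes_management", "03_fhir_r4_observation"),
    },
}

def ask(definition, value):
    """The single surface: same definition, same answer, same citations."""
    entry = DEFINITIONS[definition]
    return {"answer": entry["test"](value), "cited": entry["cited"]}

# Two hypothetical consumers. Neither re-implements the threshold.
def quality_dashboard(hba1c_percent):
    return ask("poor_hba1c_control", hba1c_percent)

def clinician_copilot(hba1c_percent):
    return ask("poor_hba1c_control", hba1c_percent)

print(quality_dashboard(9.4))
print(ask("metformin_needs_reassessment", 28))
```

If the poor-control threshold were ever revised, it would change in one place and both consumers would pick it up together, citation included.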

Maria: her care app, her clinic's coder, and the quality dashboard can now all ask about her case and get one consistent, evidence-backed answer, instead of three different guesses.

Application / use-case, the payoff

Seven layers up from a folder of PDFs, the four problems we started with are now closed, for Maria, automatically, with the reasoning shown.

● Care, now safe

An alert fires: eGFR 28 < 30, reassess Metformin. The drug-lab collision was waiting in the graph; the semantic layer surfaced it before harm.

● Coding, now precise

The coder is guided from the vague E11.9 to the correct E11.22 + CKD stage, driven by the verified crosswalk, not memory.

● Billing, now defensible

The claim reflects Maria's true severity: fewer denials, the risk-adjustment revenue her care actually warrants, and an audit trail back to the source.

● Quality, now reportable

Because "diabetic," "HbA1c 9.4%," and "poor control" share meaning, Maria is counted correctly in the population measure, no manual chart review.
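The population measure behind the quality card is simple arithmetic once "diabetic" and "HbA1c" mean the same thing for every record. A minimal sketch follows, with loud assumptions: only Maria's 9.4% comes from the text, the other three patients are invented toy rows, and the strict "above 9%" comparison follows the wording used here rather than any official measure specification.

```python
def poor_control_rate(patients, threshold=9.0):
    """Percent of diabetic patients whose latest HbA1c is above threshold."""
    diabetics = [p for p in patients if p["diabetic"]]
    if not diabetics:
        return None  # measure is undefined with an empty denominator
    poor = [p for p in diabetics if p["hba1c"] > threshold]
    return 100.0 * len(poor) / len(diabetics)

# Maria is from the text; patients b, c and d are hypothetical.
population = [
    {"id": "maria", "diabetic": True, "hba1c": 9.4},
    {"id": "patient_b", "diabetic": True, "hba1c": 7.1},
    {"id": "patient_c", "diabetic": False, "hba1c": 5.4},
    {"id": "patient_d", "diabetic": True, "hba1c": 9.0},
]
rate = poor_control_rate(population)
print(f"{rate:.1f}% of diabetic patients are above the threshold")
```

In this toy population Maria is the only diabetic patient above 9%, so she is one of three in the denominator; the non-diabetic row never enters the calculation.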

Same engine, same seven layers. Point it at a different corpus and adapter and it does this for another domain. We've already proven the pipeline cross-domain on cyber insurance (F1 0.625 with a frontier model).

Where we honestly are

Real numbers, measured, not promised · honesty over polish

This is preliminary data from the healthcare build (NLM SBIR Phase I). We report what we measured, including where we're below a funded target and why.

What we measured	Result	Read
Concept extraction (diabetes, local model)	P 47% · R 75% · F1 58%	recall strong; precision is what Phase I funds
Golden crosswalk pairs (ICD-10 ⇄ SNOMED)	50 pairs · 46 confirmed	draft pending final SME sign-off
Confidence calibration (does the score track expert judgment?)	ρ = 0.695	significant; human study is the Phase I target
Adapter lift (tuned vs untuned)	+18.2 pts F1	above the ≥15% target
Cross-domain proof (cyber insurance, frontier model)	F1 = 0.625	same pipeline, funded-tier accuracy

Calibration ratings here are an LLM-synthetic pilot proxy, explicitly labeled; the human clinical rater study is a Phase I deliverable. Every artifact (OWL · SHACL · SKOS · JSON-LD · Cypher) is validated on every run, the W3C formats against their specifications.
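As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so the reported diabetes figures can be recomputed from each other. The function name is ours; the inputs are the table's P 47% and R 75%.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Concept extraction row: P 47%, R 75%.
diabetes_f1 = f1_score(0.47, 0.75)
print(f"{diabetes_f1:.1%}")  # about 57.8%, which rounds to the reported 58%
```

The harmonic mean sits closer to the weaker of the two numbers, which is why strong recall alone does not lift F1 much and why precision is the lever the table points to.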

The whole story in one breath

Documents → meaning → trust → answers

We read the authoritative documents (1), pull out concepts with citations (2), connect them into an ontology of rules and cross-language bridges (3), have experts certify it into a governed golden standard (4), lay real patient data onto it as a knowledge graph (5), expose it as one governed semantic layer everything queries (6), so the right, safe, cited answer reaches the point of use (7). For Maria, that's the difference between a missed kidney risk and a caught one.