From a stack of PDFs to a safe answer at the bedside
Healthcare knows how to treat diabetes. The problem is that the knowledge is trapped in
documents and scattered across systems that don't speak the same language. This is the story of how we
turn that mess into one governed map of meaning, told through a single patient.
jagat.ai · strategic foresight & applied AI architecture
The problem, in plain words
Healthcare runs on four languages that don't talk to each other
A single diabetic patient is described, simultaneously, in four incompatible vocabularies,
plus a pile of guideline PDFs no software can read. A fact obvious to a doctor becomes invisible to the
systems that bill, report, and assist.
SNOMED CT: what the doctor's EHR records
44054006 = "Diabetes mellitus type 2"
ICD-10-CM: what the biller must submit
E11.22 = "T2DM w/ diabetic CKD"
LOINC: what the lab reports
4548-4 = "HbA1c" · eGFR result
RxNorm: what the pharmacy dispenses
Metformin (caution if eGFR<30)
And the rules that connect them, like "if Type 2 diabetes has damaged
the kidneys, code it as E11.22 and add the CKD stage; don't give full-dose Metformin below eGFR 30,"
live only in PDF guidelines. Humans read them. Machines can't.
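To show what those two prose rules look like once a machine can read them, here is a minimal Python sketch. The `PatientFacts` class and both function names are hypothetical and are not OntoGen's API; only the codes and the eGFR threshold come from the text above, and the coding rule deliberately knows about nothing except the kidney complication.

```python
from dataclasses import dataclass
from typing import Optional

T2DM_SNOMED = "44054006"   # SNOMED CT: Diabetes mellitus type 2
EGFR_THRESHOLD = 30.0      # Metformin caution threshold quoted above


@dataclass
class PatientFacts:
    """Hypothetical container joining facts from the four vocabularies."""
    snomed_dx: str                 # from the EHR (SNOMED CT)
    egfr: Optional[float]          # from the lab (LOINC-coded in practice)
    on_metformin: bool             # from the pharmacy (RxNorm-coded in practice)
    kidney_damage_from_dm: bool    # clinician-documented diabetic CKD


def suggested_icd10(p: PatientFacts) -> Optional[str]:
    """Coding rule: T2DM with diabetic CKD -> E11.22, otherwise E11.9."""
    if p.snomed_dx != T2DM_SNOMED:
        return None  # this sketch only knows the one diagnosis
    return "E11.22" if p.kidney_damage_from_dm else "E11.9"


def metformin_needs_review(p: PatientFacts) -> bool:
    """Safety rule: flag Metformin when eGFR is known and below 30."""
    return p.on_metformin and p.egfr is not None and p.egfr < EGFR_THRESHOLD
```

Neither rule can run unless the diagnosis, the lab value, and the prescription arrive in the same place, which is the gap the rest of this page is about.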
What that costs, the problem catalog
Four kinds of failure, one root cause
When meaning isn't connected, the same gap shows up in four places. These are the problems
OntoGen is built to close.
● Care
Unsafe & missed care
A drug-safety rule (don't dose Metformin at low kidney function) never fires because the lab result
and the prescription live in unconnected systems.
● Coding
Under-specified codes
The EHR says "type 2 diabetes." The biller submits the vague E11.9 instead
of the precise E11.22 plus CKD stage, because nothing maps one to the other.
● Billing
Denials & lost revenue
Vague codes get claims denied and forfeit the risk-adjustment revenue that the patient's true severity
justifies. Money and accuracy both leak.
● Quality
Unreportable measures
Quality programs ask "what % of diabetics have poor HbA1c control?" You can't compute it if "diabetic,"
"HbA1c," and "poor control" aren't linked by shared meaning.
One patient, all four problems
Meet Maria
🩺
Maria, 61. Type 2 diabetes for 12 years. Her latest labs
show HbA1c 9.4% (poorly controlled) and a falling eGFR of 28. Her kidneys are now damaged.
Her chart still carries the diagnosis from years ago: plain "type 2 diabetes." Her current Metformin
prescription was never re-checked against her kidneys.
care: Metformin unsafe at eGFR<30 · coding: chart says E11.9, truth is E11.22 · billing: claim under-coded · quality: HbA1c 9.4% > 9% poor-control
Every layer that follows moves Maria's case one step closer to a single,
safe, correctly-coded, reportable answer. Watch the notes that begin "Maria:".
The spine of the whole system
Seven layers: raw documents to real value
Each layer takes the one below it and adds structure a machine can use. The bottom is a folder
of files; the top is the right answer in a clinician's hands. The last line of each entry shows where this happens
in the OntoGen app.
7
Application / use-case
The safe, coded, reportable answer, at the point of care, in the biller's tool, in the quality report.
App stage · Operate
6
Semantic layer
One governed "ask-anything" surface every app and AI queries, so the answer is the same everywhere.
Operate · Surface & Playground
5
Knowledge graph
The map plus real patient data, loaded as a network you can walk: patient → diagnosis → lab → drug.
Operate
4
Governed golden standard
The map, verified by human experts and frozen as a versioned, trustworthy source of truth.
Ground & Localize · Gold Builder + SME review
3
Relationships to ontology
How the concepts connect, plus rules and cross-language bridges, the actual map of meaning.
Ground · 5-pass extraction
2
Extracted concepts
The important "things" pulled out of the documents, each with a source citation and a confidence score.
Ground · 5-pass extraction
1
Source documents
The raw, human-written truth: coding guidelines, data-exchange specs, clinical pages.
Setup · Standards Picker
1
Source documents
Everything starts with the documents humans already trust. We don't invent knowledge; we read
the authoritative sources.
Goes in
Public, authoritative diabetes sources, chosen in the app's Standards Picker.
Comes out
A frozen, licensed, checksum-verified corpus the engine can read.
# the real diabetes corpus we used
ICD-10-CM Official Guidelines FY2026 → the coding rules
FHIR R4: Condition, Observation, MedicationRequest, CarePlan, Coverage → how systems exchange data
NIDDK "Managing Diabetes" (NIH) → clinical management
CDC "Manage Blood Sugar" → targets & monitoring
SNOMED CT · LOINC · RxNorm → the vocabularies (licensed, kept local)
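"Frozen and checksum-verified" can be pictured as a manifest of file hashes taken once and re-checked before every run. The sketch below is an assumption about how such a check might look, not OntoGen's implementation; the function names and the flat directory layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(corpus_dir: Path) -> Dict[str, str]:
    """Freeze the corpus: file name -> SHA-256 of its current contents."""
    return {p.name: sha256_of(p) for p in sorted(corpus_dir.iterdir()) if p.is_file()}


def changed_files(corpus_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Names that were added, removed, or edited since the manifest was built."""
    current = build_manifest(corpus_dir)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

An empty list from `changed_files` means the engine is reading exactly the documents that were approved; anything else is a reason to stop and re-freeze.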
Why standards, not just any text?
Because a map is only as trustworthy as its sources. Grounding in ICD-10, FHIR and SNOMED means
the result speaks the languages the rest of healthcare already uses, instead of inventing a private one.
Maria: the very rules her case needs, "T2DM with CKD → E11.22, add the stage" and
"Metformin caution below eGFR 30," enter the system here, still locked inside prose.
2
Extracted concepts
The first pass of the engine reads the corpus and pulls out the things that matter, each
one tagged with where it came from and how confident the model is.
Goes in
The corpus from Layer 1.
Comes out
A list of concepts, each with a definition, a source citation, and a confidence score.
# real extracted concepts (from our run, lightly trimmed)
Blood Glucose · source: 08_cdc_diabetes_treatment · confidence: high
Continuous Glucose Monitor (CGM) · source: 07_niddk_diabetes_mgmt · confidence: high
Diabetes Care Plan · source: 05_fhir_r4_careplan · confidence: medium
Allergy · source: 03_fhir_r4_observation · rationale: "explicit seed concept mention"
Why a confidence score and a citation on every concept?
So nothing is a black box. Every claim the system makes can be traced back to a real document,
and low-confidence items can be sent to a human instead of trusted blindly. Honesty is built into the data,
not bolted on.
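A minimal sketch of that routing, using three of the concepts listed above. The `Concept` class and the policy (only "high" is accepted without review; anything else, including a missing score, goes to a human) are assumptions made for illustration, not the engine's actual thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    name: str
    source: str                 # id of the document the concept came from
    confidence: Optional[str]   # "high", "medium", "low", or None if absent


def triage(concepts: List[Concept]) -> Tuple[List[Concept], List[Concept]]:
    """Split concepts into (auto-accepted, needs human review)."""
    accepted = [c for c in concepts if c.confidence == "high"]
    review = [c for c in concepts if c.confidence != "high"]
    return accepted, review


concepts = [
    Concept("Blood Glucose", "08_cdc_diabetes_treatment", "high"),
    Concept("Diabetes Care Plan", "05_fhir_r4_careplan", "medium"),
    Concept("Allergy", "03_fhir_r4_observation", None),
]
accepted, review = triage(concepts)
```

Because every record carries its `source`, a reviewer looking at the second list can open the cited document instead of guessing where a concept came from.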
Maria: "Blood Glucose," "HbA1c," "kidney function," and "Metformin" now exist as
first-class concepts the machine can reason about, no longer just words on a CDC page.
3
Relationships to the ontology
Concepts alone are a glossary. The next passes connect them: into a hierarchy, into rules about
what's required, and into cross-language bridges. That connected whole is the ontology, the actual
map of meaning.
Goes in
The concepts from Layer 2.
Comes out
Standards-format files: OWL (the map), SKOS (the hierarchy), SHACL (the rules), and crosswalks (the bridges).
# a rule the engine wrote (SHACL): every Artificial Pancreas MUST treat something
:ArtificialPancreasShape a sh:NodeShape ;
    sh:targetClass :Artificial_Pancreas ;
    sh:property [ sh:path :treats ; sh:minCount 1 ] .
# a cross-language bridge (the crosswalk): same idea, two vocabularies
ICD-10 E11.9 ⇄ SNOMED 44054006 "Diabetes mellitus type 2" match: exact
ICD-10 E11.9 ⇄ SNOMED 73211009 "Diabetes mellitus" match: narrower
# "narrower" describes the ICD code: the SNOMED term is broader than the ICD code
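The two crosswalk rows above can be held as plain records and queried in either direction. This is a sketch under assumptions: the `Mapping` type and the lookup functions are hypothetical names, and a real crosswalk would carry many more rows plus provenance.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    icd10: str
    snomed: str
    snomed_label: str
    match: str   # "exact", or "narrower" when the ICD code is narrower than the SNOMED term


CROSSWALK = [
    Mapping("E11.9", "44054006", "Diabetes mellitus type 2", "exact"),
    Mapping("E11.9", "73211009", "Diabetes mellitus", "narrower"),
]


def mappings_for_snomed(snomed_code: str) -> List[Mapping]:
    """All crosswalk rows whose SNOMED side matches the given code."""
    return [m for m in CROSSWALK if m.snomed == snomed_code]


def exact_icd10(snomed_code: str) -> Optional[str]:
    """ICD-10 code for an exact match, or None when only a looser match exists."""
    for m in mappings_for_snomed(snomed_code):
        if m.match == "exact":
            return m.icd10
    return None
```

Keeping the match type on every row is what lets a downstream tool treat an exact bridge differently from a looser one instead of silently picking the first hit.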
Why bother turning prose into OWL/SHACL files?
Because these are W3C standards, the universal formats for machine-readable meaning. Write
the map once in these formats and it loads into standards-compliant graph stores, reasoning tools, and AI
pipelines, instead of being locked to one vendor.
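To make the `sh:minCount 1` rule shown earlier concrete, the following sketch checks the same constraint over a handful of triples in plain Python. It is not a SHACL engine (a real validator also handles subclasses, distinct values, and much more), and the two device nodes are made-up sample data.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)


def min_count_violations(
    triples: List[Triple], target_class: str, path: str, min_count: int = 1
) -> List[str]:
    """Return instances of target_class with fewer than min_count values for path."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    violations = []
    for node in sorted(instances):
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violations.append(node)
    return violations


# Hypothetical sample data: device_2 is declared but treats nothing.
data = [
    (":device_1", "rdf:type", ":Artificial_Pancreas"),
    (":device_1", ":treats", ":Type_1_Diabetes"),
    (":device_2", "rdf:type", ":Artificial_Pancreas"),
]
violations = min_count_violations(data, ":Artificial_Pancreas", ":treats")
```

The point of writing the rule in SHACL rather than in code like this is that any conformant validator can enforce it without knowing anything about the application.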
Maria: the system now knows diabetes can have a kidney complication, that the
complication changes the code, and that her EHR's SNOMED term maps to an ICD-10 code. The rules are finally
machine-readable, but not yet trusted.
4
The governed golden standard · where humans decide
An AI-drafted map is a strong start, not a source of truth. Layer 4 is where a draft becomes
something an organization will stake decisions on, through expert review and a frozen, versioned release.
Goes in
The AI-drafted ontology plus crosswalks from Layer 3, assembled in the Gold Builder.
Comes out
A signed-off, versioned standard (e.g. v1.0), the trusted source of truth.
The Subject-Matter Expert's role
A diabetes coding specialist or clinician reviews
the draft one change at a time in the app's diff-review queue. They confirm, correct, or
reject each proposal:
draft pair: E11.9 ⇄ SNOMED 73211009 "Diabetes mellitus" match: narrower
SME ruling: ✔ confirmed # "correct, SNOMED is broader; ICD defaults unspecified DM to E11.9"
draft pair: E10.x mirror proposed by the agent (type-1 variant)
SME ruling: ✎ needs review # 4 agent-proposed mirrors held for the expert to resolve
Why is the human step non-negotiable?
Because in healthcare a wrong mapping can mean a denied claim or an unsafe order. The SME is the
accountable authority who turns a plausible AI draft into a defensible standard. OntoGen's job is to do 90% of
the labor and bring the expert only the decisions that need judgment, and to record who decided what, when.
The autonomy dial automates construction, never governance.
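Recording "who decided what, when" amounts to an append-only log of rulings. The sketch below is one way to picture it, assuming hypothetical `Ruling` and `ReviewLog` classes and a simple policy that a version cannot be frozen while any pair is still awaiting review; it does not describe the app's real data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

DECISIONS = {"confirmed", "corrected", "rejected", "needs-review"}


@dataclass(frozen=True)
class Ruling:
    pair: str              # e.g. "E11.9 <-> SNOMED 73211009"
    decision: str          # one of DECISIONS
    reviewer: str          # who decided
    note: str              # why
    decided_at: datetime   # when (UTC)


@dataclass
class ReviewLog:
    """Append-only log: rulings are added, never edited or removed."""
    rulings: List[Ruling] = field(default_factory=list)

    def record(self, pair: str, decision: str, reviewer: str, note: str = "") -> Ruling:
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        ruling = Ruling(pair, decision, reviewer, note, datetime.now(timezone.utc))
        self.rulings.append(ruling)
        return ruling

    def latest(self) -> Dict[str, Ruling]:
        """Most recent ruling per pair (later entries override earlier ones)."""
        return {r.pair: r for r in self.rulings}

    def ready_to_freeze(self) -> bool:
        """True only when at least one pair exists and none awaits review."""
        latest = self.latest()
        return bool(latest) and all(r.decision != "needs-review" for r in latest.values())
```

A later ruling supersedes an earlier one for the same pair, but the earlier entry stays in the log, so the history of a decision is never lost.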
Honest status today
Our diabetes golden crosswalk is 50 expert-curated pairs: 46 confirmed directly against the
official NLM SNOMED→ICD map, and 4 held for SME sign-off and labeled draft-pending-sme-signoff.
We never present a draft as final.
Maria: a human expert has now certified that her chart's SNOMED diabetes term maps
to the right ICD-10 family, and that the CKD-complication rule is correct. Her case now rests on verified meaning.
5
The knowledge graph
The governed standard is the map. Now we lay real patient data onto it. The result is
a network you can walk, where Maria, her diagnosis, her labs and her meds are all connected nodes.
Goes in
The golden standard (Layer 4) plus actual records (a patient's diagnoses, labs, prescriptions).
Comes out
A queryable graph (loaded into Neo4j via Cypher): meaning plus data, walkable as relationships.
Figure: Maria's record laid onto the governed map. The dashed edge is the drug-lab collision that
was invisible while the lab and the prescription lived in separate systems. Now they sit one hop apart.
Why a graph instead of rows in a table?
Because the dangerous questions are about connections: does this drug collide with that lab
result, given this diagnosis? Tables make you hunt across systems; a graph lets you walk straight from patient
to risk in a few hops. The unsafe-Metformin link becomes a path you can literally trace.
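The walk from patient to risk can be sketched with nothing more than a list of edges. The relationship names (`TAKES`, `HAS_LAB`, `CAUTION_BELOW`) are hypothetical, chosen for this illustration rather than taken from the OntoGen graph schema; the eGFR value and threshold are the ones from Maria's case.

```python
from typing import List, Tuple

# (source, relationship, target): a toy slice of the graph
EDGES = [
    ("Maria", "HAS_DIAGNOSIS", "Type 2 diabetes"),
    ("Maria", "HAS_LAB", "eGFR"),
    ("Maria", "TAKES", "Metformin"),
    ("Metformin", "CAUTION_BELOW", "eGFR"),   # rule edge from the governed map
]
LAB_VALUES = {("Maria", "eGFR"): 28.0}        # patient data laid onto the map
THRESHOLDS = {("Metformin", "eGFR"): 30.0}    # limit attached to the rule edge


def neighbors(node: str, relationship: str) -> List[str]:
    """Targets reachable from node over one edge of the given type."""
    return [t for s, r, t in EDGES if s == node and r == relationship]


def collisions(patient: str) -> List[Tuple[str, str, float]]:
    """Walk patient -> drug -> lab and report results below the drug's limit."""
    found = []
    for drug in neighbors(patient, "TAKES"):
        for lab in neighbors(drug, "CAUTION_BELOW"):
            value = LAB_VALUES.get((patient, lab))
            if value is not None and value < THRESHOLDS[(drug, lab)]:
                found.append((drug, lab, value))
    return found
```

In a graph database the same walk would be a short path query; the structure of the question stays the same.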
Maria: for the first time, her low eGFR and her Metformin sit one hop apart in the
same structure. The collision is now visible to software, not just to a careful human.
6
The semantic layer
The graph is powerful but technical. The semantic layer is the single governed surface on top
of it, one place every application and AI assistant asks questions and gets the same, source-cited answer.
Goes in
The knowledge graph from Layer 5.
Comes out
A published query surface (the app's Surface / Playground), natural-language and structured Q&A, every answer cited.
# a real cited Q&A from our diabetes run (GraphRAG)
Q: "Which diabetes medications need caution when kidney function is low (eGFR < 30)?"
A: Metformin, use with caution / avoid below eGFR 30. confidence: medium
cited: 07_niddk_diabetes_management · 03_fhir_r4_observation
One layer, four consumers, one cited answer each
Clinician, coder, billing, and quality each ask
the same governed surface. The example below shows Maria's case; every answer traces back to a source.
Is her Metformin safe? · Correct diagnosis code? · Why deny the claim? · Counts in quality measure?
Is Maria's Metformin safe to continue?
No. Her eGFR is 28, below the 30 threshold. The layer flags Metformin for reassessment before it can cause harm.
Why one shared layer instead of letting each app figure it out?
Because "what counts as poor control" or "which code is right" must mean the same thing in the
EHR, the billing tool, the quality report, and the AI copilot. Define it once, govern it once, and every consumer
inherits the same trusted meaning, with a citation, so no one has to take the answer on faith.
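"Define it once" can be illustrated with a single function that two different consumers call. This is a sketch: the function names are hypothetical, the 9% threshold is the one used for Maria's case, and the second patient in the usage example is invented.

```python
from typing import Dict

POOR_CONTROL_HBA1C = 9.0  # percent; "poor control" means HbA1c above this value


def is_poor_control(hba1c_percent: float) -> bool:
    """The single governed definition every consumer calls."""
    return hba1c_percent > POOR_CONTROL_HBA1C


def clinician_note(patient: str, hba1c_percent: float) -> str:
    """Point-of-care consumer."""
    status = "poorly controlled" if is_poor_control(hba1c_percent) else "not flagged"
    return f"{patient}: HbA1c {hba1c_percent}% is {status}"


def poor_control_rate(panel: Dict[str, float]) -> float:
    """Quality-report consumer: share of the panel in poor control."""
    if not panel:
        return 0.0
    return sum(is_poor_control(v) for v in panel.values()) / len(panel)
```

For example, `clinician_note("Maria", 9.4)` and `poor_control_rate({"Maria": 9.4, "patient_b": 7.1})` both rest on the same threshold, so changing the definition in one place changes it for the bedside view and the report together.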
Maria: her care app, her clinic's coder, and the quality dashboard can now all ask
about her case and get one consistent, evidence-backed answer, instead of three different guesses.
7
Application / use-case, the payoff
Seven layers up from a folder of PDFs, the four problems we started with are now closed, for Maria,
automatically, with the reasoning shown.
● Care, now safe
An alert fires: eGFR 28 < 30, reassess Metformin. The drug-lab
collision was waiting in the graph; the semantic layer surfaced it before harm.
● Coding, now precise
The coder is guided from the vague E11.9 to the
correct E11.22 + CKD stage, driven by the verified crosswalk, not memory.
● Billing, now defensible
The claim reflects Maria's true severity: fewer denials, the risk-adjustment
revenue her care actually warrants, and an audit trail back to the source.
● Quality, now reportable
Because "diabetic," "HbA1c 9.4%," and "poor control" share meaning, Maria is
counted correctly in the population measure, with no manual chart review.
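The coding step in the list above ("E11.22 + CKD stage") can be sketched as a suggestion function for a human coder. The eGFR bands and N18.x stage codes below are the standard CKD staging table, but deriving a stage from a single eGFR value is a simplification, and `suggest_codes` is a hypothetical helper, not the app's logic.

```python
from typing import List, Optional

# Lower eGFR bound (mL/min/1.73 m^2) -> ICD-10-CM CKD stage code, highest band first
CKD_STAGES = [
    (90.0, "N18.1"),    # stage 1
    (60.0, "N18.2"),    # stage 2
    (45.0, "N18.31"),   # stage 3a
    (30.0, "N18.32"),   # stage 3b
    (15.0, "N18.4"),    # stage 4
    (0.0, "N18.5"),     # stage 5
]


def ckd_stage_code(egfr: float) -> str:
    """Map an eGFR value to the CKD stage code for its band."""
    for lower_bound, code in CKD_STAGES:
        if egfr >= lower_bound:
            return code
    raise ValueError("eGFR must be non-negative")


def suggest_codes(current_code: str, diabetic_ckd_documented: bool,
                  egfr: Optional[float]) -> List[str]:
    """Suggest E11.22 plus a stage code when the chart supports it; else keep the code."""
    if current_code == "E11.9" and diabetic_ckd_documented and egfr is not None:
        return ["E11.22", ckd_stage_code(egfr)]
    return [current_code]
```

With Maria's eGFR of 28 and a documented kidney complication, `suggest_codes("E11.9", True, 28.0)` returns `["E11.22", "N18.4"]`; the coder still confirms the suggestion against the chart.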
Same engine, same seven layers. Point it at a different corpus and adapter and
it does this for another domain. We've already proven the pipeline cross-domain on cyber insurance
(F1 0.625 with a frontier model).
Where we honestly are
Real numbers, measured, not promised · honesty over polish
This is preliminary data from the healthcare build (NLM SBIR Phase I). We report what we measured,
including where we're below a funded target and why.
What we measured | Result | Read
Concept extraction (diabetes, local model) | P 47% · R 75% · F1 58% | recall strong; precision is what Phase I funds
Golden crosswalk pairs (ICD-10 ⇄ SNOMED) | 50 pairs · 46 confirmed | draft pending final SME sign-off
Confidence calibration (does the score track expert judgment?)
Calibration ratings here are an LLM-synthetic pilot
proxy, explicitly labeled; the human clinical rater study is a Phase I deliverable. Every artifact
(OWL · SHACL · SKOS · JSON-LD · Cypher) is validated on every run; the OWL, SHACL, SKOS, and JSON-LD
outputs are checked against their W3C specifications.
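As a quick check on the extraction row, F1 is the harmonic mean of precision and recall, so the three reported figures should agree with each other. The helper name below is ours, written only for this check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.47, 0.75)
print(f"{f1:.0%}")  # prints 58%
```

The result matches the reported F1 of 58%, and the formula also shows why low precision drags the score down even when recall is strong.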
The whole story in one breath
Documents → meaning → trust → answers
We read the authoritative documents (1), pull out concepts with citations (2), connect
them into an ontology of rules and cross-language bridges (3), have experts certify it into a governed
golden standard (4), lay real patient data onto it as a knowledge graph (5), expose it as one
governed semantic layer everything queries (6), so the right, safe, cited answer reaches the
point of use (7). For Maria, that's the difference between a missed kidney risk and a caught one.