A jagat.ai product · Applied AI architecture

Your organization's knowledge is trapped in documents. OntoGen turns it into meaning a machine can use.

A governed map of meaning from your industry documents — measured for accuracy, signed off by your experts, and traceable to the source line.

The problem

Every enterprise runs on vocabularies that don't talk to each other.

The same entity is described in incompatible systems, and the rules that connect them live only in PDFs no software can read. In healthcare, one diabetic patient is four different records at once.

SNOMED CT (the clinician's EHR): 44054006 = "Diabetes mellitus type 2"
ICD-10-CM (the biller): E11.22 = "T2DM w/ diabetic CKD"
LOINC (the lab): 4548-4 = "HbA1c" · eGFR
RxNorm (the pharmacy): Metformin · caution if eGFR<30

The result is miscoded claims, denied bills, missed care, and quality measures that cannot be computed. The same shape of problem shows up wherever meaning is trapped in documents — insurance policies, regulations, data dictionaries, product specs.

What OntoGen is

It reads the authoritative documents and produces one governed map of meaning.

A multi-tenant engine wrapped around a five-pass LLM extraction pipeline. It surrounds your systems of record — it reads, it never writes back — and emits portable, open artifacts that load into any graph or semantic platform.

Trust by construction

Measured, not asserted

Every run is scored for precision, recall, and F1 against a gold standard, with a quality gate in CI. The number is the differentiator — almost nobody else measures.
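
As a rough illustration of what scoring a run against a gold standard involves, the sketch below compares a set of extracted concept labels with a gold set and applies a pass/fail threshold. The function names, the sample labels, and the 0.60 threshold are assumptions made for illustration; they are not OntoGen's code or its actual gate value.

```python
def score_run(extracted: set[str], gold: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 of extracted items against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def quality_gate(scores: dict[str, float], min_f1: float) -> bool:
    """Return True when the run meets the minimum F1 (hypothetical threshold)."""
    return scores["f1"] >= min_f1


# Illustrative labels only, not real evaluation data.
gold = {"type 2 diabetes mellitus", "hba1c", "egfr", "metformin"}
extracted = {"type 2 diabetes mellitus", "hba1c", "metformin", "insulin pump"}

scores = score_run(extracted, gold)
gate_passed = quality_gate(scores, min_f1=0.60)
print(scores)
print("gate passed:", gate_passed)
```

In a CI job, a failed gate would typically end the process with a non-zero exit code so the build stops.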

Provenance

Every term traces to a source line

Concepts, relationships, and answers each carry a citation and a confidence score. The cited GraphRAG layer turns "the model said so" into a defensible, auditable artifact.
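
One way to picture a term that carries its own citation and confidence score is a small immutable record. The class names, fields, file name, and values below are hypothetical and only sketch the idea; the source does not describe OntoGen's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str  # hypothetical source file name
    line: int      # 1-based line number in the source document


@dataclass(frozen=True)
class CitedTerm:
    label: str
    citation: Citation
    confidence: float  # expected to lie between 0.0 and 1.0

    def __post_init__(self) -> None:
        # Reject scores outside the valid range at construction time.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def trace(self) -> str:
        """Human-readable pointer from the term back to its source line."""
        return (
            f"{self.label} <- {self.citation.document}:{self.citation.line} "
            f"(confidence {self.confidence:.2f})"
        )


# Illustrative values only.
term = CitedTerm("HbA1c", Citation("example-guideline.pdf", 42), 0.91)
print(term.trace())
```

Keeping the citation inside the record, rather than in a separate log, means a term cannot be created without saying where it came from.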

Human-governed

The expert holds the pen

SMEs approve, flag, correct, and sign off — diff by diff, fully audited. The autonomy dial automates construction, never governance. The standard changes only on sign-off.

Portable output

Open artifacts, no lock-in

OWL, SHACL, SKOS, JSON-LD, and Neo4j Cypher — each W3C-validated on every run, loading into GraphDB, Stardog, Neptune, Collibra, or Neo4j.

Standards-grounded

Speaks the industry's language

Grounded in the standards your sector already uses — FIBO, NAICS, NIST, ISO-ACORD, ICD-10, FHIR — so the model is credible, not invented.

Domain-agnostic

One engine, many domains

Point it at a different corpus and adapter and it produces the same governed result. Proven cross-domain on insurance and healthcare with one pipeline.

How it works

A governed domain moves through four stages.

From a folder of documents to a published, queryable standard — with the human in control at every gate.

01
Org admin · architect

Setup

Choose the grounding standards. Open standards take the fast path; licensed terminologies are tracked in parallel and kept local under their license.

Standards Picker

02
Domain SME

Ground

Run the five-pass extraction — concepts, relationships, taxonomy, crosswalks, constraints — build the canonical model, and assemble the gold candidate.

Gold Builder

03
SME · architect

Localize

Run fit and gap against your own corpus, then SME diff-review and steward sign-off. The machine's draft becomes a trusted, versioned standard.

Fit / Gap · Diff Review

04
Answer consumers

Operate

Publish the consumer API and query the governed graph. Applications and AI assistants get the same source-cited answer, every time.

Surface · Playground

Where we honestly are

Real numbers, reported as measured.

Insurance is governed and measured with a frontier model. Healthcare is an active build toward an NLM SBIR Phase I application — preliminary numbers, labeled as preliminary.

Cyber insurance (P&C), governed: F1 0.625
Frontier model, gold standard v1.0, five evaluation runs. Funded-tier accuracy, same pipeline.
Diabetes concept extraction: P 47 · R 75 · F1 58
Local model, public corpus. Recall strong; precision is the Phase I deliverable.
Adapter lift, tuned vs untuned: +18.2 pts F1
Above the ≥15% target. Frontier headline run pending spend approval.
ICD-10 ⇄ SNOMED crosswalk: 50 pairs · 46 confirmed
Confirmed against the official NLM map; 4 held for SME sign-off, labeled draft.
Confidence calibration: ρ = 0.695
Synthetic-proxy pilot, explicitly labeled. The human clinical rater study is a Phase I deliverable.
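
The reported figures can be checked with simple arithmetic. The sketch below recomputes F1 as the harmonic mean of the stated diabetes precision and recall, and confirms that the crosswalk counts add up. It assumes the P and R values are percentages; the function and variable names are illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (any consistent scale)."""
    return 2 * precision * recall / (precision + recall)


# Diabetes concept extraction: P 47, R 75 as reported.
diabetes_f1 = f1_from(47, 75)
print(round(diabetes_f1))  # 58

# ICD-10 to SNOMED crosswalk: 46 confirmed plus 4 held should equal 50 pairs.
confirmed, held, total = 46, 4, 50
counts_consistent = confirmed + held == total
confirmed_share = f"{confirmed / total:.0%}"
print(counts_consistent, confirmed_share)  # True 92%
```

Because F1 is a harmonic mean, it sits closer to the weaker of the two inputs, which is why a recall of 75 with a precision of 47 yields 58 rather than the simple average of 61.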

Honesty is the brand

We report measured numbers as measured. Where a funded target is not yet met before award, we state the honest value and why it is a Phase I deliverable. Synthetic data is labeled as synthetic.

A modest but measured number with a roadmap beats an unmeasured claim of greatness. Every artifact OntoGen emits is validated against the W3C standard validators on every run.

Healthcare build supports an NLM SBIR Phase I application. No PHI anywhere; licensed terminologies stay local under the UMLS license.

Worked examples

The same engine, told through one case at a time.

Each story walks one real patient or one real policy through all seven layers, from a folder of PDFs to a safe, cited answer at the point of use.

Healthcare · NLM SBIR Phase I

The Diabetes Story

How a missed kidney risk becomes a caught one. One patient, four broken vocabularies, and the governed map that makes her care safe, her code precise, her claim defensible, and her quality measure computable.

Read the story

Insurance · governed, F1 0.625

The Cyber Story

How a carrier's policy wordings become a governed cyber ontology with NAICS↔ISO crosswalks and a cited coverage answer — in hours instead of the weeks a data-modeling team spends by hand.

In development

Engagements begin with a conversation.

OntoGen is a jagat.ai product, founder-led, with a network of consultants. If you are exploring a semantic-layer or knowledge-graph initiative, or evaluating an accelerator for a carrier engagement, get in touch.

hello@jagat.ai