Orectic vs Penumbra
A short guide to vector stores, semantic search, and knowledge graphs — followed by a worked example showing how the same query travels through pure vector retrieval, pure graph traversal, and hybrid (GraphRAG). Then how Orectic and Penumbra each wire it up.
Three things people keep confusing
Vector stores, semantic search, and knowledge graphs are not interchangeable. They do different jobs. The interesting AI products of 2026 — Orectic and Penumbra included — combine all three, but they disagree on where the schema comes from. This guide gets you the vocabulary first, then walks one real query through each approach.
ORECTIC VS PENUMBRA — HOW THIS HUB IS LAID OUT
┌──────────────────────────────────────────────────────────────────┐
│ GUIDE (you are here) │
│ ───── │
│ vocab → combining → worked example → synthesis │
└──────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ ARCH │ │ WIKI │ │ SOURCE │
│ ──── │ │ ──── │ │ ────── │
│ architecture │ │ A-Z terms │ │ primary refs │
│ side-by-side │ │ with refs │ │ deeper reads │
└──────────────┘ └──────────────┘ └──────────────┘
Explore the guide
Building blocks
Three primitives. They do different things and they fail in different ways. Treat each one separately before combining them.
Vector store
Holds text as numerical coordinates in a high-dimensional space. Similar meanings end up near each other automatically — puppy sits next to dog, far from car. Knows similarity. Knows nothing else.
Semantic search
The action of querying a vector store. Your phrase becomes a vector, the store returns the nearest neighbors. The vector store is the thing; semantic search is what you do with it.
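A minimal sketch of that query step, assuming hand-made three-dimensional vectors in place of real embeddings. The values in `STORE` and the function names are hypothetical; a production system would get its vectors from an embedding model and use an indexed store rather than a full scan.

```python
import math

# Hand-made 3-d "embeddings" (hypothetical values, for illustration only).
STORE = {
    "puppy": [0.9, 0.8, 0.1],
    "dog":   [1.0, 0.7, 0.0],
    "car":   [0.0, 0.1, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, store, k=2):
    """Return the k stored keys whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda word: cosine(query_vec, store[word]), reverse=True)
    return ranked[:k]

# A query vector that sits in the "pets" region of the toy space.
nearest = semantic_search([0.8, 0.9, 0.2], STORE)
print(nearest)
```

With these toy values the two pet words rank ahead of `car`. The store only reports closeness; it has no notion of what a puppy is or who owns one.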
Knowledge graph
The opposite shape: explicit named nodes (Customer, Order, Product) connected by typed edges (placed, contains, shipped_to). Doesn't know similarity. Knows precise relationships.
VECTOR STORE            SEMANTIC SEARCH         KNOWLEDGE GRAPH
────────────            ───────────────         ───────────────
  ·  ·                              ·  ·        ┌─────────┐
 ·  ·   pets              ● ────▶ ·             │ Customer│
  ·  ·                    query ──▶ ·           │ "Acme"  │
                              └──▶ ·            └────┬────┘
  ·  ·                                               │ placed
 ·  ·   vehicles        fan to nearest               ▼
  ·  ·                  neighbours              ┌─────────┐
                                                │ Renewal │
                                                │ "Q3 '25"│
similarity in space     find by meaning         └─────────┘
                                                     │ decided
                                                     ▼
                                                ┌──────────┐
                                                │Decision  │
                                                │DISC-2156 │
                                                └──────────┘
How they combine — RAG and GraphRAG
In most AI stacks, these primitives are combined into a single pipeline. That pipeline goes by one of two names, depending on whether a graph is involved.
RAG (retrieval-augmented generation)
Standard vector-only pattern.
ingest ─▶ chunk ─▶ embed ─▶ vector store │ query ─▶ embed ─▶ top-k ─▶ chunks ─▶ LLM ─▶ answer
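The pipeline above can be sketched end to end in a few functions. This is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the "vector store" is a plain list, and the final LLM call is left out, so the sketch stops at the assembled prompt. All function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ingest(docs):
    """ingest -> chunk -> embed -> store (a list of (vector, chunk) pairs)."""
    return [(embed(c), c) for doc in docs for c in chunk(doc)]

def retrieve(query, store, k=2):
    """query -> embed -> top-k chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(query, chunks):
    """Assemble the text an LLM would receive."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

store = ingest([
    "Acme renewal pricing follows the tiered schedule in the MSA.",
    "The office kitchen is restocked every Monday morning.",
])
question = "What pricing applies to the Acme renewal?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Everything the LLM sees arrives through `retrieve`. If the right chunk does not rank in the top k, the prompt never contains it.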
GraphRAG (graph + vector)
Adds a typed knowledge graph as a second retriever. The vector store says "these chunks look relevant". The graph says "and here are the exact entities, owners, dependencies, and prior decisions those chunks reference." Both feed the LLM together.
                   ┌───── vector store ───── chunks ─────┐
                   │                                     ▼
ingest ─▶ extract ─┤                                    LLM ─▶ answer
                   │                                     ▲
                   └───── knowledge graph ── entities ───┘
                                 ▲
                         schema / ontology
Knowledge graphs require a schema — what counts as a Customer, Order, Product. Where that schema comes from is exactly where Orectic and Penumbra diverge. Orectic mines it from your files automatically. Penumbra has your team declare it. Same architecture, opposite bet.
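Whichever way the schema is produced, it ends up doing the same job: deciding which nodes and edges are allowed into the graph. The sketch below is a hypothetical illustration of that job, not the format either product uses. The node and edge names come from the examples earlier in this guide; `shipped_to` is left out because its target type is not stated.

```python
from dataclasses import dataclass

# Hypothetical schema: allowed node types, and for each edge type
# the (source type, target type) pair it may connect.
NODE_TYPES = {"Customer", "Order", "Product"}
EDGE_TYPES = {
    "placed": ("Customer", "Order"),
    "contains": ("Order", "Product"),
}

@dataclass(frozen=True)
class Node:
    type: str
    name: str

@dataclass(frozen=True)
class Edge:
    source: Node
    label: str
    target: Node

def validate(edge):
    """Return a list of schema violations for one edge (empty means valid)."""
    errors = []
    for node in (edge.source, edge.target):
        if node.type not in NODE_TYPES:
            errors.append(f"unknown node type: {node.type}")
    expected = EDGE_TYPES.get(edge.label)
    if expected is None:
        errors.append(f"unknown edge type: {edge.label}")
    elif (edge.source.type, edge.target.type) != expected:
        errors.append(f"{edge.label} must connect {expected[0]} -> {expected[1]}")
    return errors

acme = Node("Customer", "Acme")
order = Node("Order", "PO-1")
widget = Node("Product", "Widget")

ok = validate(Edge(acme, "placed", order))       # conforms to the schema
bad = validate(Edge(acme, "contains", widget))   # a Customer cannot "contain" a Product
print(ok, bad)
```

A mined schema and a declared schema differ in who fills in `NODE_TYPES` and `EDGE_TYPES`, not in what the check does afterwards.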
One query, three retrievals
The query: "Why did we discount Acme's renewal last quarter?" — chosen because it has hidden structure. A specific decision, a specific approver, a specific reason. Pure text similarity can only stumble onto these. A graph can walk to them.
Approach 1: Pure vector search
Query gets converted to a vector. Store returns top-k chunks whose embeddings are closest. LLM reads them and assembles an answer.
"Why did we discount Acme's renewal last quarter?"
                      │ embed
                      ▼
┌────────────────────────────────────────────────────────────┐
│ VECTOR STORE                                               │
│                                                            │
│    ·    ·        ●─ ─ ─ ─ ●      ·                         │
│       ·          ●    ◉                                    │
│    ·                query   ●   ·                          │
│                                                            │
│ faded · = noise     ● = top-3 matches                      │
└────────────────────────────────────────────────────────────┘
                      │ top-3 chunks
                      ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ Acme MSA §4.2     │ │ Slack #acme-deal  │ │ Discount policy   │
│ pricing tiers     │ │ "can we do 15%?"  │ │ general rules     │
└───────────────────┘ └───────────────────┘ └───────────────────┘
result: useful context, no canonical decision; LLM has to guess
The answer: "Based on the Acme MSA and internal discussions, it appears Acme requested a 15% discount, which fell within the discount policy's allowable range."
Why it's weak: Plausible, but invented. The actual decision record never appeared in the top-k because its embedding doesn't look much like the question. Can't tell you who approved, when, or why.
Approach 2: Pure graph traversal
Query gets parsed into a structured graph query. Traversal walks the typed edges and returns the canonical record.
"Why did we discount Acme's renewal last quarter?"
                      │ parse to graph query
                      ▼
┌──────────┐  has   ┌──────────┐ decided ┌──────────┐
│ Customer │ ─────▶ │ Renewal  │ ──────▶ │ Decision │
│ Acme     │        │ Q3 2025  │         │DISC-2156 │
└──────────┘        └──────────┘         └─────┬────┘
                                               │
                                               ▼
┌────────────────────────────────────────────────────────────┐
│ DISC-2156: typed record                                    │
│                                                            │
│ approver: VP Sales (Marta L.)                              │
│ reason:   competitive bid from Rival Inc., retention       │
│ terms:    15% off, capped 12 months, 2025-08-14            │
└────────────────────────────────────────────────────────────┘
result: precise, traceable, IF the data is already in the graph
The answer: "VP Sales Marta L. approved a 15% discount on Acme's Q3 2025 renewal on 2025-08-14, citing a competitive bid from Rival Inc."
Why it's strong: Exact and citable. The LLM reads typed fields instead of paraphrasing prose, which leaves little room for hallucination.
Why it can fail: If DISC-2156 was never extracted into the graph (say, the approval happened over Slack and nobody wrote it up), graph traversal returns nothing.
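Both behaviours, the exact hit and the empty result, show up in a small traversal sketch. It assumes the question has already been parsed into a fixed path (`has`, then `decided`). The node ids, dictionary layout, and function names are hypothetical; the field values are the ones shown in the typed record for DISC-2156.

```python
# Hypothetical in-memory graph: (node id, edge label) -> target node id.
EDGES = {
    ("Customer:Acme", "has"): "Renewal:Q3-2025",
    ("Renewal:Q3-2025", "decided"): "Decision:DISC-2156",
}
RECORDS = {
    "Decision:DISC-2156": {
        "approver": "VP Sales (Marta L.)",
        "reason": "competitive bid from Rival Inc., retention",
        "terms": "15% off, capped 12 months, 2025-08-14",
    },
}

def walk(start, labels, edges):
    """Follow one edge per label from `start`; return None if any hop is missing."""
    node = start
    for label in labels:
        node = edges.get((node, label))
        if node is None:
            return None
    return node

def find_decision(customer, edges, records):
    """Customer -> Renewal -> Decision, then look up the typed record."""
    node = walk(f"Customer:{customer}", ["has", "decided"], edges)
    return records.get(node) if node else None

found = find_decision("Acme", EDGES, RECORDS)

# Same graph without the "decided" edge: the decision was never extracted.
partial_edges = {("Customer:Acme", "has"): "Renewal:Q3-2025"}
missing = find_decision("Acme", partial_edges, RECORDS)
print(found, missing)
```

The traversal never guesses. It returns the typed record when every hop exists and `None` when one does not, which is the failure mode described above.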
Approach 3: Hybrid (GraphRAG)
Run both retrievers in parallel. Graph anchors the canonical entity. Vector store, filtered by graph neighborhood, pulls surrounding unstructured context. LLM gets both and synthesizes with citations.
        "Why did we discount Acme's renewal last quarter?"
                                 │
                 ┌───────────────┴───────────────┐
                 ▼                               ▼
    ┌─────────────────────────┐     ┌─────────────────────────┐
    │ GRAPH                   │     │ VECTOR                  │
    │ anchor the entity       │     │ pull surrounding context│
    │                         │     │                         │
    │ Acme → Renewal Q3       │     │ chunks linked to        │
    │      → DISC-2156        │     │ DISC-2156:              │
    │                         │     │ emails, Slack, calls    │
    │ (typed fields)          │     │ (unstructured texture)  │
    └────────────┬────────────┘     └────────────┬────────────┘
                 │                               │
                 └───────────────┬───────────────┘
                                 ▼
                ┌────────────────────────────────────┐
                │ CONTEXT ASSEMBLY                   │
                │ typed record + chunks + provenance │
                └────────────────┬───────────────────┘
                                 │
                                LLM
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ FINAL ANSWER │
│ │
│ Marta L. (VP Sales) approved 15% off on 2025-08-14 because a │
│ competitive bid from Rival Inc. put the renewal at risk. The │
│ CFO pushed back on margin in Slack but Marta cited retention │
│ priority. Sources: DISC-2156, #acme-deal. │
└──────────────────────────────────────────────────────────────────┘
The answer is grounded AND has texture: approver name, exact date, reason — plus the CFO's pushback that explains why this was a hard call. Citations point to a typed record (provenance) plus the conversational evidence.
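The context-assembly step can be sketched as a filter followed by a ranking. This sketch assumes each chunk was tagged at ingest time with the graph entity it mentions and already carries a similarity score for the query. The data layout, the scores, the `#random` chunk, and the function name are hypothetical.

```python
# Hypothetical inputs: the typed record the graph anchored on, plus scored chunks
# tagged with the entity they are linked to (None when no link was extracted).
RECORD = {"id": "DISC-2156", "approver": "Marta L. (VP Sales)",
          "terms": "15% off", "date": "2025-08-14"}
CHUNKS = [
    {"source": "#acme-deal", "entity": "DISC-2156", "score": 0.71,
     "text": "CFO pushed back on margin."},
    {"source": "#acme-deal", "entity": "DISC-2156", "score": 0.64,
     "text": "Marta cited retention priority."},
    {"source": "#random", "entity": None, "score": 0.80,
     "text": "Discounts on lunch this Friday."},
]

def hybrid_context(record, chunks, k=2):
    """Keep chunks linked to the anchored record, then take the k most similar."""
    linked = [c for c in chunks if c["entity"] == record["id"]]
    top = sorted(linked, key=lambda c: c["score"], reverse=True)[:k]
    sources = [record["id"]] + sorted({c["source"] for c in top})
    return {"record": record, "chunks": [c["text"] for c in top], "sources": sources}

context = hybrid_context(RECORD, CHUNKS)
print(context["sources"])
print(context["chunks"])
```

The `#random` chunk has the highest similarity score but is dropped because it is not in the record's graph neighborhood. That filter is what keeps the conversational texture tied to the canonical decision.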
Side-by-side
| What it returns | Pure vector | Pure graph | Hybrid |
|---|---|---|---|
| Canonical decision | ✗ missed | ✓ exact | ✓ exact |
| Approver name | ✗ unknown | ✓ Marta L. | ✓ Marta L. |
| Reason | ✗ guessed | ✓ from field | ✓ with texture |
| Conversational context | ✓ partial | ✗ none | ✓ included |
| Citations | ✗ approximate | ✓ typed ref | ✓ both kinds |
| Fails when | canonical record's embedding doesn't match query | data was never extracted into the graph | graph or vectors fail to populate |
Where Orectic and Penumbra land
Both products are hybrid systems. They just disagree on where the schema comes from. Same destination, opposite starting line.
Orectic — automated schema
Extraction engine reads your 17 source types (calls, docs, video, contracts) and infers a knowledge graph automatically. Their "748 relationships from a single client" pitch is that graph. Vector store sits alongside for fuzzy lookups. An Oracle agent wraps both. Strong if your truth lives in files you already have.
Penumbra — declared schema
Your team writes the ontology in plain language: objects, rules, workflows, standards. Penumbra emits the typed scaffolding — agent tools, APIs, memory, guardrails, provenance — so your own agents act on real business nouns. Strong if your value lives in tacit expert judgment.
Deeper architecture comparison on arch. Term definitions on wiki. Primary sources on source.
Stack & conventions
This site is the single-file HTML pattern from the organized-ai-project-guide skill — terminal-dark theme, monospace topbar, sticky sidebar, ASCII diagrams with colored highlight spans.
Theme
Dark terminal — yellow primary, teal AI, amber data, green output, purple routing.
Layout
1140px wrap, 220px sticky sidebar, 48px sticky topbar. Responsive collapse at 800px.
Code
<pre> for terminal, <pre class="ascii"> for diagrams with .hi .ht .ha .hg .hp spans.
Deploy & run
This page was deployed with the commands below. Replace the project name to deploy a sibling.
# idempotent create
CLOUDFLARE_ACCOUNT_ID=691fe25d377abac03627d6a88d3eeac9 \
  wrangler pages project create orectic-penumbra-guide \
  --production-branch main 2>/dev/null || true

# deploy
cd docs/guide
CLOUDFLARE_ACCOUNT_ID=691fe25d377abac03627d6a88d3eeac9 \
  wrangler pages deploy . \
  --project-name orectic-penumbra-guide \
  --branch main \
  --commit-dirty=true