RETRIEVAL TUTORIAL // ORGANIZED AI

Orectic vs Penumbra

A short guide to vector stores, semantic search, and knowledge graphs — followed by a worked example showing how the same query travels through pure vector retrieval, pure graph traversal, and hybrid (GraphRAG). Then how Orectic and Penumbra each wire it up.

3 retrieval styles · 1 worked example · 2 products compared · Acme renewal query

// CORE IDEA

Three things people keep confusing

Vector stores, semantic search, and knowledge graphs are not interchangeable. They do different jobs. The interesting AI products of 2026 — Orectic and Penumbra included — combine all three, but they disagree on where the schema comes from. This guide gets you the vocabulary first, then walks one real query through each approach.

3 building blocks · 1 query, three answers · 2 products, opposite bets

                  ORECTIC VS PENUMBRA — HOW THIS HUB IS LAID OUT

  ┌──────────────────────────────────────────────────────────────────┐
  │  GUIDE  (you are here)                                          │
  │  ─────                                                           │
  │  vocab → combining → worked example → synthesis                  │
  └──────────────────────────────────────────────────────────────────┘
            │              │              │
            ▼              ▼              ▼
  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
  │  ARCH        │  │  WIKI        │  │  SOURCE      │
  │  ────        │  │  ────        │  │  ──────      │
  │ architecture │  │  A-Z terms   │  │ primary refs │
  │ side-by-side │  │  with refs   │  │ deeper reads │
  └──────────────┘  └──────────────┘  └──────────────┘

// VOCABULARY

Building blocks

Three primitives. They do different things and they fail in different ways. Treat each one separately before combining them.

Vector store

Holds text as numerical coordinates (embeddings) in a high-dimensional space. The embedding model places similar meanings near each other: puppy sits next to dog, far from car. Knows similarity. Knows nothing else.
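
A minimal sketch of "similar meanings sit near each other", assuming hand-written 3-dimensional vectors in place of a real embedding model (real embeddings come from a trained model and have hundreds or thousands of dimensions). The names TOY_VECTORS and cosine are illustrative, not part of either product.

```python
import math

# Hypothetical, hand-picked coordinates: (animal-ness, youth, machine-ness).
TOY_VECTORS = {
    "dog":   (0.9, 0.3, 0.0),
    "puppy": (0.9, 0.9, 0.0),
    "car":   (0.0, 0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means same direction, close to 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

puppy_dog = cosine(TOY_VECTORS["puppy"], TOY_VECTORS["dog"])
puppy_car = cosine(TOY_VECTORS["puppy"], TOY_VECTORS["car"])
print(f"puppy~dog = {puppy_dog:.2f}, puppy~car = {puppy_car:.2f}")
```

The store only ever sees the numbers. Nothing in it records that a puppy is a young dog; that relationship is implied by distance alone.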

Semantic search

The action of querying a vector store. Your phrase becomes a vector, the store returns the nearest neighbors. The vector store is the thing; semantic search is what you do with it.
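
The thing-versus-action split can be sketched as a tiny class, assuming an in-memory list and a brute-force scan (production stores use approximate nearest-neighbour indexes instead). ToyVectorStore and the query vector are hypothetical; a real system would get the query vector from an embedding model.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToyVectorStore:
    """The thing: holds (text, vector) pairs."""

    def __init__(self):
        self.items = []

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=2):
        """The action: rank stored items by similarity to the query vector."""
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("dog", (0.9, 0.3, 0.0))
store.add("puppy", (0.9, 0.9, 0.0))
store.add("car", (0.0, 0.1, 0.9))
store.add("truck", (0.0, 0.0, 1.0))

# Assumed embedding of the phrase "young pet".
query_vector = (0.8, 0.8, 0.0)
results = store.search(query_vector, k=2)
print(results)
```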

Knowledge graph

The opposite shape: explicit named nodes (Customer, Order, Product) connected by typed edges (placed, contains, shipped_to). Doesn't know similarity. Knows precise relationships.
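
A minimal sketch of the graph shape, assuming plain Python structures rather than a graph database. The node identifiers and the Address type are made up for illustration; only the type and edge names Customer, Order, Product, placed, contains, and shipped_to come from the text above.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["source", "relation", "target"])

# Every node has a declared type; every edge has a named relation.
NODE_TYPES = {
    "cust-1": "Customer",
    "order-7": "Order",
    "prod-3": "Product",
    "addr-9": "Address",  # assumed type, not in the original list
}
EDGES = [
    Edge("cust-1", "placed", "order-7"),
    Edge("order-7", "contains", "prod-3"),
    Edge("order-7", "shipped_to", "addr-9"),
]

def neighbours(node, relation):
    """Follow one typed edge from a node and return the matching targets."""
    return [e.target for e in EDGES if e.source == node and e.relation == relation]

orders = neighbours("cust-1", "placed")
products = [p for o in orders for p in neighbours(o, "contains")]
print(orders, products)
```

There is no notion of "close enough" here: asking for a relation that was never recorded returns an empty list.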

        VECTOR STORE                 SEMANTIC SEARCH              KNOWLEDGE GRAPH
        ────────────                 ───────────────              ───────────────

     ·  ·                                                          ┌───────────┐
   ·    ·  pets                             ┌──▶ ·                 │ Customer  │
     ·  ·                          query ───┼──▶ ·                 │  "Acme"   │
                                            └──▶ ·                 └─────┬─────┘
            ·  ·                                                         │ placed
          ·    ·  vehicles         fan to nearest                        ▼
            ·  ·                     neighbours                    ┌───────────┐
                                                                   │  Renewal  │
                                                                   │ "Q3 '25"  │
     similarity in space             find by meaning               └─────┬─────┘
                                                                         │ decided
                                                                         ▼
                                                                   ┌───────────┐
                                                                   │ Decision  │
                                                                   │ DISC-2156 │
                                                                   └───────────┘

// COMPOSITION

How they combine — RAG and GraphRAG

In most AI stacks, these primitives combine into a single pipeline. That pipeline goes by one of two names, depending on whether a graph is involved.

RAG (retrieval-augmented generation)

Standard vector-only pattern.

  ingest ─▶ chunk ─▶ embed ─▶ vector store
                                       │
            query ─▶ embed ─▶ top-k  ─▶ chunks ─▶ LLM ─▶ answer
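
The pipeline above can be sketched end to end, assuming word-count vectors stand in for a real embedding model and the final LLM call is left out (the sketch stops at the prompt). The function names and the sample document are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text):
    """Split a document into sentence-sized chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def embed(text):
    """Toy embedding: lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, store, k=2):
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

document = (
    "Renewals are billed annually. "
    "Discounts above ten percent need VP approval. "
    "Support tickets are answered within one day."
)
# ingest -> chunk -> embed -> vector store
store = [(c, embed(c)) for c in chunk(document)]

# query -> embed -> top-k -> chunks
question = "Who must approve discounts?"
chunks = top_k(question, store, k=1)

# chunks -> LLM: the prompt is what a model would receive.
prompt = "Answer using only this context:\n" + "\n".join(chunks) + "\n\nQuestion: " + question
print(prompt)
```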

GraphRAG (graph + vector)

Adds a typed knowledge graph as a second retriever. The vector store says "these chunks look relevant". The graph says "and here are the exact entities, owners, dependencies, and prior decisions those chunks reference." Both feed the LLM together.

                      ┌───── vector store ───── chunks ─────┐
                      │                                     ▼
  ingest ─▶ extract ──┤                                    LLM ─▶ answer
                      │                                     ▲
                      └───── knowledge graph ── entities ───┘
                                    ▲
                            schema / ontology

Knowledge graphs require a schema — what counts as a Customer, Order, Product. Where that schema comes from is exactly where Orectic and Penumbra diverge. Orectic mines it from your files automatically. Penumbra has your team declare it. Same architecture, opposite bet.
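
Whichever way the schema arrives, its job is the same: decide which edges are allowed into the graph. A minimal sketch, assuming a schema expressed as a set of (source type, relation, target type) triples. This representation is an assumption for illustration, not how either product stores its ontology.

```python
# Hypothetical schema: which relation may connect which node types.
SCHEMA = {
    ("Customer", "placed", "Order"),
    ("Order", "contains", "Product"),
}

NODE_TYPES = {"acme": "Customer", "order-7": "Order", "widget": "Product"}

def is_valid(source, relation, target):
    """Accept an edge only if the schema allows it for these node types."""
    triple = (NODE_TYPES.get(source), relation, NODE_TYPES.get(target))
    return triple in SCHEMA

candidates = [
    ("acme", "placed", "order-7"),      # allowed
    ("order-7", "contains", "widget"),  # allowed
    ("widget", "placed", "acme"),       # rejected: a Product cannot place a Customer
]
accepted = [edge for edge in candidates if is_valid(*edge)]
print(accepted)
```

In an extraction-first system the SCHEMA set would be inferred from documents; in a declaration-first system a person would write it. The validation step looks the same either way.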

// WORKED EXAMPLE

One query, three retrievals

The query: "Why did we discount Acme's renewal last quarter?" — chosen because it has hidden structure. A specific decision, a specific approver, a specific reason. Pure text similarity can only stumble onto these. A graph can walk to them.

Approach 1: Pure vector search

Query gets converted to a vector. Store returns top-k chunks whose embeddings are closest. LLM reads them and assembles an answer.

  "Why did we discount Acme's renewal last quarter?"
                          │ embed
                          ▼
  ┌────────────────────────────────────────────────────────────┐
  │  VECTOR STORE                                              │
  │                                                            │
  │    ·    ·       ●─ ─ ─ ─ ●       ·                         │
  │       ·       ●                                           │
  │    ·              query  ●           ·                     │
  │                                                            │
  │  faded · = noise        ● = top-3 matches                  │
  └────────────────────────────────────────────────────────────┘
                          │
                     top-3 chunks
                          ▼
   ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
   │ Acme MSA §4.2     │  │ Slack #acme-deal  │  │ Discount policy   │
   │ pricing tiers     │  │ "can we do 15%?"  │  │ general rules     │
   └───────────────────┘  └───────────────────┘  └───────────────────┘

  result: useful context, no canonical decision — LLM has to guess

The answer: "Based on the Acme MSA and internal discussions, it appears Acme requested a 15% discount, which fell within the discount policy's allowable range."

Why it's weak: Plausible, but invented. The actual decision record never appeared in the top-k because its embedding doesn't look much like the question. Can't tell you who approved, when, or why.
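
The miss can be reproduced in miniature, assuming word-count vectors as a crude stand-in for embeddings. Real embedding models capture far more than word overlap, so this exaggerates the effect, but the mechanism is the same: the decision record is written in different terms than the question. The chunk texts below are invented to match the diagram.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented chunk texts, loosely following the diagram above.
CHUNKS = {
    "Acme MSA 4.2": "Acme MSA section 4.2: renewal pricing tiers for Acme",
    "Slack #acme-deal": "can we do 15% discount on the renewal for Acme?",
    "Discount policy": "Discount policy: general rules for renewal discount approval",
    "DISC-2156": "DISC-2156: approver Marta L., reason competitive bid from Rival Inc., terms 15% off",
}

question = "Why did we discount Acme's renewal last quarter?"
q = embed(question)
ranked = sorted(CHUNKS, key=lambda name: cosine(q, embed(CHUNKS[name])), reverse=True)
top3 = ranked[:3]
print(top3)
```

The record that actually answers the question shares no vocabulary with it, so it ranks last and never reaches the LLM.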

Approach 2: Pure graph traversal

Query gets parsed into a structured graph query. Traversal walks the typed edges and returns the canonical record.

  "Why did we discount Acme's renewal last quarter?"
                          │ parse to graph query
                          ▼
  ┌──────────┐  has   ┌──────────┐ decided ┌──────────┐
  │ Customer │ ─────▶ │ Renewal  │ ──────▶ │ Decision │
  │  Acme    │        │ Q3 2025  │         │DISC-2156 │
  └──────────┘        └──────────┘         └─────┬────┘
                                                 │
                                                 ▼
  ┌────────────────────────────────────────────────────────────┐
  │  DISC-2156 — typed record                                 │
  │                                                            │
  │  approver:  VP Sales (Marta L.)                            │
  │  reason:    competitive bid from Rival Inc., retention     │
  │  terms:     15% off, capped 12 months, 2025-08-14          │
  └────────────────────────────────────────────────────────────┘

  result: precise, traceable — IF the data is already in the graph

The answer: "VP Sales Marta L. approved a 15% discount on Acme's Q3 2025 renewal on 2025-08-14, citing a competitive bid from Rival Inc."

Why it's strong: Exact. Citable. Hallucination is far less likely because the LLM reads typed fields instead of paraphrasing prose.

Why it can fail: If DISC-2156 was never extracted into the graph (say, the approval happened over Slack and nobody wrote it up), graph traversal returns nothing.
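
A minimal sketch of the traversal in the diagram above, assuming the natural-language question has already been parsed into a start node and a path of relations (that parsing step is the hard part and is not shown). The dictionary layout and the Globex customer are hypothetical; the record fields mirror the DISC-2156 box.

```python
# Typed edges: (source, relation) -> target.
EDGES = {
    ("Customer:Acme", "has"): "Renewal:Q3-2025",
    ("Renewal:Q3-2025", "decided"): "Decision:DISC-2156",
}
RECORDS = {
    "Decision:DISC-2156": {
        "approver": "VP Sales (Marta L.)",
        "reason": "competitive bid from Rival Inc., retention",
        "terms": "15% off, capped 12 months, 2025-08-14",
    }
}

def walk(start, relations):
    """Follow a fixed path of relations; return None as soon as an edge is missing."""
    node = start
    for relation in relations:
        node = EDGES.get((node, relation))
        if node is None:
            return None
    return node

decision = walk("Customer:Acme", ["has", "decided"])
record = RECORDS.get(decision)
print(decision, record)

# Failure mode: a customer whose decision was never extracted into the graph.
missing = walk("Customer:Globex", ["has", "decided"])
print(missing)
```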

Approach 3: Hybrid (GraphRAG)

Run both retrievers in parallel. Graph anchors the canonical entity. Vector store, filtered by graph neighborhood, pulls surrounding unstructured context. LLM gets both and synthesizes with citations.

                "Why did we discount Acme's renewal last quarter?"
                                    │
                ┌───────────────────┴───────────────────┐
                ▼                                       ▼
  ┌─────────────────────────┐             ┌─────────────────────────┐
  │ GRAPH                   │             │ VECTOR                  │
  │ anchor the entity       │             │ pull surrounding context│
  │                         │             │                         │
  │ Acme → Renewal Q3       │             │ chunks linked to        │
  │      → DISC-2156        │             │ DISC-2156:              │
  │                         │             │ emails, Slack, calls    │
  │ (typed fields)          │             │ (unstructured texture)  │
  └────────────┬────────────┘             └────────────┬────────────┘
               │                                       │
               └───────────────────┬───────────────────┘
                                   ▼
                  ┌────────────────────────────────────┐
                  │ CONTEXT ASSEMBLY                   │
                  │ typed record + chunks + provenance │
                  └────────────────┬───────────────────┘
                                   │
                                LLM
                                   ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │ FINAL ANSWER                                                     │
  │                                                                  │
  │ Marta L. (VP Sales) approved 15% off on 2025-08-14 because a     │
  │ competitive bid from Rival Inc. put the renewal at risk. The     │
  │ CFO pushed back on margin in Slack but Marta cited retention     │
  │ priority. Sources: DISC-2156, #acme-deal.                        │
  └──────────────────────────────────────────────────────────────────┘

The answer is grounded AND has texture: approver name, exact date, reason — plus the CFO's pushback that explains why this was a hard call. Citations point to a typed record (provenance) plus the conversational evidence.
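
A minimal sketch of the hybrid flow, under several assumptions: the two retrievers run one after the other rather than in parallel, word counts stand in for embeddings, and each chunk was tagged at ingest time with the graph node it refers to (the linked_to field, which is hypothetical). The #globex-deal chunk is an invented distractor that matches the question text but belongs to a different decision.

```python
import math
import re
from collections import Counter

def embed(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

GRAPH = {
    ("Customer:Acme", "has"): "Renewal:Q3-2025",
    ("Renewal:Q3-2025", "decided"): "Decision:DISC-2156",
}
RECORDS = {"Decision:DISC-2156": {"approver": "VP Sales (Marta L.)", "terms": "15% off"}}
CHUNKS = [
    {"source": "#acme-deal", "linked_to": "Decision:DISC-2156",
     "text": "CFO pushed back on margin; Marta cited retention priority."},
    {"source": "Acme MSA 4.2", "linked_to": "Customer:Acme",
     "text": "Pricing tiers for the Acme renewal."},
    {"source": "#globex-deal", "linked_to": "Decision:DISC-0001",
     "text": "Why did we discount the renewal last quarter?"},
]

def graph_anchor(customer):
    """Graph side: walk Customer -> Renewal -> Decision."""
    renewal = GRAPH.get((customer, "has"))
    return GRAPH.get((renewal, "decided"))

def hybrid_retrieve(question, customer, k=2):
    anchor = graph_anchor(customer)
    # Vector side, filtered to chunks in the anchor's neighbourhood.
    neighbourhood = [c for c in CHUNKS if c["linked_to"] == anchor]
    q = embed(question)
    ranked = sorted(neighbourhood, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]
    # Context assembly: typed record + chunks + provenance.
    return {
        "record": RECORDS.get(anchor),
        "chunks": [c["text"] for c in ranked],
        "sources": [anchor] + [c["source"] for c in ranked],
    }

context = hybrid_retrieve("Why did we discount Acme's renewal last quarter?", "Customer:Acme")
print(context["sources"])
```

The distractor would win a pure similarity ranking, but the graph filter removes it before ranking happens. That is the sense in which the graph anchors the retrieval.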

Side-by-side

  What it returns          Pure vector           Pure graph          Hybrid
  ───────────────          ───────────           ──────────          ──────
  Canonical decision       ✗ missed              ✓ exact             ✓ exact
  Approver name            ✗ unknown             ✓ Marta L.          ✓ Marta L.
  Reason                   ✗ guessed             ✓ from field        ✓ with texture
  Conversational context   ✓ partial             ✗ none              ✓ included
  Citations                ✗ approximate         ✓ typed ref         ✓ both kinds
  Fails when               canonical record's    data was never      graph or vectors
                           embedding doesn't     extracted into      fail to populate
                           match query           the graph

// PRODUCTS

Where Orectic and Penumbra land

Both products are hybrid systems. They just disagree on where the schema comes from. Same destination, opposite starting line.

Orectic — automated schema

Extraction engine reads your 17 source types (calls, docs, video, contracts) and infers a knowledge graph automatically. Their "748 relationships from a single client" pitch is that graph. Vector store sits alongside for fuzzy lookups. An Oracle agent wraps both. Strong if your truth lives in files you already have.

Penumbra — declared schema

Your team writes the ontology in plain language: objects, rules, workflows, standards. Penumbra emits the typed scaffolding — agent tools, APIs, memory, guardrails, provenance — so your own agents act on real business nouns. Strong if your value lives in tacit expert judgment.

Orectic (bottom-up): machine-extract the schema from the mess you have. Ships an Oracle agent.

Penumbra (top-down): humans declare the schema, machine generates the substrate. Ships a domain model your agents consume.

Mature systems: will probably do both, with declared structure where experts know the domain and extraction to fill in the rest from documents.

A deeper architecture comparison is on the ARCH page, term definitions are on WIKI, and primary sources are on SOURCE.

// IMPLEMENTATION

Stack & conventions

This site is the single-file HTML pattern from the organized-ai-project-guide skill — terminal-dark theme, monospace topbar, sticky sidebar, ASCII diagrams with colored highlight spans.

Organized AI · Cloudflare Pages · wrangler 4.55 · single-file HTML · no JS framework

Theme

Dark terminal — yellow primary, teal AI, amber data, green output, purple routing.

Layout

1140px wrap, 220px sticky sidebar, 48px sticky topbar. Responsive collapse at 800px.

Code

<pre> for terminal, <pre class="ascii"> for diagrams with .hi .ht .ha .hg .hp spans.

// DEPLOY & RUN

Deploy & run

The same pattern deployed this page. Replace the project name to deploy a sibling site.

# set the account once for both commands
export CLOUDFLARE_ACCOUNT_ID=691fe25d377abac03627d6a88d3eeac9

# idempotent create: "|| true" keeps re-runs from failing when the project
# already exists (note that it also hides other errors, such as auth failures)
wrangler pages project create orectic-penumbra-guide \
  --production-branch main 2>/dev/null || true

# deploy the guide directory to the production branch
cd docs/guide
wrangler pages deploy . \
  --project-name orectic-penumbra-guide \
  --branch main \
  --commit-dirty=true