fix(security): add VITE_PAYMENT_URL environment variable configuration

2026-06-18 21:29:41 +08:00
parent 3d977d0a2d
commit 8afeb2e4d9
160 changed files with 21893 additions and 0 deletions
--- a/.qoder/skills/understand-knowledge/SKILL.md
+++ b/.qoder/skills/understand-knowledge/SKILL.md
@@ -0,0 +1,132 @@
+---
+name: understand-knowledge
+description: Analyze a Karpathy-pattern LLM wiki knowledge base and generate an interactive knowledge graph with entity extraction, implicit relationships, and topic clustering.
+argument-hint: [wiki-directory]
+---
+
+# /understand-knowledge
+
+Analyzes a Karpathy-pattern LLM wiki — a three-layer knowledge base with raw sources, wiki markdown, and a schema file — and produces an interactive knowledge graph dashboard.
+
+## What It Detects
+
+The **Karpathy LLM wiki pattern** (see https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):
+- **Raw sources** — immutable source documents (articles, papers, data files)
+- **Wiki** — LLM-generated markdown files with wikilinks (`[[target]]` syntax)
+- **Schema** — CLAUDE.md, AGENTS.md, or similar configuration file
+- **index.md** — content catalog organized by categories
+- **log.md** — chronological operation log
+
+Detection signals: `index.md` plus at least three `.md` files (typically containing wikilinks). A `raw/` directory and a schema file are optional.
+
+## Instructions
+
+### Phase 1: DETECT
+
+1. Determine the target directory:
+   - If the user provided a path argument, use that
+   - Otherwise, use the current working directory
+
+2. Run the format detection script bundled with this skill:
+   ```
+   python3 <SKILL_DIR>/parse-knowledge-base.py <TARGET_DIR>
+   ```
+   - If the script exits with an error, tell the user this doesn't appear to be a Karpathy-pattern wiki and explain what was expected
+   - If successful, proceed. The script writes `scan-manifest.json` to `<TARGET_DIR>/.understand-anything/intermediate/`
+
+3. Read the scan-manifest.json and announce the results:
+   - "Detected Karpathy wiki: N articles, N sources, N topics, N wikilinks (N unresolved)"
+   - List the categories found from index.md
+
+### Phase 2: SCAN (already done)
+
+The parse script in Phase 1 already performed the deterministic scan. The scan-manifest.json contains:
+- Article nodes (one per wiki .md file) with extracted wikilinks, headings, frontmatter
+- Source nodes (one per raw/ file)
+- Topic nodes (from index.md section headings)
+- `related` edges (from wikilinks)
+- `categorized_under` edges (from index.md sections)
+
+No additional scanning is needed. Proceed to Phase 3.
+
+### Phase 3: ANALYZE
+
+Dispatch `article-analyzer` subagents to extract implicit knowledge:
+
+1. Read the scan-manifest.json to get the article list
+
+2. Prepare batches of 10-15 articles each, grouped by category when possible (articles in the same category are more likely to have implicit cross-references)
+
+3. For each batch, dispatch an `article-analyzer` subagent with:
+   - The batch of articles (id, name, summary, wikilinks, category, content from knowledgeMeta)
+   - The full list of existing node IDs (so the agent can reference them)
+   - The batch number for output file naming
+   - The intermediate directory path: `$INTERMEDIATE_DIR = <TARGET_DIR>/.understand-anything/intermediate`
+   
+   The agent will write `analysis-batch-{N}.json` to the intermediate directory.
+
+4. Run up to 3 batches concurrently. Wait for all batches to complete.
+
+5. If any batch fails, log a warning but continue — the scan-manifest provides a solid base graph even without LLM analysis.
+
+### Phase 4: MERGE
+
+1. Run the merge script bundled with this skill:
+   ```
+   python3 <SKILL_DIR>/merge-knowledge-graph.py <TARGET_DIR>
+   ```
+
+2. The script:
+   - Combines scan-manifest.json + all analysis-batch-*.json files
+   - Deduplicates entities (case-insensitive name matching)
+   - Normalizes node/edge types via alias maps
+   - Builds layers from index.md categories
+   - Builds a tour from index.md section ordering
+   - Writes `assembled-graph.json` to the intermediate directory
+
+3. Read the merge report from stderr and announce:
+   - Total nodes, edges, layers, tour steps
+   - How many entities/claims the LLM analysis added
+
+### Phase 5: SAVE
+
+1. Read the assembled-graph.json
+
+2. Run basic validation:
+   - Every edge source/target must reference an existing node
+   - Every node must have: id, type, name, summary, tags, complexity
+   - Remove any edges with dangling references
+
+3. Copy the validated graph to `<TARGET_DIR>/.understand-anything/knowledge-graph.json`
+
+4. Write metadata to `<TARGET_DIR>/.understand-anything/meta.json`:
+   ```json
+   {
+     "lastAnalyzedAt": "<ISO timestamp>",
+     "gitCommitHash": "<from git rev-parse HEAD or empty>",
+     "version": "1.0.0",
+     "analyzedFiles": <number of wiki articles>
+   }
+   ```
+
+5. Clean up intermediate files:
+   ```
+   rm -rf <TARGET_DIR>/.understand-anything/intermediate
+   ```
+
+6. Report summary to the user:
+   - "Knowledge graph saved: N articles, N entities, N topics, N claims, N sources"
+   - "N edges (N wikilink, N categorized, N implicit)"
+   - "N layers, N tour steps"
+
+7. Auto-trigger the dashboard:
+   ```
+   /understand-dashboard <TARGET_DIR>
+   ```
+
+## Notes
+
+- The parse script handles ALL deterministic extraction (wikilinks, headings, frontmatter, categories from index.md). The LLM agents only add implicit knowledge that requires inference.
+- Categories and taxonomy come from index.md section headings, NOT from filename prefixes. The Karpathy spec is intentionally abstract about naming conventions.
+- The graph uses `kind: "knowledge"` to signal the dashboard to use force-directed layout instead of hierarchical dagre.
+- Source nodes from raw/ are lightweight (filename + size only) — we don't parse PDFs or binary files.
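The Phase 5 validation rules in SKILL.md (every edge endpoint must reference an existing node, every node needs `id`, `type`, `name`, `summary`, `tags`, `complexity`, dangling edges are removed) can be sketched roughly as follows. This is only an illustration under assumptions: `validate_graph` and `REQUIRED_NODE_FIELDS` are hypothetical names, not part of this commit, and the skill leaves the actual validation to the agent.

```python
# Hypothetical helper; not part of the commit. Field list taken from SKILL.md Phase 5.
REQUIRED_NODE_FIELDS = ("id", "type", "name", "summary", "tags", "complexity")


def validate_graph(graph: dict) -> tuple[dict, list[str]]:
    """Return a copy of the graph without dangling edges, plus a list of issues."""
    issues: list[str] = []
    nodes = graph.get("nodes", [])
    node_ids = {n["id"] for n in nodes if "id" in n}

    # Report nodes that lack any of the required fields
    for node in nodes:
        missing = [f for f in REQUIRED_NODE_FIELDS if f not in node]
        if missing:
            issues.append(f"node {node.get('id', '<no id>')} missing: {', '.join(missing)}")

    # Keep only edges whose endpoints both reference existing nodes
    kept = []
    for edge in graph.get("edges", []):
        if edge.get("source") in node_ids and edge.get("target") in node_ids:
            kept.append(edge)
        else:
            issues.append(f"dropped edge {edge.get('source')} -> {edge.get('target')}")

    # The input graph is left untouched; only the returned copy has edges filtered
    return {**graph, "edges": kept}, issues
```

Returning the issues alongside the cleaned graph lets the caller decide whether missing fields should abort the save or merely be reported.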
--- a/.qoder/skills/understand-knowledge/merge-knowledge-graph.py
+++ b/.qoder/skills/understand-knowledge/merge-knowledge-graph.py
@@ -0,0 +1,397 @@
+#!/usr/bin/env python3
+"""
+Merge script for Karpathy-pattern knowledge graphs.
+
+Combines the deterministic scan-manifest.json with LLM analysis batches
+(analysis-batch-*.json) into a final assembled knowledge graph.
+
+Handles: entity deduplication, edge normalization, layer building from
+index.md categories, tour generation from index.md section ordering.
+
+Usage:
+    python merge-knowledge-graph.py <wiki-directory>
+
+Output:
+    Writes assembled-graph.json to <wiki-directory>/.understand-anything/intermediate/
+"""
+
+import json
+import os
+import re
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+# ---------------------------------------------------------------------------
+# Canonical type sets (must match core/src/types.ts)
+# ---------------------------------------------------------------------------
+
+VALID_NODE_TYPES = {
+    "article", "entity", "topic", "claim", "source",
+    # Codebase types (for cross-compatibility)
+    "file", "function", "class", "module", "concept",
+    "config", "document", "service", "table", "endpoint",
+    "pipeline", "schema", "resource", "domain", "flow", "step",
+}
+
+VALID_EDGE_TYPES = {
+    "cites", "contradicts", "builds_on", "exemplifies",
+    "categorized_under", "authored_by", "related", "similar_to",
+    # Codebase types
+    "imports", "exports", "contains", "inherits", "implements",
+    "calls", "subscribes", "publishes", "middleware",
+    "reads_from", "writes_to", "transforms", "validates",
+    "depends_on", "tested_by", "configures",
+    "deploys", "serves", "provisions", "triggers",
+    "migrates", "documents", "routes", "defines_schema",
+    "contains_flow", "flow_step", "cross_domain",
+}
+
+NODE_TYPE_ALIASES = {
+    "note": "article", "page": "article", "wiki_page": "article",
+    "person": "entity", "actor": "entity", "organization": "entity",
+    "tag": "topic", "category": "topic", "theme": "topic",
+    "assertion": "claim", "decision": "claim", "thesis": "claim",
+    "reference": "source", "raw": "source", "paper": "source",
+}
+
+EDGE_TYPE_ALIASES = {
+    "references": "cites", "cites_source": "cites",
+    "conflicts_with": "contradicts", "disagrees_with": "contradicts",
+    "refines": "builds_on", "elaborates": "builds_on",
+    "illustrates": "exemplifies", "instance_of": "exemplifies", "example_of": "exemplifies",
+    "belongs_to": "categorized_under", "tagged_with": "categorized_under",
+    "written_by": "authored_by", "created_by": "authored_by",
+    "relates_to": "related", "related_to": "related",
+}
+
+
+# ---------------------------------------------------------------------------
+# Normalization
+# ---------------------------------------------------------------------------
+
+def normalize_node_type(t: str) -> str:
+    t = t.lower().strip()
+    return NODE_TYPE_ALIASES.get(t, t)
+
+
+def normalize_edge_type(t: str) -> str:
+    t = t.lower().strip()
+    return EDGE_TYPE_ALIASES.get(t, t)
+
+
+def normalize_entity_name(name: str) -> str:
+    """Normalize entity names for deduplication."""
+    return re.sub(r'\s+', ' ', name.strip().lower())
+
+
+# ---------------------------------------------------------------------------
+# Merge pipeline
+# ---------------------------------------------------------------------------
+
+def merge(root: Path) -> dict:
+    intermediate = root / ".understand-anything" / "intermediate"
+    manifest_path = intermediate / "scan-manifest.json"
+
+    if not manifest_path.is_file():
+        print(f"Error: {manifest_path} not found. Run parse-knowledge-base.py first.",
+              file=sys.stderr)
+        sys.exit(1)
+
+    # Load scan manifest (deterministic base)
+    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
+    nodes = {n["id"]: n for n in manifest["nodes"]}
+    edges = list(manifest["edges"])
+
+    report = {"base_nodes": len(nodes), "base_edges": len(edges),
+              "batches": 0, "new_entities": 0, "new_claims": 0,
+              "new_edges": 0, "deduped_entities": 0, "dropped_edges": 0}
+
+    # Load analysis batches
+    batch_files = sorted(intermediate.glob("analysis-batch-*.json"))
+    entity_name_map: dict[str, str] = {}  # normalized_name → entity_id
+    dedup_remap: dict[str, str] = {}  # duplicate_id → canonical_id
+
+    for bf in batch_files:
+        report["batches"] += 1
+        try:
+            batch = json.loads(bf.read_text(encoding="utf-8"))
+        except (ValueError, OSError) as e:  # ValueError covers JSON and Unicode decode errors
+            print(f"[merge] Warning: Failed to load {bf.name}: {e}", file=sys.stderr)
+            continue
+
+        # Process new nodes from LLM analysis
+        for node in batch.get("nodes", []):
+            node_type = normalize_node_type(node.get("type", ""))
+            if node_type not in VALID_NODE_TYPES:
+                print(f"[merge] Warning: Unknown node type '{node.get('type')}' — skipping",
+                      file=sys.stderr)
+                continue
+
+            node["type"] = node_type
+            node_id = node.get("id", "")
+
+            # Entity deduplication — track remapping for edge fixup
+            if node_type == "entity":
+                norm_name = normalize_entity_name(node.get("name", ""))
+                if norm_name in entity_name_map:
+                    # Map duplicate ID → canonical ID for edge remapping
+                    dedup_remap[node_id] = entity_name_map[norm_name]
+                    report["deduped_entities"] += 1
+                    continue
+                entity_name_map[norm_name] = node_id
+                report["new_entities"] += 1
+            elif node_type == "claim":
+                report["new_claims"] += 1
+
+            # Ensure required fields
+            node.setdefault("summary", node.get("name", ""))
+            node.setdefault("tags", [])
+            node.setdefault("complexity", "simple")
+
+            nodes[node_id] = node
+
+        # Process new edges from LLM analysis
+        for edge in batch.get("edges", []):
+            edge_type = normalize_edge_type(edge.get("type", ""))
+            if edge_type not in VALID_EDGE_TYPES:
+                print(f"[merge] Warning: Unknown edge type '{edge.get('type')}' — "
+                      f"mapped to 'related'", file=sys.stderr)
+                edge_type = "related"
+
+            edge["type"] = edge_type
+            edge.setdefault("direction", "forward")
+            edge.setdefault("weight", 0.5)
+
+            # Remap deduped entity IDs, then validate source/target exist
+            src = dedup_remap.get(edge.get("source", ""), edge.get("source", ""))
+            tgt = dedup_remap.get(edge.get("target", ""), edge.get("target", ""))
+            edge["source"] = src
+            edge["target"] = tgt
+            if src in nodes and tgt in nodes:
+                edges.append(edge)
+                report["new_edges"] += 1
+            else:
+                report["dropped_edges"] += 1
+
+    # --- Deduplicate edges ---
+    seen: set[tuple[str, str, str]] = set()
+    final_edges = []
+    for edge in edges:
+        key = (edge["source"], edge["target"], edge["type"])
+        if key not in seen:
+            seen.add(key)
+            final_edges.append(edge)
+
+    # --- Build article→layer map from categories ---
+    categories = manifest.get("categories", [])
+    article_layer_map: dict[str, str] = {}  # article_id → layer_id
+    layer_members: dict[str, list[str]] = {}  # layer_id → [node_ids]
+
+    for cat in categories:
+        cat_name = cat["name"]
+        cat_slug = cat_name.lower().replace(" ", "-")
+        layer_id = f"layer:{cat_slug}"
+        topic_id = f"topic:{cat_slug}"
+        members = [e["source"] for e in final_edges
+                   if e["type"] == "categorized_under" and e["target"] == topic_id]
+        if topic_id in nodes:
+            members.append(topic_id)
+        layer_members[layer_id] = members
+        for mid in members:
+            article_layer_map[mid] = layer_id
+
+    # --- Assign entity/claim nodes to their parent article's layer ---
+    # Step 1: Build entity/claim → article mapping from edges
+    child_to_article: dict[str, str] = {}
+    for edge in final_edges:
+        src_type = nodes.get(edge["source"], {}).get("type", "")
+        tgt_type = nodes.get(edge["target"], {}).get("type", "")
+        # If an article connects to an entity/claim, map the child to the article
+        if src_type == "article" and tgt_type in ("entity", "claim"):
+            child_to_article.setdefault(edge["target"], edge["source"])
+        elif tgt_type == "article" and src_type in ("entity", "claim"):
+            child_to_article.setdefault(edge["source"], edge["target"])
+
+    # Step 2: For orphan entities/claims, try to match by ID prefix
+    # Build a reverse lookup: bare article name → full article ID
+    # e.g., "concept-aaak-compression" → "article:concepts/concept-aaak-compression"
+    bare_to_article: dict[str, str] = {}
+    for nid in nodes:
+        if nid.startswith("article:"):
+            # Extract the bare filename from paths like "article:concepts/concept-foo"
+            bare = nid.split("/")[-1] if "/" in nid else nid.replace("article:", "")
+            bare_to_article[bare] = nid
+
+    for nid, node in nodes.items():
+        if node["type"] in ("entity", "claim") and nid not in child_to_article:
+            # e.g., "claim:concept-aaak-compression:not-zero-loss" → stem "concept-aaak-compression"
+            # e.g., "entity:brain" → stem "brain"
+            raw = nid.split(":", 1)[1] if ":" in nid else nid  # "concept-aaak-compression:not-zero-loss"
+            stem = raw.split(":")[0]  # "concept-aaak-compression"
+
+            # Try exact bare name match first
+            if stem in bare_to_article:
+                child_to_article[nid] = bare_to_article[stem]
+            else:
+                # Try suffix, then substring, match against bare names
+                # e.g., entity:brain → segment-brain, entity:mempalace → tool-mempalace
+                # Empty stems are skipped because "" is a substring of every name
+                matched = False
+                suffix_hit = next((aid for bare, aid in bare_to_article.items()
+                                   if stem and bare.endswith(f"-{stem}")), None)
+                substr_hit = next((aid for bare, aid in bare_to_article.items()
+                                   if stem and (stem in bare or bare in stem)), None)
+                # A "-stem" suffix is a stronger signal than a loose substring hit,
+                # so it wins when both are found
+                if suffix_hit or substr_hit:
+                    child_to_article[nid] = suffix_hit or substr_hit
+                    matched = True
+                # Last resort: check if the node's name appears in any article's
+                # name OR content (knowledgeMeta.content)
+                if not matched and node.get("name"):
+                    node_name_lower = node["name"].lower()
+                    for aid, anode in nodes.items():
+                        if not aid.startswith("article:"):
+                            continue
+                        # Match against article name
+                        if node_name_lower in anode.get("name", "").lower():
+                            child_to_article[nid] = aid
+                            matched = True
+                            break
+                        # Match against article content (wikilinks or text)
+                        meta = anode.get("knowledgeMeta", {})
+                        content = (meta.get("content") or "").lower()
+                        if len(node_name_lower) >= 3 and node_name_lower in content:
+                            child_to_article[nid] = aid
+                            matched = True
+                            break
+
+    # Step 3: Place children into their parent article's layer
+    for child_id, article_id in child_to_article.items():
+        layer_id = article_layer_map.get(article_id)
+        if layer_id and layer_id in layer_members:
+            layer_members[layer_id].append(child_id)
+            article_layer_map[child_id] = layer_id
+
+    # --- Build layers ---
+    layers = []
+    for cat in categories:
+        cat_name = cat["name"]
+        cat_slug = cat_name.lower().replace(" ", "-")
+        layer_id = f"layer:{cat_slug}"
+        members = list(dict.fromkeys(layer_members.get(layer_id, [])))  # Deduplicate preserving order
+        layers.append({
+            "id": layer_id,
+            "name": cat_name,
+            "description": f"{cat_name} ({len(members)} nodes)",
+            "nodeIds": members,
+        })
+
+    # Assign uncategorized nodes to an "Other" layer
+    categorized_ids = set()
+    for layer in layers:
+        categorized_ids.update(layer["nodeIds"])
+    uncategorized = [nid for nid in nodes if nid not in categorized_ids]
+    if uncategorized:
+        layers.append({
+            "id": "layer:other",
+            "name": "Other",
+            "description": f"Uncategorized nodes ({len(uncategorized)})",
+            "nodeIds": uncategorized,
+        })
+
+    # --- Build tour from index.md category ordering ---
+    tour = []
+    for i, cat in enumerate(categories):
+        cat_slug = cat["name"].lower().replace(" ", "-")
+        topic_id = f"topic:{cat_slug}"
+        # Pick representative articles (up to 3 per category)
+        members = [e["source"] for e in final_edges
+                   if e["type"] == "categorized_under" and e["target"] == topic_id][:3]
+        if not members and topic_id in nodes:
+            members = [topic_id]
+        if members:
+            tour.append({
+                "order": i + 1,
+                "title": cat["name"],
+                "description": f"Explore the {cat['name']} section ({cat['count']} articles)",
+                "nodeIds": members,
+            })
+
+    # --- Detect project name ---
+    project_name = root.name
+    # Try to find a better name from index.md H1
+    index_path = root / "wiki" / "index.md"
+    if not index_path.is_file():
+        index_path = root / "index.md"
+    if index_path.is_file():
+        text = index_path.read_text(encoding="utf-8", errors="replace")
+        h1_match = re.search(r"^#\s+(.+)$", text, re.MULTILINE)
+        if h1_match:
+            project_name = h1_match.group(1).strip()
+
+    # --- Assemble final graph ---
+    graph = {
+        "version": "1.0.0",
+        "kind": "knowledge",
+        "project": {
+            "name": project_name,
+            "languages": ["markdown"],
+            "frameworks": ["karpathy-wiki"],
+            "description": f"Knowledge graph for {project_name}",
+            "analyzedAt": datetime.now(timezone.utc).isoformat(),
+            "gitCommitHash": "",
+        },
+        "nodes": list(nodes.values()),
+        "edges": final_edges,
+        "layers": layers,
+        "tour": tour,
+    }
+
+    # Try to get git commit hash
+    try:
+        import subprocess
+        result = subprocess.run(
+            ["git", "rev-parse", "HEAD"],
+            capture_output=True, text=True, cwd=str(root), timeout=5
+        )
+        if result.returncode == 0:
+            graph["project"]["gitCommitHash"] = result.stdout.strip()
+    except (OSError, subprocess.TimeoutExpired):
+        pass
+
+    # Write output
+    out_path = intermediate / "assembled-graph.json"
+    out_path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
+
+    # Report
+    print(f"[merge] Input: {report['base_nodes']} scan nodes, "
+          f"{report['base_edges']} scan edges, {report['batches']} analysis batches",
+          file=sys.stderr)
+    print(f"[merge] Added: {report['new_entities']} entities, "
+          f"{report['new_claims']} claims, {report['new_edges']} edges "
+          f"({report['deduped_entities']} deduped entities, "
+          f"{report['dropped_edges']} dropped dangling edges)", file=sys.stderr)
+    print(f"[merge] Output: {len(graph['nodes'])} nodes, {len(final_edges)} edges, "
+          f"{len(layers)} layers, {len(tour)} tour steps", file=sys.stderr)
+    print(f"[merge] Written: {out_path}", file=sys.stderr)
+
+    return graph
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: merge-knowledge-graph.py <wiki-directory>", file=sys.stderr)
+        sys.exit(1)
+
+    root = Path(sys.argv[1]).resolve()
+    if not root.is_dir():
+        print(f"Error: {root} is not a directory", file=sys.stderr)
+        sys.exit(1)
+
+    merge(root)
+
+
+if __name__ == "__main__":
+    main()
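The entity deduplication step in the merge script (case-insensitive name matching, then remapping edges from duplicate IDs to the canonical ID) can be illustrated in isolation. The sketch below is a simplified two-pass variant, not the script's exact control flow: `normalize_name` and `dedupe_entities` are hypothetical names, and it assumes every node has an `id` and every edge has `source` and `target`.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", name.strip().lower())


def dedupe_entities(nodes: list[dict], edges: list[dict]) -> tuple[list[dict], list[dict]]:
    """Drop entity nodes whose normalized name was already seen and remap edges."""
    canonical: dict[str, str] = {}  # normalized name -> first-seen entity ID
    remap: dict[str, str] = {}      # duplicate ID -> canonical ID
    kept: list[dict] = []

    # Pass 1: keep the first entity per normalized name, remember the rest
    for node in nodes:
        if node.get("type") == "entity":
            key = normalize_name(node.get("name", ""))
            if key in canonical:
                remap[node["id"]] = canonical[key]
                continue
            canonical[key] = node["id"]
        kept.append(node)

    # Pass 2: rewrite edge endpoints that pointed at a dropped duplicate
    fixed = [
        {
            **edge,
            "source": remap.get(edge["source"], edge["source"]),
            "target": remap.get(edge["target"], edge["target"]),
        }
        for edge in edges
    ]
    return kept, fixed
```

Collecting all nodes before touching any edge means the remap table is complete when edges are rewritten. The merge script instead processes nodes and edges batch by batch, so an edge can only be remapped to duplicates seen in its own or an earlier batch.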
--- a/.qoder/skills/understand-knowledge/parse-knowledge-base.py
+++ b/.qoder/skills/understand-knowledge/parse-knowledge-base.py
@@ -0,0 +1,509 @@
+#!/usr/bin/env python3
+"""
+Deterministic parser for Karpathy-pattern LLM wikis.
+
+Detects the three-layer pattern (raw sources + wiki markdown + schema),
+extracts structure from markdown files, resolves wikilinks, and derives
+categories from index.md section headings.
+
+Usage:
+    python parse-knowledge-base.py <wiki-directory>
+
+Output:
+    Writes scan-manifest.json to <wiki-directory>/.understand-anything/intermediate/
+"""
+
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+# ---------------------------------------------------------------------------
+# Regex patterns
+# ---------------------------------------------------------------------------
+WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
+FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)
+CODE_BLOCK_RE = re.compile(r"```(\w*)")
+HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$", re.MULTILINE)
+INDEX_SECTION_RE = re.compile(r"^##\s+(.+)$", re.MULTILINE)
+
+# Files that are part of wiki infrastructure, not content articles
+INFRA_FILES = {"index.md", "log.md", "claude.md", "agents.md", "soul.md"}
+
+# ---------------------------------------------------------------------------
+# Detection: is this a Karpathy-pattern wiki?
+# ---------------------------------------------------------------------------
+
+def detect_format(root: Path) -> dict:
+    """Detect if directory follows the Karpathy LLM wiki three-layer pattern."""
+    signals = {
+        "has_index": (root / "index.md").is_file() or (root / "wiki" / "index.md").is_file(),
+        "has_log": (root / "log.md").is_file() or (root / "wiki" / "log.md").is_file(),
+        "has_raw": (root / "raw").is_dir(),
+        "has_schema": any(
+            (root / f).is_file() or (root / "wiki" / f).is_file()
+            for f in ["CLAUDE.md", "AGENTS.md"]
+        ),
+    }
+
+    # Find the wiki root — could be the directory itself or a wiki/ subdirectory
+    if (root / "wiki").is_dir():
+        wiki_root = root / "wiki"
+    else:
+        wiki_root = root
+
+    # Count markdown files in the wiki root
+    md_files = list(wiki_root.rglob("*.md"))
+    signals["md_count"] = len(md_files)
+    signals["wiki_root"] = str(wiki_root)
+
+    # Primary signal: has index.md + meaningful number of markdown files
+    if signals["has_index"] and signals["md_count"] >= 3:
+        signals["detected"] = True
+        signals["format"] = "karpathy"
+    else:
+        signals["detected"] = False
+        signals["format"] = "unknown"
+
+    return signals
+
+
+# ---------------------------------------------------------------------------
+# Markdown extraction helpers
+# ---------------------------------------------------------------------------
+
+def extract_frontmatter(text: str) -> dict:
+    """Extract YAML frontmatter as a simple key-value dict."""
+    m = FRONTMATTER_RE.match(text)
+    if not m:
+        return {}
+    fm = {}
+    for line in m.group(1).split("\n"):
+        if ":" in line:
+            key, _, val = line.partition(":")
+            fm[key.strip()] = val.strip().strip('"').strip("'")
+    return fm
+
+
+def extract_wikilinks(text: str) -> list[dict]:
+    """Extract all [[target]] and [[target|display]] wikilinks."""
+    links = []
+    for m in WIKILINK_RE.finditer(text):
+        links.append({
+            "target": m.group(1).strip(),
+            "display": m.group(2).strip() if m.group(2) else None,
+        })
+    return links
+
+
+def extract_headings(text: str) -> list[dict]:
+    """Extract all markdown headings with level and text."""
+    return [
+        {"level": len(m.group(1)), "text": m.group(2).strip()}
+        for m in HEADING_RE.finditer(text)
+    ]
+
+
+def extract_code_blocks(text: str) -> list[str]:
+    """Extract languages from fenced code blocks."""
+    return [m.group(1) for m in CODE_BLOCK_RE.finditer(text) if m.group(1)]
+
+
+def extract_first_paragraph(text: str) -> str:
+    """Extract the first non-empty paragraph after frontmatter and H1."""
+    # Strip frontmatter
+    stripped = FRONTMATTER_RE.sub("", text).strip()
+    if not stripped:
+        return ""
+    lines = stripped.split("\n")
+
+    def _collect_paragraph(start_lines: list[str]) -> str:
+        """Collect the first paragraph from the given lines."""
+        para: list[str] = []
+        for s_raw in start_lines:
+            s = s_raw.strip()
+            if not s and not para:
+                continue  # Skip leading blank lines
+            if not s and para:
+                break  # End of paragraph
+            if s.startswith(">"):
+                continue  # Skip blockquotes
+            if re.match(r"^[-*_]{3,}\s*$", s):
+                continue  # Skip horizontal rules
+            if s.startswith("#"):
+                if para:
+                    break  # End paragraph at next heading
+                continue  # Skip headings before paragraph
+            para.append(s)
+        return " ".join(para)
+
+    # Try: find first paragraph after H1
+    for i, line in enumerate(lines):
+        if line.strip().startswith("# "):
+            result = _collect_paragraph(lines[i + 1:])
+            if result:
+                if len(result) > 200:
+                    return result[:197] + "..."
+                return result
+
+    # Fallback: no H1 found, take first paragraph from start
+    result = _collect_paragraph(lines)
+    if len(result) > 200:
+        result = result[:197] + "..."
+    return result or ""
+
+
+def extract_h1(text: str) -> str:
+    """Extract the first H1 heading."""
+    for m in HEADING_RE.finditer(text):
+        if len(m.group(1)) == 1:
+            # Return the heading text unchanged (subtitle decorations are kept)
+            return m.group(2).strip()
+    return ""
+
+
+# ---------------------------------------------------------------------------
+# Index.md parsing — categories come from section headings
+# ---------------------------------------------------------------------------
+
+def parse_index(index_path: Path) -> list[dict]:
+    """Parse index.md to extract categories from ## headings and their wikilinks."""
+    if not index_path.is_file():
+        return []
+    text = index_path.read_text(encoding="utf-8", errors="replace")
+    categories = []
+    current_category = None
+
+    for line in text.split("\n"):
+        # Detect ## section heading
+        sec_match = INDEX_SECTION_RE.match(line)
+        if sec_match:
+            current_category = {
+                "name": sec_match.group(1).strip(),
+                "articles": [],
+            }
+            categories.append(current_category)
+            continue
+
+        # Collect wikilinks under current section
+        if current_category:
+            for wl in WIKILINK_RE.finditer(line):
+                current_category["articles"].append(wl.group(1).strip())
+
+    return categories
+
+
+# ---------------------------------------------------------------------------
+# Log.md parsing — extract operation timeline
+# ---------------------------------------------------------------------------
+
+def parse_log(log_path: Path) -> list[dict]:
+    """Parse log.md to extract chronological entries."""
+    if not log_path.is_file():
+        return []
+    text = log_path.read_text(encoding="utf-8", errors="replace")
+    entries = []
+    log_entry_re = re.compile(
+        r"^##\s+\[(\d{4}-\d{2}-\d{2})\]\s+(\w+)\s*\|\s*(.+)$", re.MULTILINE
+    )
+    for m in log_entry_re.finditer(text):
+        entries.append({
+            "date": m.group(1),
+            "operation": m.group(2),
+            "title": m.group(3).strip(),
+        })
+    return entries
+
+
+# ---------------------------------------------------------------------------
+# Main pipeline
+# ---------------------------------------------------------------------------
+
+def build_name_to_stem_map(wiki_root: Path) -> dict[str, str]:
+    """Build a case-insensitive map from filename stem to relative stem path.
+
+    Full relative paths always map uniquely. Bare basenames map only when
+    unambiguous — duplicate basenames are removed so they don't silently
+    resolve to the wrong page.
+    """
+    name_map: dict[str, str] = {}
+    # Bare basenames are collected separately so they can never clobber
+    # (or cause the deletion of) a full relative-path entry
+    basename_map: dict[str, list[str]] = {}
+    for md_file in wiki_root.rglob("*.md"):
+        rel = md_file.relative_to(wiki_root)
+        stem = rel.with_suffix("").as_posix()  # e.g., "decisions/decision-foo"
+        basename = md_file.stem            # e.g., "decision-foo"
+        # Full relative path always maps uniquely
+        name_map[stem.lower()] = stem
+        # Track every stem that shares this basename for ambiguity detection
+        basename_map.setdefault(basename.lower(), []).append(stem)
+
+    # Add a basename entry only when it is unambiguous and does not collide
+    # with an existing full-path key (e.g., root-level "foo.md" vs "sub/foo.md")
+    for key, stems in basename_map.items():
+        if len(stems) == 1:
+            name_map.setdefault(key, stems[0])
+
+    return name_map
+
+
+def resolve_wikilink(target: str, name_map: dict[str, str], node_ids: set[str] | None = None) -> str | None:
+    """Resolve a wikilink target to an article node ID.
+
+    If node_ids is provided, only resolve to IDs that exist in the set.
+    """
+    key = target.lower().strip()
+    # Skip targets that are clearly not page names (shell flags, etc.)
+    if key.startswith("-"):
+        return None
+    stem = name_map.get(key)
+    if stem:
+        candidate = f"article:{stem}"
+        # If we have a node set, verify the target exists
+        if node_ids is not None and candidate not in node_ids:
+            return None
+        return candidate
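+    # Fall back to a path-suffix match (e.g., "decisions/foo" for
+    # "adr/decisions/foo"). Resolve only when exactly one page matches,
+    # so an ambiguous name is never silently bound to the first hit.
+    suffix = "/" + key
+    candidates = {
+        f"article:{stored_stem}"
+        for stored_key, stored_stem in name_map.items()
+        if stored_key.endswith(suffix)
+    }
+    if node_ids is not None:
+        candidates &= node_ids
+    if len(candidates) == 1:
+        return candidates.pop()
+    return None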
+
+
+def parse_wiki(root: Path) -> dict:
+    """Parse a Karpathy-pattern wiki and produce the scan manifest."""
+    detection = detect_format(root)
+    if not detection["detected"]:
+        print(json.dumps({"error": "Not a Karpathy-pattern wiki", "detection": detection}),
+              file=sys.stderr)
+        sys.exit(1)
+
+    wiki_root = Path(detection["wiki_root"])
+    raw_root = root / "raw"
+
+    # Build name resolution map
+    name_map = build_name_to_stem_map(wiki_root)
+
+    # Find index.md and log.md
+    index_path = wiki_root / "index.md"
+    if not index_path.is_file():
+        index_path = root / "index.md"
+    log_path = wiki_root / "log.md"
+    if not log_path.is_file():
+        log_path = root / "log.md"
+
+    # Parse index for categories
+    categories = parse_index(index_path)
+    log_entries = parse_log(log_path)
+
+    # Build category lookup: wikilink target → category name
+    category_lookup: dict[str, str] = {}
+    for cat in categories:
+        for article_target in cat["articles"]:
+            category_lookup[article_target.lower()] = cat["name"]
+
+    # --- Pre-compute article IDs (for edge resolution validation) ---
+    # Only skip infra files at the wiki root level, not in subdirectories
+    # (e.g., wiki/index.md is infra, but wiki/concepts/index.md is content)
+    article_ids: set[str] = set()
+    for md_file in sorted(wiki_root.rglob("*.md")):
+        rel = md_file.relative_to(wiki_root)
+        stem = rel.with_suffix("").as_posix()
+        # Only filter infra files at root level (no parent directory)
+        if rel.parent == Path(".") and rel.name.lower() in INFRA_FILES:
+            continue
+        article_ids.add(f"article:{stem}")
+
+    # --- Build article nodes ---
+    nodes = []
+    edges = []
+    warnings = []
+    stats = {"articles": 0, "sources": 0, "topics": 0, "wikilinks": 0, "unresolved": 0}
+
+    for md_file in sorted(wiki_root.rglob("*.md")):
+        rel = md_file.relative_to(wiki_root)
+        stem = rel.with_suffix("").as_posix()
+        basename = md_file.stem
+
+        # Skip infrastructure files only at wiki root level
+        if rel.parent == Path(".") and rel.name.lower() in INFRA_FILES:
+            continue
+
+        text = md_file.read_text(encoding="utf-8", errors="replace")
+        h1 = extract_h1(text)
+        frontmatter = extract_frontmatter(text)
+        wikilinks = extract_wikilinks(text)
+        headings = extract_headings(text)
+        code_langs = extract_code_blocks(text)
+        summary = extract_first_paragraph(text)
+        line_count = text.count("\n") + 1
+        word_count = len(text.split())
+
+        # Derive category from index.md lookup; prefer the full relative
+        # path over the bare basename, which several pages may share
+        category = (
+            category_lookup.get(stem.lower())
+            or category_lookup.get(basename.lower(), "")
+        )
+
+        # Derive tags (deduplicated)
+        tag_set: set[str] = set()
+        if category:
+            tag_set.add(category.lower())
+        if rel.parent != Path("."):
+            tag_set.add(rel.parent.as_posix())  # forward slashes on every OS
+        fm_tags = frontmatter.get("tags", "")
+        if fm_tags:
+            tag_set.update(t.strip() for t in fm_tags.split(",") if t.strip())
+        tags = sorted(tag_set)
+
+        # Complexity from wikilink density
+        wl_count = len(wikilinks)
+        if wl_count > 15:
+            complexity = "complex"
+        elif wl_count > 5:
+            complexity = "moderate"
+        else:
+            complexity = "simple"
+
+        node_id = f"article:{stem}"
+        nodes.append({
+            "id": node_id,
+            "type": "article",
+            "name": h1 or basename,
+            "filePath": rel.as_posix(),
+            "summary": summary or f"Wiki article: {h1 or basename}",
+            "tags": tags,
+            "complexity": complexity,
+            "knowledgeMeta": {
+                "wikilinks": [wl["target"] for wl in wikilinks],
+                **({"category": category} if category else {}),
+                "content": text[:3000],  # First 3000 chars for LLM analysis
+            },
+        })
+        stats["articles"] += 1
+        stats["wikilinks"] += wl_count
+
+        # Build edges from wikilinks (resolve against known article IDs)
+        for wl in wikilinks:
+            target_id = resolve_wikilink(wl["target"], name_map, article_ids)
+            if target_id and target_id != node_id:
+                edges.append({
+                    "source": node_id,
+                    "target": target_id,
+                    "type": "related",
+                    "direction": "forward",
+                    "weight": 0.7,
+                })
+            elif not target_id:
+                warnings.append(f"Unresolved wikilink: [[{wl['target']}]] in {rel}")
+                stats["unresolved"] += 1
+
+    # --- Build topic nodes from index.md categories ---
+    for cat in categories:
+        topic_id = f"topic:{cat['name'].lower().replace(' ', '-')}"
+        nodes.append({
+            "id": topic_id,
+            "type": "topic",
+            "name": cat["name"],
+            "summary": f"Category from index: {cat['name']} ({len(cat['articles'])} articles)",
+            "tags": ["category"],
+            "complexity": "simple",
+        })
+        stats["topics"] += 1
+
+        # categorized_under edges (only resolve to known article nodes)
+        for article_target in cat["articles"]:
+            article_id = resolve_wikilink(article_target, name_map, article_ids)
+            if article_id:
+                edges.append({
+                    "source": article_id,
+                    "target": topic_id,
+                    "type": "categorized_under",
+                    "direction": "forward",
+                    "weight": 0.6,
+                })
+
+    # --- Build source nodes from raw/ ---
+    if raw_root.is_dir():
+        for raw_file in sorted(raw_root.rglob("*")):
+            if raw_file.is_file() and not raw_file.name.startswith("."):
+                rel_raw = raw_file.relative_to(root)
+                ext = raw_file.suffix.lower()
+                size_kb = raw_file.stat().st_size / 1024
+                source_id = f"source:{raw_file.relative_to(raw_root).with_suffix('').as_posix()}"
+                nodes.append({
+                    "id": source_id,
+                    "type": "source",
+                    "name": raw_file.name,
+                    "filePath": rel_raw.as_posix(),
+                    "summary": f"Raw source ({ext or 'unknown'}, {size_kb:.0f} KB)",
+                    "tags": ["raw", ext.lstrip(".") or "unknown"],
+                    "complexity": "simple",
+                })
+                stats["sources"] += 1
+
+    # --- Deduplicate edges ---
+    # Done before computing backlinks so that an article linking to the same
+    # target several times contributes a single backlink.
+    seen_edges: set[tuple[str, str, str]] = set()
+    deduped_edges = []
+    for edge in edges:
+        key = (edge["source"], edge["target"], edge["type"])
+        if key not in seen_edges:
+            seen_edges.add(key)
+            deduped_edges.append(edge)
+
+    # --- Compute backlinks ---
+    backlink_map: dict[str, list[str]] = {}
+    for edge in deduped_edges:
+        if edge["type"] == "related":
+            backlink_map.setdefault(edge["target"], []).append(edge["source"])
+    for node in nodes:
+        if node["type"] == "article" and "knowledgeMeta" in node:
+            node["knowledgeMeta"]["backlinks"] = backlink_map.get(node["id"], [])
+
+    return {
+        "format": "karpathy",
+        "stats": stats,
+        "categories": [{"name": c["name"], "count": len(c["articles"])} for c in categories],
+        "logEntries": len(log_entries),
+        "nodes": nodes,
+        "edges": deduped_edges,
+        "warnings": warnings[:50],  # Cap warnings
+    }
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: parse-knowledge-base.py <wiki-directory>", file=sys.stderr)
+        sys.exit(1)
+
+    root = Path(sys.argv[1]).resolve()
+    if not root.is_dir():
+        print(f"Error: {root} is not a directory", file=sys.stderr)
+        sys.exit(1)
+
+    manifest = parse_wiki(root)
+
+    # Write output
+    out_dir = root / ".understand-anything" / "intermediate"
+    out_dir.mkdir(parents=True, exist_ok=True)
+    out_path = out_dir / "scan-manifest.json"
+    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
+
+    # Report to stderr
+    s = manifest["stats"]
+    print(f"[parse] Karpathy wiki: {s['articles']} articles, {s['sources']} sources, "
+          f"{s['topics']} topics, {s['wikilinks']} wikilinks "
+          f"({s['unresolved']} unresolved)", file=sys.stderr)
+    print(f"[parse] Output: {out_path}", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
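
The name-resolution rule stated in the `build_name_to_stem_map` docstring (full relative paths always map; bare basenames map only when unambiguous) can be checked with a small stand-alone sketch. `build_map` and `demo` below are hypothetical helpers written for illustration, not functions from the script, and the three file names are made up.

```python
import tempfile
from pathlib import Path


def build_map(wiki_root: Path) -> dict[str, str]:
    """Illustrative re-implementation of the docstring's rule (hypothetical helper)."""
    full_paths: dict[str, str] = {}
    by_basename: dict[str, list[str]] = {}
    for md_file in sorted(wiki_root.rglob("*.md")):
        stem = md_file.relative_to(wiki_root).with_suffix("").as_posix()
        full_paths[stem.lower()] = stem
        by_basename.setdefault(md_file.stem.lower(), []).append(stem)
    # Bare basenames resolve only when exactly one file carries that name.
    name_map = {key: stems[0] for key, stems in by_basename.items() if len(stems) == 1}
    # Full relative paths always map and win over basename entries.
    name_map.update(full_paths)
    return name_map


def demo() -> dict[str, str]:
    """Build a throwaway wiki with one unique and one duplicated basename."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("Overview.md", "decisions/Decision-Foo.md", "notes/Decision-Foo.md"):
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("# stub\n", encoding="utf-8")
        return build_map(root)


name_map = demo()
```

In this sketch `[[Overview]]` and `[[decisions/Decision-Foo]]` would resolve, while a bare `[[Decision-Foo]]` has no entry and would be reported as unresolved rather than bound to an arbitrary page.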