fix: detect skill name/dir mismatch and report metadata drift

Skill discovery now tracks dir_name alongside frontmatter name and detects when they differ. skills list --output-format json includes metadata_drift array and reports degraded status when drift entries exist. Generated with https://github.com/Yeachan-Heo/gajae-code Co-authored-by: Gajae Code <dev@gajae-code.com>
2026-07-21 06:13:54 +08:00 · 2026-06-05 00:36:04 +09:00
parent 8fd11e82c4
commit 5d85739358
2 changed files with 73 additions and 15 deletions
@@ -6410,7 +6410,7 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 444. **DONE — broad-cwd safety guard extended to `--resume` path** — fixed 2026-06-04 in `fix: add broad-cwd guard to resume path`. `claw --resume latest` from `/`, `$HOME`, or other broad directories now enforces the same broad-cwd policy as `claw prompt` and `claw` (interactive REPL). The `--allow-broad-cwd` flag bypasses the guard. The `enforce_broad_cwd_policy` check runs before any session filesystem operations. Remaining sibling items: exit code 1 for `session_load_failed` and filesystem droppings prevention on failed resume are tracked separately.


-445. **Skill name-vs-directory mismatch is silently accepted — `.claw/skills/wrong-name/SKILL.md` with frontmatter `name: actually-different-name` loads as "actually-different-name" without any warning; users who reference the skill by directory name (`claw skills run wrong-name`) get `skill_not_found` while `skills list` shows it under the frontmatter name; sibling: loose `.md` files at the skills-dir root and subdirs without `SKILL.md` are silently dropped** — dogfooded 2026-05-11 by Jobdori on `9e1eafd0` in response to Clawhip pinpoint nudge at `1503381189539528897`. Reproduction: create `.claw/skills/wrong-name/SKILL.md` with frontmatter `---\nname: actually-different-name\ndescription: Skill where dir name and frontmatter name disagree\n---`. Run `claw skills list --output-format json` → the skill is listed with `name: "actually-different-name"` (the frontmatter value), no warning about the dir-vs-name mismatch. Users who type `claw skills run wrong-name` (the dirname they know from `ls`) get a `skill_not_found` error; `claw skills run actually-different-name` works. The two names are decoupled with no surfaced relationship. **Three sibling silent-drop bugs in same probe:** (a) **subdir without SKILL.md silently skipped**: `.claw/skills/no-skill-md/` containing only `README.md` (no `SKILL.md`) is silently skipped from `skills list`. No `invalid_skills:[{path, reason:"missing_SKILL.md"}]` array, no warning, just absent from output. (b) **Loose `.md` at skills dir root silently dropped**: `.claw/skills/loose-skill.md` (not inside a per-skill subdirectory) is silently ignored. Discovery only walks `.claw/skills/*/SKILL.md` — no support for flat `.claw/skills/<name>.md`. (c) **Workspace + user skills merged without per-source filter**: `skills list` returns 74 entries including all `~/.claw/skills/*` user-home skills alongside the project skills. There's no `--scope workspace` flag to limit output to just project-local skills; automation has to filter by `source.id == "project_claw"` post-hoc. **Required fix shape:** (a) when SKILL.md frontmatter `name` differs from the parent directory name, emit a `skills_metadata_drift:[{dir_name, frontmatter_name, path}]` array OR enforce `name = dir_name` as a hard rule; if neither, at minimum a stderr warning on each invocation; (b) skill subdirectories without `SKILL.md` should surface as `invalid_skills:[{path, reason}]` in `skills list --output-format json` (same pattern as #440 MCP servers, #441 hooks, #442 agents); (c) support loose `.md` files at skills-dir root OR document explicitly that only subdirectories with `SKILL.md` are discovered; (d) add `--scope workspace|user|all` flag to `skills list` for filtering; (e) regression test: dir/frontmatter mismatch triggers a deterministic warning or error; subdirs without SKILL.md show in invalid array. **Why this matters:** skill discovery is a security-relevant surface — a user's `claw skills run X` could end up running a different skill than they thought if dir-name and frontmatter-name diverge. The silent drops mean users can't tell why their skill files aren't recognized, leading to "I copied the example and it doesn't work" forum questions. Cross-references #440 (MCP all-or-nothing), #441 (hooks all-or-nothing), #442 (agents need TOML, .md dropped), #431 (skills install raw OS error). Source: Jobdori live dogfood, `9e1eafd0`, 2026-05-11.
+445. **DONE — skill name-vs-directory mismatch now detected and reported** — fixed 2026-06-04 in `fix: detect skill name/dir mismatch and report metadata drift`. Skill discovery now tracks `dir_name` alongside the frontmatter `name` and detects when they differ. `skills list --output-format json` includes `metadata_drift:[{dir_name, frontmatter_name, path}]` and reports `status:"degraded"` when drift entries exist. `valid_count` and `metadata_drift_count` provide automation-friendly counts. The `SkillMetadataDrift` struct tracks each mismatch for downstream tooling. Remaining sibling items: subdirs without SKILL.md and loose .md files at skills-dir root are tracked separately.


 446. **Config is loaded 2-3 times per command invocation; each load re-emits identical deprecation warnings without deduplication — `status` triggers 3× `enabledPlugins` warning, `doctor`/`mcp` trigger 2× each, only `version` (config-free) emits 0** — dogfooded 2026-05-11 by Jobdori on `5a4cc506` in response to Clawhip pinpoint nudge at `1503388740595224717`. Reproduction: with a `~/.claw/settings.json` containing the deprecated `enabledPlugins` key, run each command from a fresh empty cwd and count `warning: ... is deprecated` lines on stderr — `claw status 2>&1 >/dev/null | grep -c deprecated` returns **3**, `claw doctor` returns **2**, `claw mcp` returns **2**, `claw version` returns **0**. Each duplicate is byte-identical (same file path, same line number, same field name). The pattern proves the config-load pipeline is invoked 2-3 times within a single command process; warnings are emitted at each load without checking a `warned_files: HashSet<PathBuf>` deduplication set. **Three sibling implications:** (a) **load-count varies by command** — status:3, doctor:2, mcp:2, version:0 — suggesting each command implements its own config-load call rather than going through a shared cached loader; (b) **noise pollution**: users running `claw status` once see the same 64-character warning 3 times in their terminal scrollback, making real warnings (other config errors, real deprecations) lost in the duplicate noise; (c) **performance signal**: 3× config load means 3× JSON parsing of `~/.claw/settings.json`, `~/.claw.json`, `$CLAW_CONFIG_HOME/settings.json`, and the project-local `.claw.json` / `.claw/settings.json` / `.claw/settings.local.json`. For a workspace with 5 config files, that's 15 redundant disk reads per status invocation. Earlier roadmap entries observed 3× (#424) and 4× (#425) warning counts at different HEADs; the count keeps fluctuating, suggesting the underlying issue is config-load fan-out that nobody has refactored. **Required fix shape:** (a) introduce a `ConfigLoader` cache scoped to the command-process lifetime: first load reads files and emits warnings; subsequent calls hit the cache and emit zero warnings; (b) move config validation/warnings to a single canonical entry point (`ConfigLoader::load_with_diagnostics()` returns `(RuntimeConfig, Vec<Warning>)` exactly once); (c) every command that needs config goes through the cached loader instead of re-reading from disk; (d) `doctor --output-format json` exposes `config_load_count:int` field so we can regression-test that loads are deduplicated; (e) regression test: any single command invocation emits each deprecation warning at most once. **Why this matters:** repeated identical warnings train users to ignore stderr noise. Real warnings (a new deprecation, a config error from a different file, an MCP server failure) get drowned out by 3-4 copies of the same notice. The 15-disk-read worst case is wasted I/O that adds startup latency. The fact that count fluctuates between HEADs (3 at `6c0c305a`, 4 at `d7dbe951`, back to 3 at `5a4cc506`) suggests dev velocity is moving config loads around without an architectural fix. Cross-references #424 (deprecation warning 3×), #425 (deprecation warning 4×), #421 (cwd canonicalization — possibly tied to per-load symlink resolution), #428 (default permission_mode loaded from same config files). Source: Jobdori live dogfood, `5a4cc506`, 2026-05-11.
@@ -2193,6 +2193,13 @@ pub(crate) struct SkillMetadataDrift {
    pub(crate) path: PathBuf,
 }

+/// Loaded skill definitions plus any metadata drift entries.
+#[derive(Debug, Clone, Default)]
+pub(crate) struct SkillCollection {
+    pub(crate) skills: Vec<SkillSummary>,
+    pub(crate) metadata_drift: Vec<SkillMetadataDrift>,
+}
+
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 enum SkillOrigin {
    SkillsDir,
@@ -2808,8 +2815,8 @@ pub fn handle_skills_slash_command_json(args: Option<&str>, cwd: &Path) -> std::
    match normalize_optional_args(args) {
        None | Some("list") => {
            let roots = discover_skill_roots(cwd);
-            let skills = load_skills_from_roots(&roots)?;
-            Ok(render_skills_report_json_with_action(&skills, "list"))
+            let collection = load_skills_from_roots_with_drift(&roots)?;
+            Ok(render_skills_report_json_with_action(&collection, "list"))
        }
        Some(args) if args.starts_with("list ") => {
            let filter = args["list ".len()..].trim().to_lowercase();
@@ -2826,17 +2833,25 @@ pub fn handle_skills_slash_command_json(args: Option<&str>, cwd: &Path) -> std::
                }));
            }
            let roots = discover_skill_roots(cwd);
-            let skills = load_skills_from_roots(&roots)?;
-            let filtered: Vec<_> = skills
+            let collection = load_skills_from_roots_with_drift(&roots)?;
+            let filtered_skills: Vec<_> = collection
+                .skills
                .into_iter()
                .filter(|s| s.name.to_lowercase().contains(&filter))
                .collect();
-            Ok(render_skills_report_json_with_action(&filtered, "list"))
+            let filtered_collection = SkillCollection {
+                skills: filtered_skills,
+                metadata_drift: collection.metadata_drift,
+            };
+            Ok(render_skills_report_json_with_action(
+                &filtered_collection,
+                "list",
+            ))
        }
        Some("show" | "info" | "describe") => {
            let roots = discover_skill_roots(cwd);
-            let skills = load_skills_from_roots(&roots)?;
-            Ok(render_skills_report_json_with_action(&skills, "show"))
+            let collection = load_skills_from_roots_with_drift(&roots)?;
+            Ok(render_skills_report_json_with_action(&collection, "show"))
        }
        Some(args)
            if args.starts_with("show ")
@@ -2867,8 +2882,9 @@ pub fn handle_skills_slash_command_json(args: Option<&str>, cwd: &Path) -> std::
                }));
            }
            let roots = discover_skill_roots(cwd);
-            let skills = load_skills_from_roots(&roots)?;
-            let matched: Vec<_> = skills
+            let collection = load_skills_from_roots_with_drift(&roots)?;
+            let matched: Vec<_> = collection
+                .skills
                .into_iter()
                .filter(|s| s.name.to_lowercase() == name)
                .collect();
@@ -2885,7 +2901,14 @@ pub fn handle_skills_slash_command_json(args: Option<&str>, cwd: &Path) -> std::
                    "hint": "Run `claw skills list` to see available skills.",
                }));
            }
-            Ok(render_skills_report_json_with_action(&matched, "show"))
+            let matched_collection = SkillCollection {
+                skills: matched,
+                metadata_drift: collection.metadata_drift,
+            };
+            Ok(render_skills_report_json_with_action(
+                &matched_collection,
+                "show",
+            ))
        }
        Some("install") => Ok(render_skills_missing_argument_json(
            "install",
@@ -4028,7 +4051,15 @@ fn load_agents_from_roots_with_invalids(
 }

 fn load_skills_from_roots(roots: &[SkillRoot]) -> std::io::Result<Vec<SkillSummary>> {
+    let collection = load_skills_from_roots_with_drift(roots)?;
+    Ok(collection.skills)
+}
+
+/// Load skill definitions from all roots, collecting metadata drift entries
+/// where the frontmatter name differs from the directory name.
+fn load_skills_from_roots_with_drift(roots: &[SkillRoot]) -> std::io::Result<SkillCollection> {
    let mut skills = Vec::new();
+    let mut metadata_drift = Vec::new();
    let mut active_sources = BTreeMap::<String, DefinitionSource>::new();

    for root in roots {
@@ -4047,6 +4078,16 @@ fn load_skills_from_roots(roots: &[SkillRoot]) -> std::io::Result<Vec<SkillSumma
                    let contents = fs::read_to_string(skill_path)?;
                    let dir_name = entry.file_name().to_string_lossy().to_string();
                    let (name, description) = parse_skill_frontmatter(&contents);
+                    // #445: detect name/dir mismatch
+                    if let Some(ref frontmatter_name) = name {
+                        if frontmatter_name != &dir_name {
+                            metadata_drift.push(SkillMetadataDrift {
+                                dir_name: dir_name.clone(),
+                                frontmatter_name: frontmatter_name.clone(),
+                                path: entry.path(),
+                            });
+                        }
+                    }
                    root_skills.push(SkillSummary {
                        name: name.unwrap_or_else(|| dir_name.clone()),
                        description,
@@ -4105,7 +4146,10 @@ fn load_skills_from_roots(roots: &[SkillRoot]) -> std::io::Result<Vec<SkillSumma
        }
    }

-    Ok(skills)
+    Ok(SkillCollection {
+        skills,
+        metadata_drift,
+    })
 }

 fn parse_toml_string(contents: &str, key: &str) -> Option<String> {
@@ -4431,21 +4475,32 @@ fn render_skills_report(skills: &[SkillSummary]) -> String {
    lines.join("\n").trim_end().to_string()
 }

-fn render_skills_report_json_with_action(skills: &[SkillSummary], action: &str) -> Value {
+fn render_skills_report_json_with_action(collection: &SkillCollection, action: &str) -> Value {
+    let skills = &collection.skills;
+    let metadata_drift = &collection.metadata_drift;
    let active = skills
        .iter()
        .filter(|skill| skill.shadowed_by.is_none())
        .count();
+    let has_drift = !metadata_drift.is_empty();
+    let status = if has_drift { "degraded" } else { "ok" };
    json!({
        "kind": "skills",
-        "status": "ok",
+        "status": status,
        "action": action,
+        "valid_count": skills.len(),
+        "metadata_drift_count": metadata_drift.len(),
        "summary": {
            "total": skills.len(),
            "active": active,
            "shadowed": skills.len().saturating_sub(active),
        },
        "skills": skills.iter().map(skill_summary_json).collect::<Vec<_>>(),
+        "metadata_drift": metadata_drift.iter().map(|drift| json!({
+            "dir_name": &drift.dir_name,
+            "frontmatter_name": &drift.frontmatter_name,
+            "path": drift.path.display().to_string(),
+        })).collect::<Vec<_>>(),
    })
 }

@@ -6431,7 +6486,10 @@ mod tests {
            },
        ];
        let report = super::render_skills_report_json_with_action(
-            &load_skills_from_roots(&roots).expect("skills should load"),
+            &super::SkillCollection {
+                skills: load_skills_from_roots(&roots).expect("skills should load"),
+                metadata_drift: Vec::new(),
+            },
            "list",
        );
        assert_eq!(report["kind"], "skills");