Competence Gate: gating tool-use on a small model's internal confidence signal instead of its verbalised one — Qwen3.5-4B, open weights [P]
I made a 10MB LoRA adapter for Qwen3.5-4B plus a small orchestration layer. It decides, per query, whether to answer directly, search the web, or retrieve from your own local documents, and it refuses to make things up when it can't verify an answer. It runs locally (Apple Silicon / MLX, with a GGUF build for llama.cpp/Ollama).

Basically, small instruct models are poor at telling users how confident they really are. They can't verbalise it and tend to say they are confident about everything. In my past research I tested seven 3-9B models and they all hit a confidence ceiling. But the information is there in the internal activations. The adapter reads the internal signal directly and gates tool use on it.

The main points:

- It catches its own errors better than the base model's tool calling (d′ improvement of 0.46, 95% CI [0.01, 0.89]). Of the cases the gate flagged that the base model didn't, 87% were genuinely wrong answers.
- It is less likely to leak your private queries to public search. A two-signal version routes questions about personal information, such as "what did my discharge summary say", to a local retriever instead of a web search. It cut the rate of private questions sent to public search from 22% to 10% (reduction 0.12, 95% CI [0.02, 0.22]). This is useful for anyone using the LLM with confidential docs.
- Every answer is traceable. When it retrieves, it cites the specific passage (report.md ¶2), verifies the answer is actually in that passage, and shows a confidence band. Worst case, it says "I couldn't verify that". It is built to say "I don't know" instead of lying.

Limitations:

- Privacy result is n=60; the retrieval/competence dissociation is n=126 hand-authored items. Screened and CI'd, but small.
- GGUF reproduces the MLX gate's decisions at --lora-scaled ...:8 (found by sweep; scale 1 does nothing, and the effective scale ≈ the training scale). Agreement is 0.83 on a 24-item probe; disagreements are all conservative-direction (GGUF answers a couple of borderline items MLX would look up), and knowns never false-fire. Faithful on the safety-critical directions, marginally more conservative at the margin.
- Serve-time confidence is coarse (grounded / declined / answered): the distilled gate reads nothing at inference, so finer bands need probe access (offline).
- Inherits Qwen3.5-4B's knowledge and biases. The gate governs when to trust the model, not what it knows.

The approach isn't Qwen-specific: I started on SmolLM3-3B, and it should extend to other models and larger sizes.

Repo (weights + code + model card): https://huggingface.co/synthiumjp/competence-gate-qwen3.5-4b

Apache-2.0. It's an open research release. I hope people might find some use for it. Methodology and papers are cited in the model card. Genuinely interested in critique. It's screened work, so if there are any issues it would be great to know.

submitted by /u/Synthium-
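
For readers who want a concrete picture of the routing and verification described in the post, here is a minimal sketch. It is not the released code. `GateSignals`, `choose_route`, `grounded_reply`, the 0.5 threshold, and the choice to check the privacy signal first are all hypothetical names and assumptions made for illustration, and plain substring matching stands in for whatever verification the orchestration layer really does.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ANSWER = "answer"
    WEB_SEARCH = "web_search"
    LOCAL_RETRIEVAL = "local_retrieval"


@dataclass(frozen=True)
class GateSignals:
    """Hypothetical gate outputs in [0, 1]; the real adapter's interface may differ."""
    competence: float  # how likely the model already knows the answer
    private: float     # how likely the query concerns the user's own documents


def choose_route(signals: GateSignals, threshold: float = 0.5) -> Route:
    """Pick a route from two signals. The 0.5 threshold is an illustrative assumption."""
    # Privacy is checked first so a query flagged as private stays local.
    if signals.private >= threshold:
        return Route.LOCAL_RETRIEVAL
    # Otherwise answer directly only when the competence signal is high enough.
    if signals.competence >= threshold:
        return Route.ANSWER
    return Route.WEB_SEARCH


def _normalise(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores trivial differences."""
    return " ".join(text.lower().split())


def grounded_reply(answer: str, passage: str, citation: str) -> tuple[str, str]:
    """Return (reply, band). Substring matching is a stand-in for real verification."""
    if answer.strip() and _normalise(answer) in _normalise(passage):
        return f"{answer} ({citation})", "grounded"
    return "I couldn't verify that", "declined"
```

With these assumptions, `choose_route(GateSignals(competence=0.2, private=0.9))` keeps the query on the local retriever, and `grounded_reply` only returns a cited answer when the answer text actually appears in the retrieved passage. Anything else falls back to "I couldn't verify that".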