Show HN: Narada – Open-source secrets classification model

5 points by sanketsaurav 7 hours ago

Hey HN! We're the team behind Autofix Bot (YC W20's DeepSource)[1]. We're open-sourcing Narada (https://huggingface.co/deepsource/Narada-3.2-3B-v1), a fine-tuned Llama3.2-3B-Instruct model that dramatically reduces false positives in secrets detection tools. The model achieves 97% precision with 96% recall on our evaluation set. It's fast enough for CI/CD (3B parameters), works with any regex-based tool, and is MIT-licensed.

Traditional regex-based secrets scanners (Gitleaks, TruffleHog, detect-secrets) face a fundamental tradeoff: crank up sensitivity and drown in false positives on placeholders like "YOUR_API_KEY_HERE", or tune it down and miss real credentials. We kept hearing from security teams that they couldn't trust their scanning tools because of the noise – developers would just ignore the alerts.

Regex is great at fast pattern matching, but terrible at understanding context. So instead of trying to make regex smarter, we built a hybrid system: regex does the initial high-recall sweep, then a fine-tuned 3B model filters out false positives by actually understanding the code context.
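To make the two-stage idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the regex is not taken from any real scanner, `find_candidates` and `filter_candidates` are hypothetical names, and `stub_classifier` is only a keyword check standing in for the model call.

```python
import re
from typing import Callable, List, Tuple

# Deliberately broad, high-recall pattern. This is an illustrative
# assumption, not a rule from Gitleaks, TruffleHog, or detect-secrets.
CANDIDATE_RE = re.compile(
    r"""(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*["']([^"']{8,})["']"""
)

Candidate = Tuple[int, str]  # (1-based line number, captured value)


def find_candidates(source: str) -> List[Candidate]:
    """Stage 1: flag everything that looks like a credential assignment."""
    hits: List[Candidate] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CANDIDATE_RE.finditer(line):
            hits.append((lineno, match.group(2)))
    return hits


def filter_candidates(
    source: str, classify: Callable[[str, Candidate], bool]
) -> List[Candidate]:
    """Stage 2: keep only candidates the classifier judges to be real."""
    return [c for c in find_candidates(source) if classify(source, c)]


def stub_classifier(source: str, candidate: Candidate) -> bool:
    """Hypothetical stand-in for the model: rejects obvious placeholders."""
    value = candidate[1].upper()
    markers = ("YOUR_", "EXAMPLE", "CHANGEME", "XXXX")
    return not any(marker in value for marker in markers)


SAMPLE = '''api_key = "YOUR_API_KEY_HERE"
password = "hunter2-prod-9f8a7c"
'''

if __name__ == "__main__":
    print(find_candidates(SAMPLE))                     # both lines are flagged
    print(filter_candidates(SAMPLE, stub_classifier))  # only the second survives
```

The point of the split is that stage 1 can stay cheap and over-eager, since anything it flags gets a second look; swapping `stub_classifier` for a real model call would not change the surrounding code.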

Technical approach:

- Started with teacher-student architecture using DeepSeek R1 as teacher
- Curated ~8K diverse secrets from Samsung's CredData dataset, relabeled for consistency
- Generated synthetic edge cases using Gemini 2.5 Pro and Claude Sonnet 4
- Fine-tuned on ~900 examples with deterministic outputs (not chain-of-thought)

Integration is straightforward – run your existing regex tool, feed candidates to Narada with ±20 lines of context, get structured JSON output with true/false positive classification and reasoning.
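A rough sketch of the glue code for that flow, in Python. The ±20-line window comes from the description above; the function names and the JSON field names (`is_true_positive`, `reasoning`) are assumptions made for this example, so check the model card for the actual prompt format and output schema.

```python
import json
from typing import Dict, List

CONTEXT_LINES = 20  # window radius suggested in the post


def context_window(lines: List[str], lineno: int, radius: int = CONTEXT_LINES) -> str:
    """Return the lines within `radius` of a 1-based line number.

    The window is clamped at the start and end of the file.
    """
    start = max(0, lineno - 1 - radius)
    end = min(len(lines), lineno + radius)
    return "\n".join(lines[start:end])


def parse_verdict(raw: str) -> Dict[str, object]:
    """Parse a JSON reply into a small dict.

    The field names are hypothetical; adjust them to the real schema.
    """
    data = json.loads(raw)
    return {
        "is_secret": bool(data["is_true_positive"]),
        "reasoning": str(data.get("reasoning", "")),
    }


if __name__ == "__main__":
    file_lines = [f"line {i}" for i in range(1, 101)]
    snippet = context_window(file_lines, 50)
    print(len(snippet.splitlines()))  # 41: the flagged line plus 20 on each side

    # A made-up reply, used only to exercise the parser.
    reply = '{"is_true_positive": false, "reasoning": "placeholder value"}'
    print(parse_verdict(reply))
```

Clamping matters for candidates near the top or bottom of a file, where a full 41-line window is not available.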

We built this as part of Autofix Bot's secrets detection agent, and it outperformed static-only tools significantly in our benchmarks [2]. Figured the security community would benefit from having this available as an open-source building block. Would love to hear your feedback and learn what other edge cases you encounter.

[1] https://autofix.bot

[2] https://autofix.bot/benchmarks#benchmarks-secrets-detection

[3] https://autofix.bot/news/narada-secrets-detection-classifica...