Show HN: Duende: Web UX for guiding Gemini as it improves your source code

2 points by afc 18 hours ago

I wrote a simple web UX in Python/JavaScript that starts a conversation with Google Gemini and gives it MCP commands, so it can work on a specific coding task that you specify: http://github.com/alefore/duende

The UX lets you observe the conversation and provide guidance (e.g. "don't implement Foo through Bar, that's suboptimal; instead …").

It supports a `--review` mode: once the main conversation says "I'm done with the task", it spawns several "evaluation" conversations, each reviewing the change from one very specific angle (e.g. "does it introduce useless comments?"). In the future, I'm considering adding other workflows.
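To make the `--review` flow more concrete, here is a minimal Python sketch of how such a fan-out could look. It is not Duende's actual code: `ask` is a hypothetical stand-in for starting a fresh Gemini conversation, the `ACCEPT`/`REJECT` reply convention is invented for illustration, and combining rejections into feedback for the main conversation is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for "start a fresh model conversation with this prompt
# and return its reply". A real tool would wrap the Gemini API here.
AskModel = Callable[[str], str]


@dataclass
class Evaluator:
    name: str
    question: str  # The one narrow angle this evaluator reviews from.


@dataclass
class Verdict:
    evaluator: str
    accepted: bool
    feedback: str  # Empty when the evaluator accepts the change.


def review_change(diff: str, evaluators: list[Evaluator], ask: AskModel) -> list[Verdict]:
    """Runs one independent review conversation per evaluator over the same diff."""
    verdicts = []
    for ev in evaluators:
        prompt = (
            f"Review the following change only from this angle: {ev.question}\n"
            "Reply 'ACCEPT' or 'REJECT: <reason>'.\n\n"
            f"{diff}"
        )
        reply = ask(prompt).strip()
        # Assumed reply convention: anything not starting with ACCEPT is a rejection.
        accepted = reply.upper().startswith("ACCEPT")
        verdicts.append(Verdict(ev.name, accepted, "" if accepted else reply))
    return verdicts


def feedback_for_main_conversation(verdicts: list[Verdict]) -> Optional[str]:
    """Combines all rejections into one message, or returns None if every evaluator accepted."""
    rejections = [f"{v.evaluator}: {v.feedback}" for v in verdicts if not v.accepted]
    return "\n".join(rejections) if rejections else None


# Usage with a fake model so the sketch runs offline.
def fake_ask(prompt: str) -> str:
    if "useless comments" in prompt:
        return "REJECT: adds a comment that just restates the code"
    return "ACCEPT"


evaluators = [
    Evaluator("comments", "does it introduce useless comments?"),
    Evaluator("tests", "does it add tests for the new behavior?"),
]
verdicts = review_change("<diff of the change>", evaluators, fake_ask)
print(feedback_for_main_conversation(verdicts))
# prints "comments: REJECT: adds a comment that just restates the code"
```

Giving each evaluator its own conversation keeps every review focused on a single question, as described above; whether Duende runs evaluators sequentially or in parallel, and exactly how their feedback reaches the main conversation, isn't stated here.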

I've used it mostly to develop itself: I started with a very rudimentary manual implementation and then mostly used Duende to extend its own implementation. I've also used it, with moderate success, to add a few new features to [my C++ text editor](http://github.com/alefore/edge).

It's been a lot of fun. I'm still developing my intuitions for what works and what doesn't, but I've already been WOW'ed plenty of times by what LLMs can accomplish (and, to be honest, also disappointed plenty of times when they struggled badly on tasks I expected to be trivial). It's a learning experience: figuring out how to avoid hallucinations, developing an intuition for how much to break large tasks down into smaller ones, and knowing when to abort a conversation and restart it with an improved prompt (vs. continuing to steer it in the right direction).

I think investing up front in a good context (e.g. good validation logic, useful constant contexts, and a good set of "review evaluators" that can push back against sloppy code) can go a long way toward increasing the odds of success.

It has changed my perception of how applicable AI is to developing software. While I've reviewed all the outputs (and often rewritten parts of them manually), I'm already incorporating it into some parts of my development life cycle.

I have many ideas for improvements, but figured I'd share this early and ask for feedback. Hopefully others find it interesting; would love to hear your thoughts.