As someone who wishes LLMs could code better: no, we are nowhere near there yet for anything non-trivial. The models vary, but once the number of distinct responsibilities hits ~20, they start generating very poor logic. There's a reason all the codegen tools lean on some central "toolkit" like Supabase. We are nowhere near the point where LLMs can take over all coding. For web dev tasks, I'd say they're getting close to 80% of the way there, but that last 20% is the hard part, and it will take 4x longer to conquer than the easy-kill 80% did. Go down a few layers to performance-critical code and they're well under 30% of the way there.

Another reason this will not happen by 2026: coding is not the hardest part of software; figuring out what humans really want is. Right now, LLMs can do a good amount of the low-value work that a good template or snippet library would cover. They're also decent at pinpointing bugs, because they're very efficient spaghetti-throwing machines, hurling entire boxes of noodles at the wall faster than any human could. However, they're not very good at fixing bugs without causing regressions.

Want to see a model fall on its face? Ask any codegen tool to write you an inference engine for the H200 in PTX; you won't get very far. It'll output something that looks like PTX, but it'll be, well, some novel form of pseudo-code that doesn't compile and is fundamentally broken.
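To make that concrete, here's a minimal sketch of my own (not output from any codegen tool): a trivial CUDA kernel wrapping a single line of inline PTX. Even at this toy scale, the type suffix on the instruction and the register constraints have to agree exactly, or ptxas rejects it outright, which is exactly where "looks like PTX" output falls over.

    // Illustrative only: one inline PTX instruction inside a CUDA kernel.
    // add.f32 requires .f32 operands; the "f" constraint tells nvcc to bind
    // each operand to a 32-bit float register. Mismatch either and it won't
    // assemble.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void add_one(float *x) {
        float in = x[threadIdx.x];
        float out;
        // Destination %0 and sources %1, %2 must all be .f32 registers.
        asm("add.f32 %0, %1, %2;" : "=f"(out) : "f"(in), "f"(1.0f));
        x[threadIdx.x] = out;
    }

    int main() {
        float h[4] = {0.f, 1.f, 2.f, 3.f};
        float *d;
        cudaMalloc(&d, sizeof(h));
        cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
        add_one<<<1, 4>>>(d);
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        for (float v : h) printf("%g ", v);  // expect: 1 2 3 4
        printf("\n");
        cudaFree(d);
        return 0;
    }

Scale that up to a full inference engine, with thousands of instructions, hand-allocated registers, shared-memory staging, and warp-level synchronization, and the tolerance for "almost right" drops to zero.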