Discussion about this post

Rainbow Roxy

Thanks for writing this; it clarifies a lot. Super insightful to use vague prompts! I'm curious: how much did the 'standardized Node.js security test project' itself influence the models' outputs?

Larry

What would be really nice to see is the same set of three test cases run on the frontier AND budget models at once. You did this for the newest frontier models recently, and now for this one. That would give us some real overall comparisons. Always looking for the correct balance of cost and quality. Thanks!
