I'd be curious what the result would be if you ran MiniMax 2-3 times, each time asking it to review its previous work and check whether it missed anything the previous time. This has become a habit for me, since all models seem to catch stuff on a 2nd and even 3rd pass.
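Roughly what I mean, as a minimal sketch and not anything tied to MiniMax's actual API: `ask_model` is a hypothetical callable (prompt in, text out) that you would wire up to whatever client you use, and the follow-up prompt wording is just my assumption.

```python
from typing import Callable, List


def multi_pass_review(
    ask_model: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply text
    task: str,
    passes: int = 3,
) -> List[str]:
    """Run the task once, then repeatedly ask the model to re-check its last output.

    Returns the output of every pass, first to last. At least one pass always runs.
    """
    outputs = [ask_model(task)]
    for _ in range(passes - 1):
        # Feed the previous output back and ask for anything that was missed.
        followup = (
            f"Task:\n{task}\n\n"
            f"Previous review:\n{outputs[-1]}\n\n"
            "Review the previous work and check again for anything it missed."
        )
        outputs.append(ask_model(followup))
    return outputs
```

Comparing the findings across passes would show whether the 2nd and 3rd rounds really add anything for this benchmark.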
This kind of research is extremely valuable. Keep it up.
Is the benchmark codebase publicly available?