The individual model sums are mathematically correct, but the denominator (total possible points) is wrong.
GLM-5:
The sum (12 + 3 + 7.5 + 2) is exactly 24.5.
MiniMax M2.5:
The sum (12 + 6 + 8 + 2) is exactly 28.
Total Possible Score:
The sum of the maximum category points (12 + 6 + 8 + 2) is 28, not 30. This means MiniMax M2.5 actually achieved a perfect score, and the totals should read 24.5/28 and 28/28. It is critical to double-check baseline metrics when evaluating models for hallucinations and bug-fixing capabilities.
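If you want to re-check this arithmetic yourself, here is a minimal Python sketch. The category scores and the per-category maxima (12, 6, 8, 2) are the ones quoted in this comment; the names `MAX_POINTS`, `SCORES`, and `check_totals` are hypothetical, made up for illustration. The point is that the denominator is derived from the category maxima instead of being typed in by hand.

```python
# Hypothetical names; values copied from the Test 1 comment.
# Per-category maximum points, as stated in the comment.
MAX_POINTS = [12, 6, 8, 2]
SCORES = {
    "GLM-5": [12, 3, 7.5, 2],
    "MiniMax M2.5": [12, 6, 8, 2],
}

def check_totals(scores, max_points):
    """Return {model: (earned, possible)}, deriving 'possible' from the category maxima."""
    possible = sum(max_points)
    results = {}
    for model, cats in scores.items():
        # Sanity check: no category score may exceed its maximum.
        assert all(s <= m for s, m in zip(cats, max_points)), model
        results[model] = (sum(cats), possible)
    return results

for model, (earned, possible) in check_totals(SCORES, MAX_POINTS).items():
    print(f"{model}: {earned:g}/{possible}")
```

With these inputs it prints 24.5/28 for GLM-5 and 28/28 for MiniMax M2.5, matching the corrected totals above.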
How does Kimi K2.5 perform? I'm looking forward to related review articles.
GLM-5 is better than MiniMax in most cases.
Is there a GitHub link somewhere?
GLM-5 outperforms MiniMax M2.5 by 4.47% (relative), not 2%. In absolute terms, the gap is 4.08 percentage points.
Corrected results:
Table 1 (IMG_0400.jpeg)
GLM-5: 34/35 = 97.14%
MiniMax M2.5: 30/35 = 85.71%
Table 2 (Test 1 Scoring)
GLM-5: 24.5/28 = 87.50%
MiniMax M2.5: 28/28 = 100.00%
Table 3 (IMG_0402.png)
GLM-5: 35/35 = 100.00%
MiniMax M2.5: 31.5/35 = 90.00%
Overall Cumulative Score
Total Possible Points: 98
GLM-5: 93.5/98 = 95.41%
MiniMax M2.5: 89.5/98 = 91.33%
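A small Python sketch can reproduce the cumulative scores and show where the 4.47% figure comes from. The (earned, possible) pairs are copied from the corrected tables above; the names `TABLES` and `cumulative` are hypothetical. It computes the gap two ways, because "outperforms by X%" can mean either a relative difference or a difference in percentage points.

```python
# Hypothetical names; (earned, possible) pairs copied from the corrected tables.
TABLES = {
    "GLM-5": [(34, 35), (24.5, 28), (35, 35)],
    "MiniMax M2.5": [(30, 35), (28, 28), (31.5, 35)],
}

def cumulative(pairs):
    """Sum earned and possible points across tables; return (earned, possible, percent)."""
    earned = sum(e for e, _ in pairs)
    possible = sum(p for _, p in pairs)
    return earned, possible, 100 * earned / possible

glm = cumulative(TABLES["GLM-5"])
mm = cumulative(TABLES["MiniMax M2.5"])
for name, (e, p, pct) in (("GLM-5", glm), ("MiniMax M2.5", mm)):
    print(f"{name}: {e:g}/{p} = {pct:.2f}%")

# Two ways to express the gap between the overall scores.
absolute_gap = glm[2] - mm[2]               # difference in percentage points
relative_gap = 100 * (glm[0] / mm[0] - 1)   # GLM-5's points relative to MiniMax M2.5's
print(f"absolute: {absolute_gap:.2f} pp, relative: {relative_gap:.2f}%")
```

Both models share the same 98-point denominator, so the ratio of earned points equals the ratio of percentages: 93.5 / 89.5 is about 1.0447 (a 4.47% relative gap), while 95.41% minus 91.33% is 4.08 percentage points.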
Test 2 has fatal flaws in the final scores!
Both totals in the table are incorrect.
GLM-5:
The sum of the individual scores (5, 6, 6, 4, 4, 4, 3, 2) is 34, not 31.
MiniMax M2.5:
The sum of the individual scores (4, 6, 5, 3, 3, 3, 4, 2) is 30, not 29.
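The same kind of check works for Test 2. This minimal Python sketch flags any row whose reported total does not match the sum of its individual scores. The scores and the reported totals (31 and 29) are taken from this comment; the names `TEST2` and `find_mismatches` are hypothetical.

```python
# Hypothetical names; scores and reported totals copied from the Test 2 comment.
TEST2 = {
    "GLM-5": {"scores": [5, 6, 6, 4, 4, 4, 3, 2], "reported": 31},
    "MiniMax M2.5": {"scores": [4, 6, 5, 3, 3, 3, 4, 2], "reported": 29},
}

def find_mismatches(table):
    """Return {model: (actual_sum, reported_total)} for every row whose total is wrong."""
    mismatches = {}
    for model, row in table.items():
        actual = sum(row["scores"])
        if actual != row["reported"]:
            mismatches[model] = (actual, row["reported"])
    return mismatches

for model, (actual, reported) in find_mismatches(TEST2).items():
    print(f"{model}: scores sum to {actual}, table shows {reported}")
```

Both rows are flagged: GLM-5 sums to 34 (table shows 31) and MiniMax M2.5 sums to 30 (table shows 29).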