8 Comments
Kehao Chen's avatar

How does Kimi K2.5 perform? Looking forward to seeing related review articles.

Orlando Ascanio's avatar

GLM-5 is better than MiniMax in most cases.

donkey🦛's avatar

Is there a GitHub link somewhere?

Peter Gabriel's avatar

GLM-5 outperforms MiniMax M2.5 by 4.47% in relative terms (93.5 vs. 89.5 points out of 98), not 2%.

Peter Gabriel's avatar

Correct results:

Table 1 (IMG_0400.jpeg)

GLM-5: 34/35 = 97.14%

MiniMax M2.5: 30/35 = 85.71%

Table 2 (Test 1 Scoring)

GLM-5: 24.5/28 = 87.50%

MiniMax M2.5: 28/28 = 100.00%

Table 3 (IMG_0402.png)

GLM-5: 35/35 = 100.00%

MiniMax M2.5: 31.5/35 = 90.00%

Overall Cumulative Score

Total Possible Points: 98

GLM-5: 93.5/98 = 95.41%

MiniMax M2.5: 89.5/98 = 91.33%
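The cumulative figures above can be re-derived with a few lines of Python. This is only a sketch for checking the arithmetic: the (earned, possible) pairs are copied from the three tables listed in this comment, and the names `scores` and `cumulative` are illustrative, not taken from the article.

```python
# (earned, possible) per table, copied from the figures listed above
scores = {
    "GLM-5": [(34, 35), (24.5, 28), (35, 35)],
    "MiniMax M2.5": [(30, 35), (28, 28), (31.5, 35)],
}

def cumulative(rows):
    """Return (earned, possible, percentage) summed over all tables."""
    earned = sum(e for e, _ in rows)
    possible = sum(p for _, p in rows)
    return earned, possible, 100 * earned / possible

results = {model: cumulative(rows) for model, rows in scores.items()}
for model, (earned, possible, pct) in results.items():
    print(f"{model}: {earned}/{possible} = {pct:.2f}%")

# Gap between the two models, expressed two ways
glm_pct = results["GLM-5"][2]
mm_pct = results["MiniMax M2.5"][2]
absolute_gap = glm_pct - mm_pct               # percentage points
relative_gap = 100 * (glm_pct / mm_pct - 1)   # relative improvement
print(f"gap: {absolute_gap:.2f} points, {relative_gap:.2f}% relative")
```

With these inputs the gap comes out to about 4.08 percentage points, or about 4.47% in relative terms, so which number to quote depends on which definition of "outperforms by" is intended.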

Peter Gabriel's avatar

Test 1 - fatal flaw:

The individual model sums are mathematically correct, but the total possible points denominator is wrong.

GLM-5:

The sum (12 + 3 + 7.5 + 2) is exactly 24.5.

MiniMax M2.5:

The sum (12 + 6 + 8 + 2) is exactly 28.

Total Possible Score:

The sum of the max category points (12 + 6 + 8 + 2) is 28, not 30. This means MiniMax M2.5 actually achieved a perfect score, and the totals should be formatted as 24.5/28 and 28/28. It is critical to double-check baseline metrics when evaluating models for hallucinations and bug-fixing capabilities.
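A quick way to catch this kind of denominator error is to compute the maximum from the category maxima rather than typing it by hand. The sketch below assumes four scoring categories with the maxima quoted above; the variable names are illustrative and the category labels are omitted because they are not given in this comment.

```python
# Per-category maximum points and awarded points, as quoted above
max_points = [12, 6, 8, 2]
awarded = {
    "GLM-5": [12, 3, 7.5, 2],
    "MiniMax M2.5": [12, 6, 8, 2],
}

# Derive the denominator instead of hard-coding it
total_possible = sum(max_points)

totals = {}
for model, points in awarded.items():
    # No category may exceed its own maximum
    assert all(p <= m for p, m in zip(points, max_points))
    totals[model] = sum(points)
    print(f"{model}: {totals[model]}/{total_possible}")
```

Deriving `total_possible` from `max_points` means the denominator cannot drift out of sync with the rubric.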

Peter Gabriel's avatar

Test 2 has fatal flaws in the final scores!

Both totals in the table are incorrect.

GLM-5:

The sum of the individual scores (5, 6, 6, 4, 4, 4, 3, 2) is 34, not 31.

MiniMax M2.5:

The sum of the individual scores (4, 6, 5, 3, 3, 3, 4, 2) is 30, not 29.
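The same check can be automated by comparing each reported total against the sum of its row. This is a minimal sketch using the individual scores and the reported totals (31 and 29) quoted above; `reported` and `corrected` are illustrative names.

```python
# model -> (individual scores, total reported in the table)
reported = {
    "GLM-5": ([5, 6, 6, 4, 4, 4, 3, 2], 31),
    "MiniMax M2.5": ([4, 6, 5, 3, 3, 3, 4, 2], 29),
}

corrected = {}
for model, (scores, reported_total) in reported.items():
    actual = sum(scores)
    corrected[model] = actual
    if actual == reported_total:
        status = "ok"
    else:
        status = f"mismatch, table says {reported_total}"
    print(f"{model}: sum = {actual} ({status})")
```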