At a time when AI giants are racing to pile on resources and chase peak benchmark scores, Elon Musk's xAI has taken a different approach, aiming to solve one of the most frustrating problems in the AI field: models that confidently make things up. Today, xAI officially launched Grok4.20 Beta. Although it still trails the top tier in absolute intelligence scores, it has set a new industry record on the key metric of "truthfulness."


According to the latest evaluation by Artificial Analysis, Grok4.20 scored 48 on the intelligence index in reasoning mode. Although it trails the two top-scoring models (both at 57), its performance on factual reliability was impressive:

  • Low hallucination rate: In the AA Omniscience test, Grok4.20 achieved a 78% "non-hallucination rate", setting a new record.

  • Knowing what it doesn't know: When faced with questions it cannot answer, the model is less inclined to fabricate facts and more likely to admit "I don't know." This kind of honesty is crucial for office and research settings that demand rigor.
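
To make the metric concrete, here is a minimal sketch of how a non-hallucination rate could be computed from graded answers. The article does not give the formula, so the definition used here (one minus the share of wrong answers among all questions the model failed to answer correctly) is an assumption, as is the function name. The sample counts are purely illustrative and are not Grok4.20's actual results.

```python
def non_hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of not-correct responses where the model abstained instead of answering wrongly.

    Assumed definition (not taken from the article):
        1 - incorrect / (incorrect + abstained)
    """
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 1.0  # nothing was missed, so nothing was hallucinated
    return 1.0 - incorrect / not_correct


# Illustrative counts only, not real benchmark data.
rate = non_hallucination_rate(incorrect=22, abstained=78)
print(f"{rate:.0%}")  # prints "78%"
```

Under this assumed definition, a model improves its score by saying "I don't know" more often on questions it would otherwise get wrong, which matches the behavior described above.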

Technical Architecture: A Three-in-One API Matrix

To serve different use cases, xAI has launched three API variants:

Reasoning Mode: Trades speed for deeper logical reasoning. This is the mode behind the new hallucination record.

Standard Mode: Focuses on fast response and routine interaction.

Multi-agent Mode: Supports multiple AI instances working together to handle complex tasks.

Market Strategy: More Content, No Extra Cost

Beyond its distinctive performance profile, Grok4.20 also comes with an aggressive commercial strategy:

  • Large context: Supports a context window of up to 2 million tokens, allowing it to absorb an entire book or large codebase at once.

  • Price advantage: It is priced between $2 and $6 per million tokens, which is cheaper than its predecessor Grok4 and highly competitive among current mainstream Western models.
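
As a rough illustration of what these numbers imply, the sketch below estimates the cost of sending one full 2-million-token context at the low and high ends of the quoted price range. The article does not say how the $2 to $6 range splits across input and output tokens, so the helper (a hypothetical name) simply applies a flat per-million-token rate.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


CONTEXT_TOKENS = 2_000_000  # maximum context window quoted in the article

low = cost_usd(CONTEXT_TOKENS, 2.0)   # low end of the quoted range
high = cost_usd(CONTEXT_TOKENS, 6.0)  # high end of the quoted range
print(f"${low:.2f} to ${high:.2f} per full-context request")
# prints "$4.00 to $12.00 per full-context request"
```

Under this flat-rate assumption, a single request that fills the entire window would cost between $4 and $12.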

The release of Grok4.20 reflects a strategic shift at xAI: rather than obsessing over the overall-score race toward AGI, it is targeting the specific pain point of "enterprise-level reliability." As the evaluator put it, if other models are striving to become "omniscient prophets," Grok4.20 is striving to become "a helper who never lies."

For users with extremely high requirements for data accuracy, Grok4.20 may become a third major option, in addition to OpenAI and Google.