According to reports, a developer from the JeecgBoot project tested local large-model integration with Claude Code on a Mac Studio (M4 Max). The results showed that a community-modified distilled model generated code 5-6 times faster than the official model.


Key Test Insight: Choosing the Right Model Matters More Than Optimization

In this test, the developer set aside the slower official model and instead used the community-modified model gemma-4-26b-a4b-it-claude-opus-heretic-ara, with impressive results:

  • Extreme Speed: Generation reached up to 78 tok/s, far above the original model's roughly a dozen tokens per second.

  • Sparse Activation: It uses the A4B (Active 4B) MoE architecture: 26B total parameters, but only about 4B are activated per inference step, achieving "small-model compute cost with large-model intelligence."

  • Long Context: It supports a 256K context window and is fully compatible with the Anthropic API format, enabling zero-configuration integration.
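The "zero-configuration integration" above hinges on the Anthropic-compatible endpoint: Claude Code can be pointed at any server that speaks that format via environment variables. A minimal sketch, assuming the local server listens on localhost:1234 (the port, dummy token, and model name here are illustrative placeholders, not values from the article's setup):

```shell
# Point Claude Code at a local Anthropic-compatible server.
# URL, token, and model name below are assumptions; substitute your own.
export ANTHROPIC_BASE_URL="http://localhost:1234"
export ANTHROPIC_AUTH_TOKEN="local-dummy-key"   # local servers usually ignore the value
export ANTHROPIC_MODEL="gemma-4-26b-a4b-it-claude-opus-heretic-ara"
claude
```

Because only environment variables change, switching back to the cloud API is just a matter of unsetting them.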


Performance Analysis: Agentic Workflow Is a Double-Edged Sword

The test shows that although the model generates very quickly, it still took approximately 1.5 minutes to complete a concrete task such as generating the teacher-table code.

  • Bottleneck Identification: The main time cost lies in Claude Code's multi-step agentic decision chain. Before executing, the system runs multiple rounds of Thought (reasoning) and Skill loading, inflating the prompt token count.

  • Value Trade-off: This multi-step decision-making is highly valuable for code generation and modification tasks, as it ensures correct file paths and logically complete output; for simple knowledge questions, however, querying the model directly through LM Studio is faster.
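A back-of-envelope model shows why a 78 tok/s model can still take about 1.5 minutes: each agentic round re-processes an ever-growing prompt before generating. The round count, prompt sizes, and prefill speed below are illustrative assumptions, not measurements from the article (only the 78 tok/s figure is reported):

```python
# Rough wall-clock estimate for a multi-round agentic task.
# Every constant except GEN_SPEED_TOK_S is an assumed, illustrative value.

GEN_SPEED_TOK_S = 78          # generation speed reported in the test
PREFILL_SPEED_TOK_S = 400     # assumed prompt-processing speed on M4 Max
ROUNDS = 5                    # assumed Thought / Skill / tool-call rounds
PROMPT_TOKENS_PER_ROUND = 5000  # assumed; grows as context expands
GEN_TOKENS_PER_ROUND = 400      # assumed output tokens per round

total_s = ROUNDS * (PROMPT_TOKENS_PER_ROUND / PREFILL_SPEED_TOK_S
                    + GEN_TOKENS_PER_ROUND / GEN_SPEED_TOK_S)
print(f"estimated total: {total_s:.0f} s (~{total_s / 60:.1f} min)")
# prints "estimated total: 88 s (~1.5 min)"
```

Under these assumptions, most of the time goes to repeated prompt prefill rather than token generation, which matches the article's observation that the decision chain, not raw speed, is the bottleneck.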


Quality Assessment: JeecgBoot Teacher Table Output

In the test targeting the JeecgBoot framework, this combination demonstrated a high level of practical capability:

  • Standardization: The SQL paths automatically conform to Flyway conventions, and generated dates are accurate.

  • Technology Stack: The Vue3 code uses script setup with TypeScript, fully in line with modern development standards.

  • Completeness: It generated a complete skeleton including the Controller, Service, and Mapper layers.

  • Limitations: Complex method bodies still require manual completion, and key logic should be reviewed by hand.

Strategic Recommendations: A Dual-Model "High-Low Configuration" Combination

Based on the test data, the developer proposed an optimal strategy that balances privacy, cost, and quality:

  1. Local Modified Model (80% of scenarios): Handles daily CRUD generation, code explanation, and privacy-sensitive internal projects, at zero cost and with data kept inside the intranet.

  2. Cloud Official API (20% Scenarios): Tackles complex architectural design and core security modules, ensuring production-level quality.
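The two-tier strategy above can be sketched as a simple routing function. The task categories and backend names here are illustrative assumptions derived from the article's 80/20 split, not part of any actual JeecgBoot tooling:

```python
# Minimal sketch of dual-model ("high-low") routing: routine and
# privacy-sensitive work stays local; architecture and security work
# goes to the cloud API. Categories below are illustrative assumptions.

LOCAL_TASKS = {"crud_generation", "code_explanation", "internal_project"}
CLOUD_TASKS = {"architecture_design", "security_module"}

def pick_backend(task: str) -> str:
    if task in CLOUD_TASKS:
        return "cloud"   # production-grade quality for critical work
    # Default to local: zero cost, data never leaves the intranet.
    return "local"

print(pick_backend("crud_generation"))      # prints "local"
print(pick_backend("architecture_design"))  # prints "cloud"
```

Defaulting the fallback branch to "local" reflects the article's privacy-first stance: only explicitly critical tasks should leave the intranet.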

Conclusion: Opening a New Era of Local AI Development

With the spread of powerful hardware like the M4 Max and the support of Q4_K_XL quantization, running high-performance agents locally is no longer science fiction. Running QwenPaw and Claude Code locally gives enterprise developers an unprecedented productivity tool while keeping data private.