Google has announced Magika 1.0, the latest stable version of its AI-based file type detection system. The core engine has been rewritten in Rust, which brings gains in both performance and memory safety. Since Google open-sourced it in early 2024, Magika has been widely adopted in the open-source community and now sees more than 1 million downloads per month.

The new version has been re-architected for faster processing and stronger memory safety. Google says the tool can identify hundreds of files per second on a single core and scale to thousands of files per second on multi-core CPUs. Magika 1.0 uses ONNX Runtime for model inference and the Tokio runtime for asynchronous processing.
Magika 1.0 now detects more than 200 file formats, roughly double the number supported by the initial release. Newly added types include Jupyter Notebooks, NumPy, and PyTorch files for data science and machine learning, as well as Swift, Kotlin, and TypeScript for modern programming and web development. It also covers DevOps-related files and a range of database and graphics formats, such as SQLite and AutoCAD.
Magika 1.0 is also better at telling closely related formats apart, for example C from C++ and JavaScript from TypeScript. Google faced two main challenges in getting there: the sheer size of the training data and the scarcity of samples for some file types. To address them, Google developed its own dataset library, SedPack, and used its Gemini generative AI model to create high-quality synthetic training data, which improved the model's ability to generalize.
Magika's Python and TypeScript modules have also been updated to make integration easier for developers. Users can install Magika with simple commands on different operating systems, and Google encourages developers to contribute to the project to keep improving and extending the tool.
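
As a rough illustration of what integrating the Python module might look like, the sketch below tallies detected file types across a set of paths. It is a minimal example under stated assumptions, not Google's reference usage: the helper names `count_labels` and `make_magika_identifier` are hypothetical, the package is assumed to be installed with `pip install magika`, and the predicted label is assumed to be exposed as `result.output.label`, as in recent releases of the Python package.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable


def count_labels(paths: Iterable[Path], identify: Callable[[Path], str]) -> Counter:
    """Tally the content-type label that `identify` returns for each path."""
    return Counter(identify(path) for path in paths)


def make_magika_identifier() -> Callable[[Path], str]:
    """Build an identifier backed by Magika (assumes `pip install magika`)."""
    # Imported lazily so count_labels() has no hard dependency on the package.
    from magika import Magika

    detector = Magika()
    # Assumption: the predicted label is available at `result.output.label`.
    return lambda path: detector.identify_path(path).output.label
```

With the package installed, calling `count_labels(files, make_magika_identifier())` on a list of `Path` objects would return a `Counter` mapping each detected label to the number of files that received it. Passing the identifier in as a parameter keeps the tallying logic independent of the detector, so it can be exercised with a stand-in function.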
Key Points:
🌟 Magika 1.0 is rebuilt in Rust, bringing significant improvements in performance and memory safety.
📂 Supports over 200 file formats, with new data science, machine learning, and programming language types.
⚙️ Simplifies the integration process for developers and encourages community participation in project optimization.
