On September 18, the Chinese Internet Basic Corpus 3.0 was officially released at the AI Security Governance Forum, held in Kunming as part of the 2025 National Cybersecurity Awareness Week. The new version totals 120 GB and aims to provide reliable data support for large model training and the broader development of artificial intelligence.

Chinese Internet Basic Corpus 3.0 was produced jointly by the China Cybersecurity Association and the National Internet Emergency Center, under the guidance of the Central Cyberspace Administration. Its construction drew on close cooperation among enterprises, universities, and research institutions, and made full use of the corpus co-construction and sharing mechanism established by the association's AI Security Governance Committee. Compared with the two previous versions, version 3.0 draws on a wider range of sources and offers further improved data quality.

For data processing, Corpus 3.0 went through several careful steps, including strict source screening, content filtering, and deduplication. These steps make the released data more credible, help filter out illegal and harmful information, and provide a healthier environment for AI research and applications.

Users can visit the website of the China Cybersecurity Association, click the "Chinese Internet Corpus Resource Platform" link, and register and complete verification to download the corpus. The official in charge said that the release of Chinese Internet Basic Corpus 3.0 reflects the joint efforts and achievements of all sectors in building high-quality Chinese corpora. Going forward, work on the Chinese Internet basic corpus will be further strengthened to support technological innovation and industrial development in artificial intelligence.

The release of Chinese Internet Basic Corpus 3.0 brings new momentum to the development of artificial intelligence and provides a more solid data foundation for research in related fields.