Zhiyuan Research Institute (BAAI) Jointly Builds Chinese Internet Corpus CCI to Provide Resources for the Big Data and Artificial Intelligence Industries


At the 2024 Beijing Cultural Forum, the Beijing Academy of Artificial Intelligence (BAAI) officially announced the release of the next-generation Chinese Internet corpus CCI3.0 (Chinese Corpora Internet), further promoting data co-construction and sharing. CCI3.0 includes a 1000GB dataset and a 498GB high-quality subset CCI3.0-HQ, marking another important update following the initial open-source release of CCI1.0 in November 2023 and the release of CCI2.0 in April 2024.
Sunshine Intelligent Technology Co., Ltd. has a registered capital of 700 million yuan. Its business scope covers big data services, internet security services, and the development of artificial intelligence application software. The company is wholly owned by Sunshine Insurance, and its scope spans several AI-related lines of business, which suggests room for future development.
Big data and large models are focus areas for Worth Buying, which is developing the 'Worth Buying Consumption Content Large Model' on top of a general-purpose large model. The company aims to use big data to make platform search and content distribution more efficient. Its product database covers nearly 220,000 brands and 11.23 million aggregated products. The company plans to apply the model across a range of its products.