Anti-piracy Organization Takes Down AI Training Dataset 'Books3' Used by Meta's Large Models


The New York Times and the Daily News have hit an unexpected snag in their copyright lawsuit against OpenAI: an OpenAI engineer inadvertently deleted search data stored on a virtual machine, data that could have been key evidence, adding a dramatic turn to this high-profile legal dispute. According to a letter filed Wednesday night with the U.S. District Court for the Southern District of New York, lawyers and technical experts for the two media companies had spent more than 150 hours searching OpenAI's AI training data. Then, on November 14, an OpenAI engineer accidentally deleted the search data stored on that virtual machine.
LAION has launched Re-LAION-5B, which it describes as the world's first AI training dataset fully cleaned of links to Child Sexual Abuse Material (CSAM). The dataset is a substantially revised version of LAION-5B and comes in two versions: Re-LAION-5B Research and Research-Safe. A total of 2,236 CSAM links have been removed, including 1,008 drawn from lists compiled by child protection organizations. The dataset contains 5.5 billion text-image pairs, designed to help
Recently, the 'Diting' seismic wave large model, jointly developed by the National Supercomputing Center in Chengdu, the Institute of Geophysics of the China Earthquake Administration, and Tsinghua University, was officially released in Chengdu, Sichuan. It is the first seismic wave large model in China to reach 100 million parameters, marking a significant breakthrough in integrating seismology research with artificial intelligence technology in the country.
The premium domain AI.com sold for about $70 million, setting a new record. The seller is Malaysian blockchain investor Arsyan Ismail, who registered the domain in 1993 simply because the letters matched an abbreviation of his name. The sale reflects a broader reassessment of the value of AI-related assets and has drawn industry attention.
Actress Liu Meihan shared an amusing anecdote from her voice-acting work. To confirm how to pronounce the character 'fang' in 'Zhu Mi Fang', she tested five mainstream AI tools and got inconsistent answers: Baidu gave one reading of 'fang', while DeepSeek, Tencent Yuanbao, and Alibaba Qianwen all gave a different one. The episode highlights how inconsistently AI tools handle characters that have multiple readings.