Anti-piracy Organization Takes Down AI Training Dataset 'Books3' Used by Meta's Large Models

Recently, an anti-piracy organization demanded the removal of the AI training dataset "Books3" from the online piracy repository The Eye. The dataset comprises 37GB of text and is used to train artificial intelligence models. Major tech companies such as Meta have also used it. The organization stated that AI poses new challenges to copyright and called for stronger regulation and standardization. Although the dataset has been taken down, its publisher has since issued new download links. The anti-piracy organization plans to keep targeting websites that host the dataset.

AI Data Scandal: OpenAI Accidentally Deletes Evidence, Media Giants Sue for Copyright Infringement

The New York Times and the Daily News have run into an unexpected twist in their copyright lawsuit: an OpenAI engineer inadvertently deleted search data from a virtual machine, data that could have served as key evidence in this high-profile legal dispute. According to a letter filed with the U.S. District Court for the Southern District of New York on Wednesday night, lawyers and technical experts for the two media companies had already spent more than 150 hours searching OpenAI's AI training data. On November 14, however, an OpenAI engineer accidentally deleted the data stored on that virtual machine.

LAION Releases New AI Dataset Re-LAION-5B, Completely Removes Links to Child Sexual Abuse Material

LAION has launched Re-LAION-5B to address the problem of Child Sexual Abuse Material (CSAM) in training data, describing it as the world's first AI training dataset fully cleaned of links to CSAM. The dataset is a substantially improved version of LAION-5B and comes in two main versions: Re-LAION-5B Research and Re-LAION-5B Research-Safe. A total of 2,236 CSAM links have been removed, including 1,008 from child protection organizations' lists. The dataset contains 5.5 billion text-image pairs and is designed to help.....

China's First 100-Million-Parameter Seismic Wave Large Model 'Diting' Released in Chengdu

Recently, the 'Diting' seismic wave large model, jointly developed by the National Supercomputing Center in Chengdu, the Institute of Geophysics of the China Earthquake Administration, and Tsinghua University, was officially released in Chengdu, Sichuan. This model is the first seismic wave large model in the country to reach 100 million parameters, marking a significant breakthrough in the integration of seismology research and artificial intelligence technology in China.

Lingguang App's Lingguang Circle Community Gets a Refresh: Hot List and Follow Features Launch, PC Version Supports Importing Documents and Audio/Video Materials

The Lingguang Circle community in Ant Group's Lingguang App has been upgraded with hot list, curated picks, and follow features. Its PC version now supports multimodal file uploads, letting users generate smart Q&A or AI apps from local content and substantially improving the app discovery and creation experience.....

OpenAI Launches AI Safety Flywheel: How GPT-Red Redefines Model Robustness

To counter security weaknesses arising from AI's deep integration with browsers and files, OpenAI has introduced GPT-Red, an automated red-teaming model. Through self-play, it lowers the failure rate against direct prompt injection attacks to 0.05%, outperforming manual testing in both efficiency and coverage. This opens a new path for AI self-improvement and security hardening.....