New research suggests that AI capabilities may be exaggerated due to flawed testing.
On November 6th, Jin10 Data reported on a new study finding that the methods used to evaluate the capabilities of artificial intelligence systems often overstate AI performance and lack scientific rigor. Led by the Oxford Internet Institute and involving over thirty researchers from various organizations, the study examined 445 leading AI tests, known as benchmarks, that are commonly used to assess how AI models perform across different subject areas. The researchers argue that these foundational tests may be unreliable, calling the validity of many benchmark results into question.
The study notes that many top benchmarks fail to clearly define what they are meant to test, and that reuse of existing benchmark data and methods is worryingly common. In addition, very few employ reliable statistical techniques to compare results across different models. Adam Mahdi, a senior researcher at the Oxford Internet Institute and the lead author of the study, warned that these benchmarks could be misleading: "When we ask AI models to perform specific tasks, what we often measure are concepts or constructs that are entirely different from the actual goal." Another principal author added that even highly reputable benchmarks are frequently trusted blindly, underscoring the need for more thorough scrutiny.
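To illustrate the kind of statistical comparison the study says is rarely used (this sketch is not taken from the paper itself), the example below applies a paired bootstrap to put a 95% confidence interval on the accuracy gap between two models scored on the same benchmark items. The function name and the 0/1 per-item scores are hypothetical, shown only to make the idea concrete.

```python
import random

def paired_bootstrap_diff(model_a_correct, model_b_correct, n_resamples=10_000, seed=0):
    """Estimate a 95% confidence interval for the accuracy difference
    between two models scored on the same benchmark items.

    model_a_correct / model_b_correct: lists of 0/1 per-item scores,
    aligned so index i refers to the same benchmark question.
    """
    assert len(model_a_correct) == len(model_b_correct)
    n = len(model_a_correct)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample benchmark items with replacement, using the same
        # indices for both models so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(model_a_correct[i] for i in idx) / n
        acc_b = sum(model_b_correct[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    low = diffs[int(0.025 * n_resamples)]
    high = diffs[int(0.975 * n_resamples)]
    return low, high

# Toy usage with fabricated 0/1 scores, for illustration only.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1] * 20
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0] * 20
low, high = paired_bootstrap_diff(a, b)
print(f"95% CI for accuracy difference: [{low:.3f}, {high:.3f}]")
# If the interval excludes 0, the observed gap is unlikely to be resampling noise.
```

A check like this turns a single leaderboard gap into a statement about how much of that gap could be explained by the particular sample of test questions, which is the sort of rigor the researchers say most benchmarks currently lack.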