
Stanford researchers found that many AI models fail in real-world settings because they learn "spurious correlations" from their training data, a flaw that current benchmarks often miss because of the "accuracy on the line" phenomenon. The researchers offer three recommendations for building reliable and safe AI, particularly in safety-critical applications.