AI’s understanding and reasoning skills can’t be assessed by current tests

Assessing whether large language models — including the one that powers ChatGPT — have humanlike cognitive abilities will require better tests.

Jul 10, 2024 - 21:30

Citations

E. Arakelyan, Z. Liu and I. Augenstein. Semantic sensitivities and inconsistent predictions: Measuring the fragility of NLI models. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics. March 2024.

N. Alzahrani et al. When benchmarks are targets: Revealing the sensitivity of large language model leaderboards. arXiv:2402.01781. February 1, 2024.

N. Dziri et al. Faith and fate: Limits of transformers on compositionality. Advances in Neural Information Processing Systems 36. February 2024.

P. West et al. The generative AI paradox: “What it can create, it may not understand.” International Conference on Learning Representations. January 16, 2024.

C. Deng et al. Investigating data contamination in modern benchmarks for large language models. arXiv:2311.09783. November 16, 2023.

R. Burnell et al. Rethink reporting of evaluation results in AI. Science. Vol. 380, April 14, 2023, p. 136. doi:10.1126/science.adf6369.

E. Davis. Benchmarks for automated commonsense reasoning: A survey. arXiv:2302.04752. February 9, 2023.

Y. Elazar et al. Back to square one: Artifact detection, training and commonsense disentanglement in the Winograd Schema. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. November 2021. doi: 10.18653/v1/2021.emnlp-main.819.

D. Hendrycks et al. Measuring Massive Multitask Language Understanding. International Conference on Learning Representations. January 12, 2021.

P. Trichelair et al. How reasonable are common-sense reasoning tasks: A case-study on the Winograd Schema Challenge and SWAG. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). November 2019. doi: 10.18653/v1/D19-1335.

Ananya is a freelance science writer, journalist and translator, with a research background in robotics. She covers all things algorithms, robots, animals, oceans, urban and the people involved in these fields.