Saturday, December 21, 2024

Can AI Crack the New York Times Connections Puzzle? Researchers Put It to the Test


The New York Times daily Connections puzzle tests players’ ability to identify themes among words that at first glance appear unrelated. It looks like a straightforward exercise, yet it calls for critical thinking and the capacity to recognize abstract links. Researchers at the NYU Tandon School of Engineering set out to test whether artificial intelligence (AI) models could handle this linguistic challenge. Could AI crack the New York Times Connections puzzle, or would the nuances of human language remain out of reach?

What Is the New York Times Connections Puzzle?

In the NYT Connections puzzle, players are given 16 words to sort into four thematically related groups of four. Difficulty ranges from “simple” connections based on straightforward definitions to “tricky” ones that demand abstract thinking. With only five attempts, players must work out the underlying connections to solve the puzzle.

Professor Julian Togelius, director of the NYU Tandon Game Innovation Lab, assembled a team to see whether contemporary Natural Language Processing (NLP) systems could solve these linguistic riddles. Their research, available on the arXiv preprint server, clarifies the strengths and weaknesses of current AI models. It will also be presented at the IEEE 2024 Conference on Games.

Two AI Approaches

The researchers explored two distinct AI approaches:

  • Large Language Models (LLMs): The first approach utilized GPT-3.5 and the recently released GPT-4, powerful LLMs from OpenAI. These models excel at understanding and generating human-like language, making them promising candidates for handling the NYT Connections puzzle.
  • Sentence Embedding Models: The second approach employed sentence embedding models like BERT, RoBERTa, MPNet, and MiniLM. These models encode semantic information as vector representations but lack the full language comprehension and generation capabilities of LLMs.
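
To make the embedding idea concrete, here is a minimal Python sketch. It is not the researchers' method: the hand-made two-dimensional vectors in `TOY_VECTORS` stand in for real model embeddings, and the `greedy_groups` helper is a hypothetical illustration. It repeatedly picks the set of words whose vectors are most similar to one another on average, using cosine similarity.

```python
import math
from itertools import combinations

# Hand-made 2-D vectors standing in for real sentence embeddings.
# Assumption: a real system would get these from a model such as MPNet or MiniLM.
TOY_VECTORS = {
    "apple": (1.0, 0.1), "pear": (0.9, 0.2), "plum": (1.0, 0.0), "fig": (0.8, 0.1),
    "red": (0.1, 1.0), "blue": (0.2, 0.9), "green": (0.0, 1.0), "pink": (0.1, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cohesion(group, vectors):
    """Mean pairwise cosine similarity of the words in a candidate group."""
    pairs = list(combinations(group, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def greedy_groups(vectors, group_size=4):
    """Repeatedly remove the most cohesive group of `group_size` words."""
    remaining = sorted(vectors)
    groups = []
    while remaining:
        best = max(combinations(remaining, group_size),
                   key=lambda g: cohesion(g, vectors))
        groups.append(set(best))
        remaining = [w for w in remaining if w not in best]
    return groups


if __name__ == "__main__":
    for group in greedy_groups(TOY_VECTORS):
        print(sorted(group))
```

On this toy data the fruits and the colours separate cleanly. Real puzzles are harder because they deliberately include words that plausibly fit more than one group.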

Results of the Experiment

Although every AI system tested solved some of the Connections puzzles, the task proved difficult overall. GPT-4, the most advanced model tested, outperformed both its predecessor, GPT-3.5, and the sentence embedding models, solving about 29% of the puzzles, but consistent mastery remained out of reach. Interestingly, the models’ performance mirrored human behaviour across difficulty levels: like people, they found “simple” connections easier to solve than “tricky” ones that require more abstract thought.

“Examining where LLMs falter on the Connections puzzle can reveal limitations in how they process semantic information,” explains Graham Todd, the study’s lead author and a PhD student at the Game Innovation Lab. “These insights can guide further development of AI models with improved reasoning and critical thinking capabilities.”

The research team found that explicitly prompting GPT-4 to work through the puzzles step by step improved its performance considerably. With this “chain-of-thought” prompting strategy, the model solved slightly more than 39% of the puzzles. “Our findings align with previous research that demonstrates the effectiveness of ‘chain-of-thought’ prompting in encouraging a more structured thought process within language models,” says Timothy Merino, a PhD student at the Game Innovation Lab and co-author of the study. “Essentially, by asking the models to explain their reasoning as they work through the task, we can improve their performance.”
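
The difference between a plain prompt and a chain-of-thought prompt can be sketched in a few lines of Python. The study's exact prompts are not quoted here, so the wording below is an assumption, and `build_prompt` is a hypothetical helper that only builds the text; it does not call any model.

```python
def build_prompt(words, chain_of_thought=False):
    """Return a text prompt asking a model to group Connections words.

    The instruction wording is illustrative only, not the prompt used
    in the study.
    """
    if len(words) != 16:
        raise ValueError("a Connections puzzle has exactly 16 words")
    lines = [
        "Sort these 16 words into four groups of four words that share a theme.",
        "Words: " + ", ".join(words),
    ]
    if chain_of_thought:
        # Ask the model to explain its reasoning before committing to an answer.
        lines.append(
            "Think step by step: explain the theme you see for each group "
            "before giving your final answer."
        )
    else:
        lines.append("Reply with the four groups only.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example words, not taken from a real puzzle.
    example = [
        "apple", "pear", "plum", "fig",
        "red", "blue", "green", "pink",
        "mars", "venus", "saturn", "neptune",
        "harp", "flute", "cello", "oboe",
    ]
    print(build_prompt(example, chain_of_thought=True))
```

The only change between the two variants is the final instruction, which is the point: the same puzzle text is sent either way, and the model is simply asked to reason aloud first.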


Beyond measuring AI capabilities, the researchers see scenarios in which models like GPT-4 could work alongside humans, for example by assisting in the creative process of designing new word puzzles. Such collaboration may push the boundaries of how machine learning systems represent concepts and infer context.

The team’s experiments used a dataset of 250 puzzles from an online archive, covering the daily puzzles from June 12, 2023, to February 16, 2024. The study adds to Professor Togelius’ ongoing investigation into the intersection of AI and game design, as explored in his book, “Playing Smart: On Games, Intelligence, and Artificial Intelligence” (2019).

The ability to solve The New York Times Connections puzzle offers a window into an AI model’s reasoning and critical thinking skills. While AI hasn’t yet cracked the code completely, this research paves the way for further exploration in the fascinating field of language processing and AI development.
