In an experiment conducted by two education researchers, ChatGPT was asked to solve the "doubling the square" problem. The experiment probes the cognitive behavior of artificial intelligence, aiming to determine whether the chatbot would directly retrieve its knowledge of the geometric solution presented in Plato's dialogue, written around 385 BCE, or behave more dynamically by constructing its own interpretation of the problem.
"Doubling the cube" is a problem in which a new square is created with double the area of a given square, by making its side lengths equal to the diagonal of the original square. The experiment reflects a contemporary approach to how generative artificial intelligence systems, despite their computational capability, may mirror the learning paths of human students.
The new research examined ChatGPT's mathematical knowledge as perceived by its users. The researchers wanted to know whether the chatbot would solve Plato's problem using knowledge it already "held," or by adaptively developing its own solutions.
In the Meno, Plato describes Socrates teaching an uneducated boy how to double the area of a square. Initially, the boy mistakenly suggests doubling the length of each side, but Socrates eventually leads him to understand that the sides of the new square must be the same length as the diagonal of the original square.
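A one-line check with the Pythagorean theorem shows why the construction works (we write $a$ for the original side; the notation is ours, not the article's). The diagonal of the original square satisfies

$$d = \sqrt{a^{2} + a^{2}} = a\sqrt{2},$$

so a square built on the diagonal has area $d^{2} = 2a^{2}$, exactly double. The boy's suggestion of doubling the side instead gives $(2a)^{2} = 4a^{2}$, four times the original area.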
This classical pedagogical scene raises significant questions: Is mathematical knowledge innate, or does it develop as we interact with problems and seek solutions? That ancient dilemma motivated the research on ChatGPT, raising the question of how an artificial intelligence that lacks human experience deals with mathematical concepts.
The researchers posed this problem to ChatGPT-4 (GPT: Generative Pre-trained Transformer), an earlier generation of OpenAI's language models, which can accept text and image inputs and respond in text. According to the company, the ability to use both types of input allows the system to generate more complex outputs. The researchers first mimicked Socrates' questions, and then deliberately introduced errors, queries and new variations of the problem.
Like other large language models (LLMs), ChatGPT is trained on massive text collections and generates responses by predicting the words most likely to follow, based on patterns learned during training. The researchers expected it to tackle the ancient Greek mathematical challenge by repeating its existing "knowledge" of Socrates' famous solution. Instead, it appeared to improvise, and at one point it even made a distinctly human error.
The research, published in the International Journal of Mathematical Education in Science and Technology, was conducted by Dr. Nadav Marco of the Hebrew University of Jerusalem and the David Yellin College of Education in Jerusalem, and Andreas Stylianides, Professor of Mathematics Education at the University of Cambridge. "When we face a new problem, our instinct is often to try things based on our past experience," said Dr. Marco. "In our experiment, it seemed ChatGPT did something similar. Like a learner or a scholar, it appeared to come up with its own hypotheses and solutions."
Because ChatGPT is trained on text rather than diagrams, it tends to be weaker at the kind of geometric reasoning Socrates used in the doubling-the-square problem. Nevertheless, Plato's text is so well known that the researchers expected the chatbot to recognize their questions and reproduce Socrates' solution.
Surprisingly, it failed to do so. When asked to double the square, ChatGPT chose an algebraic approach that wasn't known in Plato's time. It then resisted attempts to make it commit the boy's mistake and stubbornly stuck to algebra, even when the researchers complained that its answer wasn't accurate. Only when Marco and Stylianides expressed disappointment that, despite all its training, the popular chatbot couldn't provide the expected, accurate answer did ChatGPT generate the geometric alternative.
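The article does not reproduce the chatbot's algebra, but a typical algebraic route (a sketch of the likely reasoning, not a quotation of ChatGPT's output) would set $x$ as the side of the doubled square and solve

$$x^{2} = 2a^{2} \quad\Rightarrow\quad x = a\sqrt{2} \approx 1.414\,a.$$

Any decimal value is necessarily an approximation, since $\sqrt{2}$ is irrational, which may explain why the researchers could object that the answer "wasn't accurate" even though the symbolic form is exact.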
Throughout the various iterations of the problem, the researchers adopted techniques reminiscent of Socratic questioning: they neither provided ChatGPT with direct answers nor led it to expected conclusions. This forced the AI to tackle the task creatively rather than passively, a choice that ultimately elicited student-like learning behavior and exposed both the potential and the limitations of generative AI. At one point, ChatGPT made a conspicuously human-style error, further blurring the boundary between algorithmic calculation and genuine learning.
Despite its initial difficulty in providing the expected classical solution, ChatGPT demonstrated notably deep knowledge of the philosophical context of the problem when asked to discuss Plato's work directly. This finding suggests that while the AI's calculations may deviate from expected answers, its grasp of the underlying principles can remain intact. The interplay between information retrieval, approximation and innovation became a central theme in understanding AI's limitations and capabilities in mathematical reasoning.
The researchers then presented the chatbot with a slightly different version of the geometric problem, asking it to double the area of a rectangle while preserving its proportions. Although it was by now aware of the researchers' preference for geometry, ChatGPT again stuck to algebra. When pressed for a geometric construction, the chatbot incorrectly claimed that because a rectangle's diagonal cannot be used to double its area, no geometric solution exists.
The claim about the diagonal is correct, but another geometric solution does exist. Dr. Marco suggested that the likelihood of this claim coming from the chatbot's knowledge base is virtually zero; instead, ChatGPT appears to have improvised its answers based on the earlier discussion of the square problem. Finally, Marco and Stylianides asked the chatbot to double the area of a triangle. Here too it first reverted to algebra, but with further guidance it arrived at a correct geometric answer.
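A short calculation (our illustration, not one attributed to the researchers) shows why the diagonal remark is right yet the conclusion wrong. For a rectangle with sides $a$ and $b$, the diagonal has length $\sqrt{a^{2}+b^{2}}$, which in general bears no fixed ratio to either side, so it cannot serve as the side of a doubled, similar rectangle. But scaling both sides by $\sqrt{2}$ gives

$$(\sqrt{2}\,a)(\sqrt{2}\,b) = 2ab,$$

double the area with the proportions preserved; and a segment of length $\sqrt{2}\,a$ is itself constructible as the diagonal of a square with side $a$. The same scaling argument doubles a triangle's area.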
The researchers caution against over-interpreting these results, since they could not directly observe ChatGPT's internal workings. From their vantage point as users, however, what emerged at the surface was a combination of data retrieval and real-time reasoning.
They compare this behavior to the educational concept of the "Zone of Proximal Development" (ZPD)—the gap between what the learner already knows and what they may eventually know with support and guidance. Lev Vygotsky, a Soviet-Jewish developmental psychologist, defined this as the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers. The idea is that people learn best when working with others, and through such collaboration with more skilled individuals, students learn and internalize new concepts, psychological tools and skills. Therefore, the researchers argue, in some cases the chatbot may not be able to solve problems immediately, but can certainly do so with guidance.
The study's authors believe that working with ChatGPT through the lens of the Zone of Proximal Development may help turn its limitations into learning opportunities. By guiding, questioning and scrutinizing the chatbot's responses, students would not only learn to navigate ChatGPT's boundaries, but also develop the critical skills of examining evidence and constructing reasoned arguments, skills that lie at the heart of mathematical thinking.
"Unlike proofs found in reputable textbooks, students cannot assume ChatGPT's proofs are valid. Understanding and evaluating AI-generated proofs become key skills that must be integrated into the mathematics curriculum," said Stylianides. "These are core skills we want students to master. This means using prompts like 'I want us to explore this problem together,' not 'Tell me the answer,'" added Marco.