Asked what 7 × 4 + 8 × 8 is, it just guessed, directly giving 120. Then it stopped, as if saying, "Maybe I need to provide some explanation." It then calculated each step, 7 × 4 = 28 and 8 × 8 = 64, added them together, and got an answer different from its initial guess: 92. Then the experimenter said, "Wait, you said the answer was 120 earlier," and it replied, "That was a mistake; the correct answer is 92."
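For reference, the arithmetic in that anecdote, checked directly (a trivial sketch; the numbers are the ones quoted above):

```python
# The step-by-step computation the model eventually produced:
print(7 * 4)          # 28
print(8 * 8)          # 64
print(7 * 4 + 8 * 8)  # 92 -- not the initial guess of 120
```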
So, essentially, these technologies, at least at their current level, don't yet have the ability... they don't actually have a basic sense of factual correctness. People have experimented with forcing the model to reason step by step rather than just guessing the answer, which does help a little, but these are all "hacks". We're not... they're not as reliable as experts, although they can sometimes produce expert-level output, or at least something close to expert-level output.
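To make the "think step by step" trick concrete, here is a minimal sketch; the `ask` function is a hypothetical stand-in for whatever model API you use, and the only difference between the two calls is the prompt:

```python
def ask(prompt: str) -> str:
    # Hypothetical stand-in: replace this with a real call to your model API.
    return "<model output here>"

question = "What is 7 * 4 + 8 * 8?"

# Direct prompt: the model may just pattern-match and guess.
direct_answer = ask(question)

# Step-by-step prompt: asking for intermediate steps before the final
# answer often helps a little, but it is a hack, not a guarantee.
careful_answer = ask(question + "\nThink step by step, showing each "
                                "intermediate result before the final answer.")

print(direct_answer, careful_answer)
```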
So the question is, how do we use this technology? It's a different kind of technology. We're used to technologies that make mistakes, but with those older, poorer technologies, bad output usually looked bad; you could tell it wasn't real. AI, however, is trained with weights specifically chosen so that its answers are as close to correct as possible, so even when it's wrong, the output looks very convincing. Our existing instincts for telling good output from bad therefore break down, and that matters especially when you want to use it in any way that could cause actual harm.
For example, if you want to use AI to make medical decisions or financial decisions, or even as a therapist: these text generators have the potential to be great companions, but they can also give very bad advice.
So in many areas, although AI has huge potential, its safety is not yet up to standard. It's like having invented the jet engine: you can quickly put together some kind of powered aircraft with it, but it might take decades to reach a state where the public feels safe. Air travel today is the safest mode of travel per mile, even though it's obviously a dangerous technology. These issues will be and can be resolved, but you really have to take safety seriously; you have to assume that things will go wrong.
On the other hand, AI also has good prospects in applications where the downside risk is small. For example, you may have noticed that all the background slides in this talk were generated by AI, and perhaps you've noticed some flaws; AI is still bad at rendering text in images, though it's slowly getting better. The downside risk is small: the backgrounds only need to look convincing, and they are not the main, core part of my talk. So in some applications, that kind of downside risk is indeed acceptable.
Especially in science, one way to reduce the risk of errors and biases is scientific validation, especially independent validation. If there are methods that can take the genuinely powerful output of AI, filter out the garbage through independent validation, and keep only the good parts, many potential applications will open up.
To give another analogy: a faucet produces a certain amount of drinking water, but the amount it can produce is limited. Suddenly we have a big fire hose that can produce 100 times as much water, but that water can't be drunk directly. If you have a filter that removes the undrinkable part, you end up with a large supply of drinking water. This is the direction in which I see science and mathematics developing.
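In code, this "fire hose plus filter" pattern is just generate-and-verify: an untrusted generator proposes many candidates, and an independent checker keeps only the ones that pass. A minimal sketch, with a made-up generator and exact arithmetic standing in for the independent validation:

```python
import random

def untrusted_generator(n: int) -> list[tuple[int, int, int]]:
    """Stands in for an AI that emits claimed sums; some are wrong."""
    candidates = []
    for _ in range(n):
        a, b = random.randint(1, 99), random.randint(1, 99)
        claimed = a + b + random.choice([0, 0, 0, 1, -2])  # mostly right, sometimes off
        candidates.append((a, b, claimed))
    return candidates

def independent_check(a: int, b: int, claimed: int) -> bool:
    """Independent validation: recompute the fact and compare."""
    return a + b == claimed

candidates = untrusted_generator(1000)
verified = [c for c in candidates if independent_check(*c)]
print(f"kept {len(verified)} of {len(candidates)} candidates")
```

The generator can be arbitrarily unreliable; as long as the checker is trustworthy and independent, whatever survives the filter is safe to use.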
Currently, many scientific fields face a bottleneck: they need good candidates for solving their problems. Say you're working on drug design and want to find a drug to treat a certain disease. First you have to come up with a candidate, perhaps from nature or by modifying existing drugs; then you have to synthesize it and run trials lasting many years: phase one, phase two... These trials are very expensive, so at present only big pharmaceutical companies can keep doing this. In fact, many of the drugs you try don't work, and you have to abandon them at some point in the process. Sometimes you're lucky: a drug can't cure the disease it was designed for, but it works in some other way. The problem is that you still face many attempts and many failures.
AI technology promises to reduce the number of candidates, and people are now using it to simulate proteins. With enough data, you can simulate a protein to see whether it binds to a certain receptor, or whether it inhibits the action of a certain enzyme, and so greatly reduce the number of drug candidates that actually have to be synthesized and tested.
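As a sketch of that screening step: score every candidate with a cheap predictive model and pass only the top few on to expensive synthesis and testing. Everything here is hypothetical; `predicted_binding_affinity` stands in for a real docking or structure-based scoring pipeline:

```python
def predicted_binding_affinity(smiles: str) -> float:
    """Hypothetical ML scorer; higher means tighter predicted binding.
    Toy stand-in so the sketch runs: scores by string length."""
    return float(len(smiles))

# Candidate molecules as SMILES strings, e.g. from a generative model:
# ethanol, phenol, aspirin, caffeine.
candidates = [
    "CCO",
    "c1ccccc1O",
    "CC(=O)Oc1ccccc1C(=O)O",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]

# Rank everything in silico; synthesize and test only the best few in the lab.
ranked = sorted(candidates, key=predicted_binding_affinity, reverse=True)
shortlist = ranked[:2]
print("send to synthesis:", shortlist)
```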