Yubo Chen, an assistant professor in the Department of Electrical and Computer Engineering at the University of California, Davis, conducts research on "white-box models." He was previously a postdoctoral researcher under Turing Award winner and Meta Chief AI Scientist Yann LeCun. In this episode, he talks with us about the latest research progress on white-box models, and also shares his impressions of Yann LeCun, a scientist he knows well who has been through the ups and downs of the AI industry yet remains purely focused on the science.
Here are some selected excerpts from the interview:
01 The Human Brain and Large Models
Silicon Valley 101: Can you briefly introduce the "white-box model" research you are currently doing? In the course of that research, have you found any ways to explain how GPT gets from its inputs to its outputs?
Yubo Chen: A major goal of this direction is to push deep learning from a purely empirical discipline toward a scientific one, or to turn engineering into science, because right now the engineering is advancing quickly while the science lags behind. There used to be a kind of model called word embeddings, which could learn certain representations of language.
At that time people had a question: our task performance improved, but what exactly led to that improvement? So we did some very early work trying to open up these word representations. When you open them up, you find some very interesting phenomena.
For example, for the word "apple", you can find some elemental meanings inside it. One meaning may represent fruit, another represents dessert, and if you dig deeper you will find meanings related to technology and products, referring of course to the products of the company Apple. So along a single word you can find these elemental meanings, and you can then extend this method to large language models.
In other words, after we have trained a large language model, we can look inside the model for the elemental meanings it carries and try to open them up. You will find that a large language model actually has many layers.
In the early layers, a phenomenon called "word-sense disambiguation" occurs. For example, the English word "left" can mean the direction left or be the past tense of "leave"; its specific meaning depends on the surrounding context, so the large language model completes word-sense disambiguation in the first few layers.
In the middle layers, you find that some new meanings are generated. One we found interesting at the time was "unit conversion", which would be activated when converting kilometers to miles or temperature from Fahrenheit to Celsius. That meaning can be opened up, and you can find many similar elemental meanings along this path.
When you go further up, you even find that there is a pattern among these elemental meanings: when a meaning repeats in the context, it will be activated. You can use this method to open up both large and small language models. Of course, these ideas are not entirely new; they have a history in vision models, for example in some similar explorations starting from Matthew Zeiler.
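To make the idea of "opening up" representations a bit more concrete, here is a minimal sketch of one common way to look for such elemental meanings: decomposing embedding vectors into sparse combinations of dictionary atoms. The random data, the scikit-learn library choice, and the parameters below are illustrative assumptions, not the actual setup used in the work described.

```python
# Toy sketch: decompose word-embedding vectors into sparse "elemental meanings"
# via dictionary learning. The data here is random and purely illustrative;
# in practice the rows would be real embedding vectors (e.g. for "apple").
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 64))   # 200 fake "words", 64-dim embeddings

# Learn an overcomplete dictionary; each atom plays the role of one elemental meaning.
dl = DictionaryLearning(n_components=128, transform_algorithm="lasso_lars",
                        transform_alpha=0.5, random_state=0)
codes = dl.fit_transform(embeddings)          # sparse coefficients, shape (200, 128)

# For one word, the few atoms with large coefficients are its "elemental meanings"
# (e.g. fruit, dessert, tech product for the word "apple").
word_idx = 0
top_atoms = np.argsort(-np.abs(codes[word_idx]))[:5]
print("strongest atoms for word 0:", top_atoms, codes[word_idx, top_atoms])
```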
Silicon Valley 101: Following this line of thought, if we understand even part of how it operates, could that open up a lot of optimizations from an engineering perspective?
Yubo Chen: Yes, this is a very good question. I think the higher bar for any theory is that it should guide practice, so when we were working on language models and word representations, one of our goals was to see whether we could optimize these models once we understood them. It is actually possible.
To give an example, if you find a neuron in a large language model that activates when it sees a certain type of elemental meaning, then that neuron can be used as a discriminator, and you can use it to do some tasks. And by changing these elemental meanings, you can adjust the model's biases.
That is, if I can discover it, I can adjust it. Recently Anthropic did similar work: finding biases that may exist in a language model and then making changes so that the model becomes fairer and safer.
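As an illustration of the "neuron as a discriminator" idea, here is a toy sketch: if one hidden unit reliably fires on a concept, its activation alone can separate inputs that contain the concept from those that do not. The activations below are synthetic stand-ins; in practice they would be read out of a real language model.

```python
# Toy sketch: a single concept-selective unit used as a discriminator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)        # 1 = concept present in the input, 0 = absent
acts = rng.standard_normal((n, 64))        # pretend hidden-state activations, 64 units
acts[:, 12] += 2.0 * labels                # unit 12 (hypothetical) responds to the concept

# A classifier that only sees unit 12 already separates the two classes well.
clf = LogisticRegression().fit(acts[:, [12]], labels)
print("accuracy using only unit 12:", clf.score(acts[:, [12]], labels))
```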
Silicon Valley 101: I saw that OpenAI also had a study last year that used GPT-4 to explain GPT-2, to see how GPT-2 actually works. For example, they found that when GPT-2 answers questions about American history around 1800, the 12th neuron in the 5th layer is activated, and when it answers in Chinese, the 13th neuron in the 12th layer is activated.
If you turn off the neuron for answering in Chinese, the model's ability to understand Chinese drops significantly. But for neurons further back, for example around the 2,000th neuron, the overall credibility of the explanations has already dropped a lot. Have you come across their research?
Yubo Chen: I haven't read this paper yet, but the method is very similar to performing surgery on brain neurons. The idea is that if, in a neural network, a capability is in some sense localized rather than completely distributed, then you can operate on it. For example, if you cut off a certain neuron, you can consider that the corresponding part of its ability is more or less lost.
Humans are actually the same. For example, a person with epilepsy may have some language impairment after surgery, but other bodily functions are not much affected. The principle seems similar.
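For readers who want to see what such "surgery" looks like in code, here is a minimal sketch that zeroes out one hidden unit in one GPT-2 block with a forward hook and measures how much the next-token distribution shifts. The layer and unit indices are arbitrary placeholders chosen for illustration, not the specific neurons from the OpenAI study.

```python
# Sketch: ablate ("cut off") one hidden unit in GPT-2 and compare outputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

LAYER, UNIT = 5, 12                      # hypothetical indices, for illustration only

def ablate(module, inputs, output):
    hidden = output[0]                   # GPT-2 blocks return a tuple; [0] is the hidden states
    hidden[..., UNIT] = 0.0              # zero the chosen unit at every position
    return (hidden,) + output[1:]

ids = tok("The capital of France is", return_tensors="pt")

with torch.no_grad():
    baseline = model(**ids).logits[0, -1]
    handle = model.transformer.h[LAYER].register_forward_hook(ablate)
    ablated = model(**ids).logits[0, -1]
    handle.remove()

# How much does removing this single unit shift the next-token distribution?
shift = (torch.softmax(baseline, -1) - torch.softmax(ablated, -1)).abs().sum()
print("L1 shift in next-token distribution:", shift.item())
```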
Silicon Valley 101: OpenAI and Anthropic are now researching the interpretability of large models. What's the difference between your research and theirs?
Yubo Chen: Nobody really knows yet whether white-box model research will succeed. I have discussed this with my advisor, and the consensus is that it is worth trying. Coming back to this area, what our research really wants to do is to understand artificial intelligence, to reconstruct it through that understanding, and then to build something fundamentally different. So observation, or interpretability, is in my view only a means.
That is to say, opening up these models, doing these experiments, making adjustments to the model: these are all means we try along the way to understanding. But what really matters about white-box models is going back to the signal itself, because whether it is the human brain or a machine, the essence of learning comes from signals.
There are structures in our world; brains and machines alike have to learn through these structures, and what they learn is precisely these structures. So can we find the laws behind these structures, along with mathematical tools to represent them, and then reorganize those pieces to build a different kind of model? If that can be accomplished, I think it holds the promise of improving the robustness, safety, and trustworthiness of our systems.
In addition, efficiency would also improve. This is a bit like how thermodynamics emerged after the steam engine and helped turn it from a purely artisanal craft into a science. Similarly, today it is as if we have a steam engine for data for the first time: where we did not understand our data before, we can now finally start building AI algorithms that capture the patterns in it.
Silicon Valley 101: So it would be more energy-efficient.
Yubo Chen: Speaking of energy efficiency, I can give a few interesting examples. The first point is definitely energy efficiency, because the brain is equivalent to a 20-watt light bulb, while current supercomputers may exceed one million watts.
The second point is that if we look at the various creatures nature has evolved, their efficiency is actually very high. For example, there is a special spider called the jumping spider that has only a few million neurons, yet it can plan very complex three-dimensional trajectories to capture its prey.
And I think the most interesting thing is how efficiently humans use data. Llama 3 was trained on roughly 13 trillion tokens of data. But how much data can a person take in over a lifetime? Assuming we take in 30 image frames per second, for 12 hours a day, over 20 years, we get about 10 billion tokens, and the amount of text we can acquire is of roughly the same order, which is far less than what large models see.
So the question is: how does a person acquire such strong generalization ability from such a small amount of data? That efficiency is what I find amazing about the human brain.
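A quick back-of-envelope check of the figures quoted above, counting each image frame as one token:

```python
# 30 frames/second, 12 hours/day, 20 years, one token per frame.
frames_per_second = 30
seconds_per_day = 12 * 3600
days = 20 * 365

lifetime_frames = frames_per_second * seconds_per_day * days
print(f"~{lifetime_frames / 1e9:.1f} billion tokens")   # about 9.5 billion, i.e. roughly 10 billion
```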
Silicon Valley 101: Which is harder, uncovering how large models work or uncovering how the human brain works? They both sound very difficult to me.
Yubo Chen: Each has its own difficulties, but the methods are similar. Whether it is the human brain or a large language model, we are trying to observe it and see what it responds to.
This method can actually be traced back to the research on the visual cortex by David Hubel and Torsten Wiesel, who won the Nobel Prize in Physiology or Medicine in 1981. They discovered a type of cell called the simple cell and studied which neurons fire when a subject sees certain things, analyzing the different response states of neurons to different stimuli, from not responding at all to firing very strongly. In this way they found the receptive fields of neurons.