Thoughts on the LLM Paradigm
A semi-coherent thought spiral on the current state of machine learning research
Recently, I’ve found myself thrust out of my comfortable image analysis PhD cocoon and into the world of large language models (LLMs). When I interviewed for a "3D Computer Vision Researcher" role, I was told I was a dying breed. A dying breed? Excuse me? I’m a machine learning (ML) researcher in 2024. But it wasn’t the machine learning part they were talking about—it was the modality. Why focus on image analysis in this strange, exciting new era of natural language processing (NLP)?
I was hired for the role, and thought I’d stay focused on computer vision… but like all researchers, I am pulled by the tides of funding. So here I am working with LLMs, and I am baffled.
For some context, I briefly worked in NLP research in 2018/2019, focusing on automatic speech recognition. Although transformers had been around since 2017, I was still deep in the world of N-grams and LSTMs, a victim of corporate lag. Then, for five years during my PhD, I focused on vision research and largely ignored the LLM hype. Sure, I kept up with model releases, but I wasn’t immersed in their intricacies. Now I find myself working with transformer-based LLMs daily, and it has made me reflect on the paradigm shift in the field.
Paradigms and Occam’s Razor
One of my favorite courses during undergrad was “The History and Philosophy of Science.” In that class, I was introduced to Thomas Kuhn’s “The Structure of Scientific Revolutions,” where the term “paradigm shift” was coined. According to Kuhn, a paradigm shift is a scientific revolution: an upset or “crisis” in “normal science,” the mode in which knowledge is built sequentially on past scientific achievements and traditions. “In science,” Kuhn explains, “novelty emerges only with difficulty, manifested by resistance, against a background provided by expectation.”
The most appealing paradigm shifts obey Occam’s razor: the principle that the explanation requiring the fewest assumptions is the one most likely to be correct. For example, consider the shift from the geocentric to the heliocentric model of the solar system. Ptolemy’s geocentric model produced very accurate predictions by mapping planetary motion with elaborate combinations of epicycles. It was, in fact, more precise than Galileo’s blasphemous (Copernican) model of the Earth traveling in circles around the sun. But that simpler model was much closer to Kepler’s correct heliocentric description of our elliptical orbits, which later became the basis for Newton’s theory of gravity. We now know that any curve can be approximated to arbitrary accuracy using enough epicycles. The geocentric model wasn’t wrong, practically speaking, but it was unnecessarily complicated and hindered the progression of science.
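If you want to see that epicycle claim in action, here is a minimal Python sketch (my own toy illustration, not anything from the sources cited here): a truncated complex Fourier series is literally a sum of rotating circles, and the approximation error of an arbitrary closed curve drops as you add terms.

```python
import numpy as np

def epicycle_approximation(points: np.ndarray, n_terms: int) -> np.ndarray:
    """Approximate a closed curve (complex samples) with n_terms rotating circles."""
    n = len(points)
    coeffs = np.fft.fft(points) / n               # one complex coefficient per epicycle
    freqs = np.fft.fftfreq(n, d=1.0 / n)          # integer rotation frequencies
    keep = np.argsort(np.abs(coeffs))[-n_terms:]  # keep only the n_terms largest circles
    t = np.arange(n) / n
    approx = np.zeros(n, dtype=complex)
    for k in keep:
        approx += coeffs[k] * np.exp(2j * np.pi * freqs[k] * t)
    return approx

# A square-ish "orbit" traced in the complex plane -- nothing like a circle.
theta = np.linspace(0, 2 * np.pi, 512, endpoint=False)
square = np.sign(np.cos(theta)) + 1j * np.sign(np.sin(theta))

for n_terms in (3, 10, 50):
    err = np.mean(np.abs(square - epicycle_approximation(square, n_terms)))
    print(f"{n_terms:3d} epicycles -> mean error {err:.3f}")
```

More circles, better fit; the model never becomes wrong, just increasingly baroque.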
But science doesn’t always progress from complex to simple. Take classical physics: Newton’s laws of motion and gravitation were clean, elegant, and palatable. For a time, it seemed like physics was nearly entirely solved. But classical physics breaks down at extreme scales. Experimental anomalies revealed shortcomings that could not be ignored, eventually giving rise to the more complex theories of quantum mechanics and relativity. Classical physics may seem far more aligned with Occam’s razor, but it is wrong because it can’t explain all phenomena. In the words of Einstein (allegedly): “Everything should be made as simple as possible, but no simpler.” Occam’s razor is about minimizing assumptions, not superficial simplicity. If that weren’t the case, religion would trump all scientific theory, thanks to that little “mysterious ways” clause. The shortest mathematical proof isn’t always the simplest. Syntactic simplicity != semantic simplicity. But that’s another soapbox.
The challenge with paradigms, or periods of “normal science,” is that they are built on circular arguments and riddled with confirmation bias. For example, we learn about atoms using machines we specifically built to detect them, based on our existing knowledge of atoms. We risk missing the bigger picture when we confuse coherence truth (something that fits within an existing framework) with correspondence truth (something that aligns with reality), and paradigm-conforming science is all about coherence.
The ML/LLM Paradigm
The modern scientific method, or the ML approach, is: “(1) Identify a challenge problem or task; (2) Create a dataset of desired input-output instances; (3) Select or define one or more evaluation metrics; and (4) Develop, apply, and refine machine learning models and algorithms to improve performance.” [1]. This plan sounds pretty good from a purist perspective, meaning if we believe the data is pure and the evaluation metrics correctly capture performance. But in reality, most of our data and metrics are theory-laden and paradigm-dependent; our progress doesn’t even exist outside of our idea of it. I’m not saying the ML methodology isn’t useful; it is. But I think Kuhn would advise us to acknowledge and remember that we tell ourselves fictions in order to understand the world. Even math operates in a world once removed. But true innovation comes from paradigm shifts. Classical physics gave us clean equations, but modern physics, in all its wacky glory, gave us GPS and laser scanners.
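To make that four-step recipe concrete, here is a toy sketch in Python (my own illustration; the dataset, metric, and model are arbitrary stand-ins, not anything prescribed by [1]):

```python
# A toy instance of the four-step ML recipe quoted above, using scikit-learn.
from sklearn.datasets import load_digits             # (1) the "challenge task": digit recognition
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)                  # (2) a dataset of input-output instances
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

metric = accuracy_score                               # (3) an evaluation metric

model = LogisticRegression(max_iter=2000)             # (4) develop/refine a model to push the metric up
model.fit(X_train, y_train)
print(f"accuracy: {metric(y_test, model.predict(X_test)):.3f}")
```

Notice how steps (2) and (3) quietly encode a worldview: the dataset defines what counts as the task, and the metric defines what counts as progress.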
Scientists hate renouncing paradigms, often preferring to patch them with ad hoc modifications. The Pythagoreans, for instance, were so committed to the idea that the world was built on rational numbers that they reportedly drowned the man who proved the existence of irrational numbers (√2). (I love this example because the discovery that was incommensurate with the paradigm was incommensurability itself.) This human tendency to defend an existing worldview, rather than reshape it, can slow progress. When we start adding asterisks and special-case stipulations to simple solutions, they are no longer simple. Machine learning research is full of such post hoc band-aids: fixes that are often too application-specific and niche to make a real impact. Walk around any ML conference poster session and you’ll see what I mean.
LLMs resemble Ptolemy’s Epicycles
LLMs evolved from this ML paradigm, and I can’t help but feel that they resemble Ptolemy’s epicycles: useful and predictive, but perhaps overly complex. Just as an infinite series of linear models can approximate a simple curve, transformers are a powerful tool. But are they the best or final tool? Useful, yes (I used GPT to edit this blog), but still far from performing comprehensive reasoning.
“Central is the challenge of scale. No child needs to read or hear more than half the internet’s English text in order to use the language.” [1]
The primary driving force behind recent scientific progress has been scaling models (more data, more parameters, and more compute) rather than true advancements in theory. The success of transformers, and specifically LLMs, has been primarily empirical. In a sense, it’s the success of “brute-force” learning over well-understood or rigorously formalized theories. I can’t help but feel that this exponential increase in compute and data availability allowed us to skip an important step in understanding. I mean, it seems more probable that an amoeba crawled out of the primordial ooze than an elephant, but both are almost equally improbable. Maybe LLMs are the elephant, and we missed the amoeba.
Now that we are in this LLM paradigm, it feels like we are a little distracted and scientifically stuck. I attended ICLR in May, where a plot showed a roughly ten-fold increase in LLM papers from 2023 to 2024, from under 50 submissions to over 500. So many of these papers are incremental, small advancements that mask or sidestep the deeper, more fundamental issues with the models themselves. Prompt engineering, in-context learning, preference alignment: so much of it feels like smoke and mirrors. I’ve read a lot of papers recently that can be whittled down to: “Look, if you tell the LLM exactly what you want it to output, it’s more likely to output it!” It feels like ecstatic induction, frantic induction, a gross indulgence in the human urge to theorize before we understand the fundamentals of what is going on.
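For what it’s worth, that finding is easy to reproduce yourself. Here is a hedged sketch, assuming the openai Python SDK (v1+) and an API key in the environment; the prompts and model name are placeholders of mine, not drawn from any particular paper:

```python
# A toy comparison of a vague prompt vs. an explicit "tell it exactly what you want" prompt.
# Hypothetical example: the model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

vague = "Summarize this review: 'The battery died after two days.'"
explicit = (
    "Summarize this review in exactly one sentence, then output a JSON object "
    "with keys 'sentiment' ('positive' or 'negative') and 'topic'. "
    "Review: 'The battery died after two days.'"
)

for prompt in (vague, explicit):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-completion model works here
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content, "\n---")
```

The explicit prompt tends to produce output that matches the requested structure, which is the whole trick: the gain comes from specifying the output, not from the model understanding anything more.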
“Current LLMs remain somewhat monolithic, expensive, amnesic, delusional, uncreative, static, assertive, stubborn, and biased black boxes.” [1]
Looking forward
By focusing on incrementally making models more useful or interpretable for specific tasks, we are missing the larger question of whether the system truly “understands” what it’s doing or is merely reflecting statistical patterns that happen to align with human behavior. I don’t mean to undermine the feats of LLMs, because they are remarkable, but I don’t believe we have tapped into the deeper cognitive mechanisms required for true understanding or reasoning. Anomalies are what ultimately drive paradigm shifts. We need to talk about them, seek them out, and resist the urge to defend the status quo. The next paradigm in AI may be as groundbreaking as the leap from classical to modern physics. Perhaps, like Ptolemy’s epicycles, LLMs will prove to be a useful stepping stone toward something more elegant and fundamentally insightful about intelligence, something Occam might like.
[1] Li, Sha, et al. "Defining a New NLP Playground." The 2023 Conference on Empirical Methods in Natural Language Processing.
Thanks for reading! For fun, here are some AI-generated images from Gemini ImageGen3: