
Welcome back, AI prodigies!
In today’s Sunday Special:
📈The Last 4 Years of AI Progress
⛽️What Fuels “OpenAI o1’s” Success?
🦺The Risks of AI Progress
🔑Key Takeaway
Read Time: 7 minutes
🎓Key Terms
Large Language Models (LLMs): AI models pre-trained on vast amounts of data to generate human-like text.
ImageNet: A benchmark for image classification. It consists of over 14 million images organized into more than 20,000 categories.
Convolutional Neural Network (CNN): A network of specialized layers that detect visual patterns, such as edges, textures, and structures.
Reinforcement Learning (RL): Teaches AI models to make decisions that result in the best outcomes. It mimics the “trial-and-error” process humans use to learn, where actions that lead to desired outcomes are reinforced.
Floating Point Operations per Second (FLOPs): How many operations (i.e., addition, subtraction, multiplication, and division) a computer solves within a second. Better AI models with larger datasets generally require more FLOPs.
🩺 PULSE CHECK
Have you been impressed by the rapid pace of innovation with conversational chatbots like OpenAI’s ChatGPT?
📈THE LAST 4 YEARS OF AI PROGRESS
Since 2010, AI has advanced at breakneck speeds. According to Epoch AI, the computing power of cutting-edge LLMs has increased by 4x each year over the past 14 years and by 5x each year in the last 4 years. Epoch AI uses the number of FLOPs required for AI model training to measure computing power. Here’s how the best Vision and Language AI models have evolved in the last 4 years:
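To get a feel for how dramatic those multipliers are, here's a back-of-envelope calculation of Epoch AI's figures (the growth factors come from the paragraph above; the function name is ours):

```python
# Back-of-envelope illustration of compounding annual growth in
# training compute, per Epoch AI's reported figures.
def total_growth(factor_per_year: float, years: int) -> float:
    """Total multiple after `years` of compounding annual growth."""
    return factor_per_year ** years

# 4x per year sustained over 14 years
print(f"{total_growth(4, 14):,.0f}x")  # roughly 268 million-fold

# 5x per year over the last 4 years alone
print(f"{total_growth(5, 4):,.0f}x")   # 625-fold
```

In other words, a 4x annual growth rate compounds to a roughly 268-million-fold increase in training compute over 14 years.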
2020
Vision: Google Research’s “EfficientNet-L2” was a type of CNN that excelled at image classification tasks, accurately identifying and categorizing objects within images. It achieved 88% accuracy on the ImageNet benchmark. For context, humans achieved 95% accuracy on the ImageNet benchmark.
Language: OpenAI’s “GPT-3” was the precursor to GPT-3.5, which powered the first version of ChatGPT released in 2022. “GPT-3” could generate text, translate content, and tackle basic reasoning problems. For instance, “GPT-3” detected 90% of disinformation correctly, on par with humans. However, “GPT-3’s” abstract reasoning ability was comparable to that of a three-year-old.
2022
Vision: Google Research’s “CoAtNet-7” blended CNN and Attention Mechanisms. CNN layers identified local patterns, like edges or textures, in a small area of an image. Attention Mechanisms, on the other hand, excelled at understanding the bigger picture by connecting different parts of the image, like recognizing how a face’s eyes relate to the mouth. As a result, “CoAtNet-7” achieved 91% accuracy on the ImageNet benchmark.
Language: Google Research’s “PaLM” excelled at coding, reasoning, and translation. “PaLM,” combined with Chain-of-Thought (CoT) prompting, outperformed “GPT-3” in multi-step arithmetic problems and common-sense reasoning tasks. It achieved 58% accuracy on the Grade School Math 8000 (GSM8K) benchmark, the gold standard for measuring elementary mathematical reasoning in AI models.
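CoT prompting works by showing the model a worked example that reasons step by step before answering. Here's an illustrative GSM8K-style prompt (a sketch of the general technique, not PaLM's actual prompt):

```python
# Illustrative Chain-of-Thought (CoT) prompt: the worked example
# primes the model to reason step by step before its final answer.
# (A sketch of the technique only, not PaLM's actual prompt.)
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A baker makes 24 muffins and sells 3 boxes of 6 muffins.
How many muffins are left?
A:"""
```

Without the worked example, a model often jumps straight to a (frequently wrong) answer; with it, the model tends to imitate the intermediate steps, which is what lifted PaLM's multi-step arithmetic scores.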
2024
Vision: Meta’s “SAM 2” tackles image editing tasks. It can outline, segment, or “cut out” any object in any image with a single click by leveraging Zero-Shot Learning: an AI model’s ability to segment objects in images without being trained explicitly on those objects. This breakthrough has already been applied in the real world. “SAM 2” has been used to analyze satellite imagery for disaster relief and segment microscopic images of cells to detect skin cancer.
Language: OpenAI’s “OpenAI o1” is currently the best LLM. Whereas Google’s “PaLM” required CoT prompting to outperform previous AI models, “OpenAI o1” leverages CoT reasoning, which enables it to “think” before responding by breaking down complex problems into manageable steps, much as humans think before responding to complex questions. In a qualifying exam for the International Mathematical Olympiad (IMO), “OpenAI o1” correctly solved 83% of the problems and ranked among the top 500 U.S. students on the American Invitational Mathematics Examination (AIME). It also scored in the 89th percentile on Codeforces’s competitive programming questions.
⛽️WHAT FUELS “OpenAI o1’s” SUCCESS?
OpenAI’s latest progress has been driven by CoT reasoning. “OpenAI o1” uses RL to improve this capability. Although OpenAI’s process isn’t public, AI experts agree that “OpenAI o1” leverages well-understood RL principles to improve its responses. RL has four key components designed to improve the accuracy, relevance, and format of outputs:
Agent: This is the learner or decision maker.
Environment: This is everything the Agent interacts with.
Actions: These are the things the Agent can do to interact with the Environment. For example, in a video game, Actions could be moving left or right, jumping, or shooting.
Rewards: After the Agent performs an Action, the Environment provides feedback through rewards (i.e., positive feedback ✅) or penalties (i.e., negative feedback ❌). The goal is for the Agent to learn to choose the Actions that maximize its Rewards over time.
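The four components above can be sketched as a toy trial-and-error loop. This is a minimal illustration of generic RL, not OpenAI's code; the environment (reach position 3 on a number line) and all names are invented for the example:

```python
import random

random.seed(0)  # for reproducibility of this sketch

# Environment: everything the Agent interacts with.
# Here, a number line where position 3 yields a Reward.
class Environment:
    def __init__(self):
        self.position = 0

    def step(self, action):                      # Actions: "left" or "right"
        self.position += 1 if action == "right" else -1
        reward = 1 if self.position == 3 else 0  # Reward ✅ only at the goal
        return self.position, reward

# Agent: the learner/decision maker. It tracks which Action
# has earned the most Reward from each state.
class Agent:
    def __init__(self):
        self.values = {}                         # (state, action) -> total reward

    def act(self, state):
        if random.random() < 0.2:                # explore occasionally
            return random.choice(["left", "right"])
        left = self.values.get((state, "left"), 0)
        right = self.values.get((state, "right"), 0)
        return "right" if right >= left else "left"

    def learn(self, state, action, reward):
        key = (state, action)
        self.values[key] = self.values.get(key, 0) + reward

# Trial-and-error: Actions that led to Rewards get reinforced.
agent = Agent()
for episode in range(200):
    env, state = Environment(), 0
    for _ in range(10):
        action = agent.act(state)
        state_after, reward = env.step(action)
        agent.learn(state, action, reward)
        state = state_after
```

After a few hundred episodes, the Agent has accumulated Reward for moving right from position 2 (the step that reaches the goal), so it reliably repeats that Action: exactly the reinforcement loop described above.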
In “OpenAI o1’s” RL process, the Agent is the AI model itself. “OpenAI o1” solves complex problems based on the user’s prompt. In this case, the Environment is the complex set of inputs and instructions from the user. The Actions are the different reasoning steps the AI model can take to solve the user’s prompt. For instance, when answering a complex problem, the AI model might break it down into a series of smaller sub-problems to find the solution. Each Action is part of the reasoning process “OpenAI o1” uses to arrive at the most accurate response to the user’s prompt. The Rewards in “OpenAI o1” are likely based on user satisfaction, accuracy, and efficiency:
User Satisfaction: It receives a Reward if the user gives the response a thumbs up (i.e., presses the thumbs up button below the response).
Accuracy: It receives a Reward if the reasoning steps lead to a correct and well-structured response.
Efficiency: It receives a Reward if it reaches the correct solution using fewer reasoning steps or less computing power.
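One common way to turn several signals like these into a single Reward is a weighted sum. The weights and signal names below are hypothetical, purely to illustrate the idea; OpenAI has not published its reward design:

```python
# Hypothetical weighted combination of the three Reward signals above.
# Weights are illustrative, not OpenAI's actual values.
def combined_reward(thumbs_up: bool, accuracy: float, steps: int,
                    max_steps: int = 50) -> float:
    satisfaction = 1.0 if thumbs_up else 0.0      # User Satisfaction
    efficiency = 1.0 - min(steps, max_steps) / max_steps  # fewer steps = better
    # weights: 0.3 satisfaction + 0.5 accuracy + 0.2 efficiency
    return 0.3 * satisfaction + 0.5 * accuracy + 0.2 * efficiency

# A thumbs-up, 90%-accurate answer reached in 10 reasoning steps:
print(combined_reward(True, 0.9, 10))
```

The design choice here is that accuracy gets the largest weight, so the model is never incentivized to cut reasoning steps at the expense of a correct answer.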
“OpenAI o1’s” performance is constantly evaluated using these Rewards to refine its reasoning abilities. By adjusting based on Rewards, “OpenAI o1” improves its ability to “think” before it responds. Although CoT and RL enable “OpenAI o1” to outperform every other AI model on General Problem-Solving and Causal Analysis (GPCA), this success is limited to “short-horizon” tasks. AI models can’t yet outperform experts on “long-horizon” tasks that take many hours, days, weeks, or years, but AI firms are actively pushing toward this and will likely achieve it within this decade.
🦺THE RISKS OF AI PROGRESS
The Open Letter Movement
At times, the rapid pace of AI progress has stirred controversy. Recall in 2023 when researchers, developers, and engineers published an open letter to “pause giant AI experiments”:
“Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable and include all key actors. Governments should institute a moratorium if such a pause cannot be enacted quickly.”
Or in 2024, when employees from frontier AI labs such as Anthropic, OpenAI, and Google DeepMind published an open letter urging AI companies to develop whistleblower channels so that employees can raise concerns about AI developments without fear of retaliation. The “Right to Warn” petition pushed AI companies to agree to several principles, including establishing and facilitating anonymous channels to raise AI concerns.
Most AI companies disregarded these demands. Today, it remains unclear whether the pace of AI progress should change. One could speed up or slow down AI progress at various levels: an organization, a country, a set of countries, or globally. But that paradigm is disconnected from AI model-level advancements.
Two Types of AI Model-Level Advancements
There are two types of AI model-level advancements: improving large, generalist AI models or integrating small, specialized AI models into existing workflows. The first type refers to the exponential improvement of general-purpose AI models (e.g., “GPT-4” or “OpenAI o1”), which comes with unknown and potentially unquantifiable risks if continued into the foreseeable future. The second type emphasizes the integration of tailored AI models (e.g., GitHub’s Copilot) into existing workflows (e.g., software developers with coding tasks), which is comparatively low-risk and high-reward at the level of an organization. Sure, misusing an AI model in a particular use case would cause harm. In banking, a fraud detection AI model might underestimate the probability that a transaction is fraudulent, resulting in millions of dollars in losses. But such harms would be localized.
How Do We Mitigate These Risks?
Both types of AI model-level advancements are in full swing. The most important question is how we can mitigate their risks. While some view the AI debate as a test of their stance on technology, the real challenge lies in assessing unpredictable risk-to-reward trade-offs. Ethical principles like “do more good than harm” and “promote global welfare” start the discussion, but resolving the ideal pace of AI advancements is nearly impossible. Global superpowers can’t agree to remove proven species-destroying technologies; denuclearization talks between the U.S. and Russia have gone nowhere. Expecting an international consensus on slowing AI progress is foolish. Thus, U.S. AI frontier labs, in conjunction with the Department of Defense (DoD), will likely continue to develop AI models, as slowing progress would risk competitiveness with China.
🔑KEY TAKEAWAY
From 2020 to 2024, AI models like Google’s “EfficientNet-L2” and Meta’s “SAM 2” revolutionized vision tasks, while OpenAI’s “GPT-3” and “OpenAI o1” set new standards for conversational chatbots. “OpenAI o1,” which integrates CoT reasoning with RL, is the best LLM ever created. However, it struggles with “long-horizon” tasks that take many hours, days, weeks, or years. The rapid pace of AI developments continues to stir ethical concerns, with AI experts debating whether progress should be slowed. Despite these concerns, global competition, particularly between the U.S. and China, ensures that AI developments will likely continue without pause.
📒FINAL NOTE
FEEDBACK
How would you rate today’s email?
❤️TAIP Review of The Week
“Every morning I get up, grab a cup of tea, and read this, excited to learn something new!”
REFER & EARN
🎉Your Friends Learn, You Earn!
{{rp_personalized_text}}
Refer 3 friends to learn how to 👷♀️Build Custom Versions of OpenAI’s ChatGPT.
Copy and paste this link to friends: {{rp_refer_url}}
