🧠 AI Progress in the 2020s: The Decade of Disruption
PLUS: Why Leading Nations Won't Halt AI Development Despite the Risks
Welcome back, AI prodigies!
In today's Sunday Special:
The Last 4 Years of AI Progress
⛽️What Fuels "OpenAI o1's" Success?
🦺The Risks of AI Progress
Key Takeaway
Read Time: 7 minutes
Key Terms
Large Language Models (LLMs): AI models pre-trained on vast amounts of data to generate human-like text.
ImageNet: A benchmark for image classification. It consists of over 14 million images organized into more than 20,000 categories.
Convolutional Neural Network (CNN): A network of specialized layers that detect visual patterns, such as edges, textures, and structures.
Reinforcement Learning (RL): Teaches AI models to make decisions that result in the best outcomes. It mimics the "trial-and-error" process humans use to learn, where actions that lead to desired outcomes are reinforced.
Floating-Point Operations (FLOPs): The total number of basic arithmetic operations (i.e., addition, subtraction, multiplication, and division) a computer performs to complete a task, such as training an AI model. Better AI models with larger datasets generally require more FLOPs.
🩺 PULSE CHECK
Have you been impressed by the rapid pace of innovation with conversational chatbots like OpenAI's ChatGPT?
THE LAST 4 YEARS OF AI PROGRESS
Since 2010, AI has advanced at breakneck speed. According to Epoch AI, the computing power used to train cutting-edge LLMs has increased by 4x each year over the past 14 years, and by 5x each year in the last 4 years. (Epoch AI uses the number of FLOPs required for AI model training to measure computing power.) To see what those multipliers compound to, here's a quick back-of-the-envelope calculation:
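```python
# Back-of-the-envelope arithmetic using Epoch AI's growth estimates
# cited above; no other assumptions.

recent_growth_per_year = 5   # ~5x per year over the last 4 years
recent_years = 4
print(f"{recent_growth_per_year ** recent_years}x more training compute since 2020")
# 625x more training compute since 2020

longer_growth_per_year = 4   # ~4x per year over the past 14 years
longer_years = 14
print(f"{longer_growth_per_year ** longer_years:,}x more training compute since 2010")
# 268,435,456x more training compute since 2010
```

Here's how the best Vision and Language AI models have evolved in the last 4 years: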
2020
Vision: Google Research's "EfficientNet-L2" was a type of CNN that excelled at image classification tasks, accurately identifying and categorizing objects within images. It achieved 88% accuracy on the ImageNet benchmark. For context, humans achieved 95% accuracy on the same benchmark.
Language: OpenAI's "GPT-3" was the precursor to GPT-3.5, which powered the first version of ChatGPT released in 2022. "GPT-3" could generate text, translate content, and tackle basic reasoning problems. For instance, "GPT-3" detected 90% of disinformation correctly, on par with humans. However, "GPT-3's" abstract reasoning ability was comparable to that of a three-year-old.
2022
Vision: Google Research's "CoAtNet-7" blended CNN layers with Attention Mechanisms. The CNN layers identified local patterns, like edges or textures, in a small area of an image. Attention Mechanisms, on the other hand, excelled at understanding the bigger picture by connecting different parts of the image, like recognizing how a face's eyes relate to the mouth. As a result, "CoAtNet-7" achieved 91% accuracy on the ImageNet benchmark.
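To make the convolution-plus-attention idea concrete, here's a minimal sketch of the general pattern: a convolutional stage for local features feeding a self-attention stage for global relationships. It assumes PyTorch and is purely illustrative, not Google's actual CoAtNet implementation.

```python
import torch
import torch.nn as nn

class TinyConvAttentionHybrid(nn.Module):
    """Illustrative only: convolution for local patterns, then
    self-attention for global structure. Not Google's CoAtNet."""
    def __init__(self, channels=64, num_classes=1000):
        super().__init__()
        # Convolutional stage: detects local patterns (edges, textures)
        self.conv = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Attention stage: every spatial position attends to every other,
        # capturing global relationships (e.g., how eyes relate to a mouth)
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):                          # x: (batch, 3, H, W)
        feats = self.conv(x)                       # (batch, C, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (batch, num_positions, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.head(attended.mean(dim=1))     # pool positions, classify

logits = TinyConvAttentionHybrid()(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 1000])
```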
Language: Google Research's "PaLM" excelled at coding, reasoning, and translation. "PaLM," combined with Chain-of-Thought (CoT) prompting, outperformed "GPT-3" in multi-step arithmetic problems and common-sense reasoning tasks. It achieved 58% accuracy on the Grade School Math 8000 (GSM8K) benchmark, the gold standard for measuring elementary mathematical reasoning in AI models.
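CoT prompting simply means showing the model a worked example whose answer spells out its intermediate steps. Here's a minimal illustration, using the canonical tennis-ball example from the original CoT research (the surrounding code is ours):

```python
# Chain-of-Thought (CoT) prompting: include a worked, step-by-step
# example so the model imitates the reasoning instead of guessing.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""
# Sent to an LLM, the worked example above nudges it to reason step by
# step before stating the final answer (23 - 20 + 6 = 9).
```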
2024
Vision: Meta's "SAM 2" tackles image segmentation tasks. It can outline, segment, or "cut out" any object in any image with a single click by leveraging Zero-Shot Learning: an AI model's ability to segment objects in images without being trained explicitly on those objects. This breakthrough has already been applied in the real world: "SAM 2" has been used to analyze satellite imagery for disaster relief and to segment microscopic images of cells to detect skin cancer.
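Here's what that single-click workflow looks like in code. This is a hedged sketch modeled on Meta's published predictor interface; treat the import path, checkpoint name, and method signatures as assumptions and check the official sam2 repository for the real API.

```python
# Hedged sketch of single-click ("point prompt") segmentation with "SAM 2."
# Import path, checkpoint name, and signatures are assumptions.
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor  # assumed import path

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("photo.jpg").convert("RGB"))  # hypothetical file
predictor.set_image(image)

# One click on the object: an (x, y) pixel plus label 1 ("foreground").
# Zero-Shot Learning means this works even for object types the model
# was never explicitly trained to segment.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 300]]),
    point_labels=np.array([1]),
)
best_mask = masks[scores.argmax()]  # boolean mask that "cuts out" the object
```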
Language: OpenAI's "OpenAI o1" is currently the best LLM. Whereas Google's "PaLM" required CoT prompting to outperform previous AI models, "OpenAI o1" leverages CoT reasoning, which enables it to "think" before responding by breaking down complex problems into manageable steps, similar to how humans think before responding to complex questions. On the American Invitational Mathematics Examination (AIME), a qualifying exam for the International Mathematical Olympiad (IMO), "OpenAI o1" correctly solved 83% of the problems, ranking among the top 500 U.S. students. It also scored in the 89th percentile on Codeforces's competitive programming questions.
⛽️WHAT FUELS "OpenAI o1's" SUCCESS
OpenAI's latest progress has been driven by CoT reasoning, and "OpenAI o1" uses RL to improve this capability. Although OpenAI's process isn't public, AI experts agree that "OpenAI o1" leverages well-understood RL principles to improve its responses. RL has four key components designed to improve the accuracy, relevance, and format of outputs (see the sketch after this list):
Agent: This is the learner or decision maker.
Environment: This is everything the Agent interacts with.
Actions: These are the things the Agent can do to interact with the Environment. For example, in a video game, Actions could be moving left or right, jumping, or shooting.
Rewards: After the Agent performs an Action, the Environment provides feedback through Rewards (i.e., positive feedback ✅) or penalties (i.e., negative feedback ❌). The goal is for the Agent to learn to choose Actions that lead to Rewards.
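To make the four components concrete, here's a minimal, self-contained sketch of an RL loop on a toy problem. It has nothing to do with OpenAI's actual setup; it simply shows an Agent, an Environment, Actions, and Rewards interacting:

```python
import random

# Environment: a number line from 0 to 10; reaching 10 ends the episode.
class WalkEnvironment:
    def __init__(self):
        self.state = 0

    def step(self, action):                        # Actions: -1 (left) or +1 (right)
        self.state = max(0, min(10, self.state + action))
        reward = 1.0 if self.state == 10 else 0.0  # Reward: positive feedback at the goal
        return self.state, reward, self.state == 10

# Agent: learns a value for every (state, Action) pair via Q-learning.
q = {(s, a): 0.0 for s in range(11) for a in (-1, 1)}

for episode in range(500):
    env, state, done = WalkEnvironment(), 0, False
    while not done:
        action = random.choice((-1, 1))            # explore the Environment
        next_state, reward, done = env.step(action)
        best_next = max(q[(next_state, -1)], q[(next_state, 1)])
        # Reinforce Actions whose outcomes led toward Rewards
        q[(state, action)] += 0.5 * (reward + 0.9 * best_next - q[(state, action)])
        state = next_state

# The Agent now prefers +1 (toward the Reward) in every state.
print({s: max((-1, 1), key=lambda a: q[(s, a)]) for s in range(10)})
```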
In "OpenAI o1's" RL process, the Agent is the AI model itself, which solves complex problems based on the user's prompt. In this case, the Environment is the set of inputs and instructions from the user. The Actions are the different reasoning steps the AI model can take to solve the user's prompt. For instance, when answering a complex problem, the AI model might break it down into a series of smaller sub-problems to find the solution. Each Action is part of the reasoning process "OpenAI o1" uses to arrive at the most accurate response to the user's prompt. The Rewards in "OpenAI o1" are likely based on user satisfaction, accuracy, and efficiency (see the sketch after this list):
User Satisfaction: It receives a Reward if the user gives the response a thumbs up (i.e., presses the thumbs up button below the response).
Accuracy: It receives a Reward if the reasoning steps lead to a correct and well-structured response.
Efficiency: It receives a Reward if it reaches the correct solution using fewer reasoning steps or less computing power.
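Since OpenAI hasn't disclosed its reward design, the following is purely illustrative: a hypothetical composite reward that weights the three signals above. Every function name, weight, and formula here is an assumption made for the sake of the example.

```python
# Purely illustrative: a hypothetical composite reward combining the
# three signals above. OpenAI's actual reward design is not public;
# every name, weight, and formula here is an assumption.

def composite_reward(thumbs_up: bool, answer_correct: bool,
                     reasoning_steps: int, max_steps: int = 50) -> float:
    user_satisfaction = 1.0 if thumbs_up else 0.0   # explicit user feedback
    accuracy = 1.0 if answer_correct else 0.0       # correct, well-structured answer
    efficiency = 1.0 - min(reasoning_steps, max_steps) / max_steps  # fewer steps pay more

    # Hypothetical weights: accuracy matters most, then satisfaction, then efficiency.
    return 0.5 * accuracy + 0.3 * user_satisfaction + 0.2 * efficiency

print(composite_reward(thumbs_up=True, answer_correct=True, reasoning_steps=10))
# 0.5 + 0.3 + 0.2 * 0.8 = 0.96
```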
"OpenAI o1's" performance is constantly evaluated using these Rewards to refine its reasoning abilities. By adjusting based on Rewards, "OpenAI o1" improves its ability to "think" before it responds. Although CoT and RL enable "OpenAI o1" to outperform every other AI model on General Problem-Solving and Causal Reasoning (GPCA), this success is limited to "short-horizon" tasks. AI models can't yet outperform experts on "long-horizon" tasks that take many hours, days, weeks, or years, but AI firms are actively pushing toward this and will likely achieve it within this decade.
🦺THE RISKS OF AI PROGRESS
The Open Letter Movement?
At times, the rapid pace of AI progress has stirred controversy. Recall in 2023 when researchers, developers, and engineers published an open letter to "pause giant AI experiments":
"Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable and include all key actors. Governments should institute a moratorium if such a pause cannot be enacted quickly."
Or in 2024, when employees from frontier AI labs such as Anthropic, OpenAI, and Google DeepMind published an open letter urging AI companies to develop whistleblower channels so that employees can raise concerns about AI developments without fear of retaliation. The "Right to Warn" petition pushed AI companies to agree to several principles, including establishing and facilitating anonymous channels to raise AI concerns.
The majority of AI companies disregarded these demands. Today, it remains unclear whether the pace of AI progress should change. One could speed up or slow down AI progress at various levels: an organization, a country, a set of countries, or globally. But that paradigm is disconnected from AI model-level advancements.
Two Types of AI Model-Level Advancements?
There are two types of AI model-level advancements: improving large, generalist AI models or integrating small, specialized AI models into existing workflows. The first refers to the exponential improvement of general-purpose AI models (e.g., "GPT-4" or "OpenAI o1"), which comes with unknown and potentially unquantifiable risks if continued into the foreseeable future. The second emphasizes integrating tailored AI models (e.g., GitHub's Copilot) into existing workflows (e.g., helping software developers with coding tasks), which is comparatively low-risk and high-reward at the level of an organization. Sure, misusing an AI model in a particular use case would cause harm. In banking, a fraud detection AI model might underestimate the probability that a transaction is fraudulent, resulting in millions of dollars in losses. But such harms would be localized.
How Do We Mitigate These Risks?
Both types of AI model-level advancements are in full swing. The most important question is how we can mitigate their risks. While some view the AI debate as a test of their stance on technology, the real challenge lies in assessing unpredictable risk-to-reward trade-offs. Ethical principles like "do more good than harm" and "promote global welfare" start the discussion, but settling on the ideal pace of AI advancement is nearly impossible. Global superpowers can't even agree to remove proven species-destroying technologies; denuclearization talks between the U.S. and Russia have gone nowhere. Expecting an international consensus on slowing AI progress is foolish. Thus, U.S. frontier AI labs, in conjunction with the Department of Defense (DoD), will likely continue to develop AI models, as slowing progress would risk competitiveness with China.
KEY TAKEAWAY
From 2020 to 2024, AI models like Google's "EfficientNet-L2" and Meta's "SAM 2" revolutionized vision tasks, while OpenAI's "GPT-3" and "OpenAI o1" set new standards for conversational chatbots. "OpenAI o1," which integrates CoT reasoning with RL, is the best LLM ever created. However, it struggles with "long-horizon" tasks that take many hours, days, weeks, or years. The rapid pace of AI developments continues to stir ethical concerns, with AI experts debating whether progress should be slowed. Despite these concerns, global competition, particularly between the U.S. and China, ensures that AI developments will likely continue without pause.
FINAL NOTE
❤️TAIP Review of The Week
"Every morning I get up, grab a cup of tea, and read this, excited to learn something new!"