šŸ§  AI Progress in the 2020s: The Decade of Disruption

PLUS: Why Leading Nations Won't Halt AI Development Despite the Risks

Welcome back AI prodigies!

In today's Sunday Special:

  • šŸ“ˆThe Last 4 Years of AI Progress

  • ā›½ļøWhat Fuels "OpenAI o1's" Success?

  • 🦺The Risks of AI Progress

  • šŸ”‘Key Takeaway

Read Time: 7 minutes

šŸŽ“Key Terms

  • Large Language Models (LLMs): AI models pre-trained on vast amounts of data to generate human-like text.

  • ImageNet: A benchmark for image classification. It consists of over 14 million images organized into more than 20,000 categories.

  • Convolutional Neural Network (CNN): A network of specialized layers that detect visual patterns, such as edges, textures, and structures.

  • Reinforcement Learning (RL): Teaches AI models to make decisions that result in the best outcomes. It mimics the "trial-and-error" process humans use to learn, where actions that lead to desired outcomes are reinforced.

  • Floating-Point Operations (FLOPs): The number of basic arithmetic operations (i.e., addition, subtraction, multiplication, and division) a computation requires. Hardware speed is quoted in FLOPS (FLOPs per second), while AI training compute is measured in total FLOPs. Better AI models trained on larger datasets generally require more FLOPs.

🩺 PULSE CHECK

Have you been impressed by the rapid pace of innovation with conversational chatbots like OpenAI's ChatGPT?

Vote Below to View Live Results


šŸ“ˆTHE LAST 4 YEARS OF AI PROGRESS

Since 2010, AI has advanced at breakneck speed. According to Epoch AI, the computing power used to train cutting-edge LLMs has increased by 4x each year over the past 14 years, and by 5x each year over the last 4 years. Epoch AI measures computing power by the number of FLOPs required to train an AI model.
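
To see what these growth rates imply, here's a quick back-of-the-envelope sketch in Python. The growth rates are the ones above; the 2020 baseline is an assumed round number for illustration, not an Epoch AI estimate:

```python
# Back-of-the-envelope compounding of training compute.
# The 2020 baseline below is a made-up round number, NOT an Epoch AI figure.
base_flops = 1e23        # hypothetical frontier training run in 2020, in FLOPs
annual_growth = 5        # ~5x more training compute each year since 2020

compute = base_flops
for year in range(2021, 2025):
    compute *= annual_growth
    print(f"{year}: ~{compute:.1e} FLOPs")

# 5x per year compounds quickly: 5**4 = 625x the 2020 compute by 2024.
print(f"Growth over 4 years: {annual_growth ** 4}x")
```

With that pace of growth in mind, here's how the best Vision and Language AI models have evolved in the last 4 years: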

2020

  • Vision: Google Research's "EfficientNet-L2" was a type of CNN that excelled at image classification tasks, accurately identifying and categorizing objects within images. It achieved 88% accuracy on the ImageNet benchmark. For context, humans achieved 95% accuracy on the ImageNet benchmark.

  • Language: OpenAI's "GPT-3" was the precursor to GPT-3.5, which powered the first version of ChatGPT released in 2022. "GPT-3" could generate text, translate content, and tackle basic reasoning problems. For instance, "GPT-3" detected 90% of disinformation correctly, on par with humans. However, "GPT-3's" abstract reasoning ability was comparable to that of a three-year-old.

2022

  • Vision: Google Research's "CoAtNet-7" blended CNNs and Attention Mechanisms. CNN layers identified local patterns, like edges or textures, in small areas of an image. Attention Mechanisms, on the other hand, excelled at understanding the bigger picture by connecting different parts of the image, like recognizing how a face's eyes relate to its mouth. As a result, "CoAtNet-7" achieved 91% accuracy on the ImageNet benchmark. (A toy version of this conv-plus-attention pattern is sketched after these 2022 entries.)

  • Language: Google Research's "PaLM" excelled at coding, reasoning, and translation. "PaLM," combined with Chain-of-Thought (CoT) prompting, outperformed "GPT-3" on multi-step arithmetic problems and common-sense reasoning tasks. It achieved 58% accuracy on the Grade School Math 8000 (GSM8K) benchmark, the gold standard for measuring elementary mathematical reasoning in AI models.
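
To make the hybrid design concrete, here's a toy PyTorch block in the spirit of CoAtNet: convolution handles local patterns first, then self-attention connects distant regions of the image. It illustrates the general pattern only; it is not Google's actual "CoAtNet-7" architecture:

```python
import torch
import torch.nn as nn

class TinyConvAttentionBlock(nn.Module):
    """Toy hybrid block: convolution catches local patterns (edges, textures),
    then self-attention relates distant regions of the image to each other.
    Illustrates the general CoAtNet idea, not the real CoAtNet-7."""

    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(                 # local feature extraction
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                           # (B, C, H, W) local patterns
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C) tokens
        out, _ = self.attn(seq, seq, seq)          # global, image-wide context
        return (seq + out).transpose(1, 2).reshape(b, c, h, w)

block = TinyConvAttentionBlock()
print(block(torch.randn(1, 64, 16, 16)).shape)     # torch.Size([1, 64, 16, 16])
```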

2024

  • Vision: Meta's "SAM 2" tackles image editing tasks. It can outline, segment, or "cut out" any object in any image with a single click by leveraging Zero-Shot Learning: an AI model's ability to segment objects it was never explicitly trained on. This breakthrough has already been applied in the real world: "SAM 2" has been used to analyze satellite imagery for disaster relief and to segment microscopic images of cells to detect skin cancer. (A single-click usage sketch follows these 2024 entries.)

  • Language: OpenAI's "OpenAI o1" is currently the best LLM. Whereas Google's "PaLM" required CoT prompting to outperform previous AI models, "OpenAI o1" leverages built-in CoT reasoning, which enables it to "think" before responding by breaking complex problems down into manageable steps, much like humans pause to reason through complex questions. In a qualifying exam for the International Mathematical Olympiad (IMO), "OpenAI o1" correctly solved 83% of the problems and ranked among the top 500 U.S. students on the American Invitational Mathematics Examination (AIME). It also scored in the 89th percentile on Codeforces's competitive programming questions.
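
Here's roughly what single-click segmentation looks like using the predictor interface from Meta's sam2 repository (github.com/facebookresearch/sam2). This is a sketch: the model name, click coordinates, and exact argument names should be checked against the repo before use:

```python
import numpy as np
from PIL import Image
# Interface based on Meta's sam2 repo; verify names against the repo docs.
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained checkpoint (model ID as published on Hugging Face).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("photo.jpg").convert("RGB"))  # any RGB image
predictor.set_image(image)

# A single "click": one (x, y) pixel coordinate, label 1 = foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 300]]),  # hypothetical click location
    point_labels=np.array([1]),
)
print(masks.shape, scores)  # binary mask(s) for the clicked object
```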

ā›½ļøWHAT FUELā€™S ā€œOpenAI o1ā€™sā€ SUCCESS

OpenAI's latest progress has been driven by CoT reasoning, and "OpenAI o1" uses RL to improve this capability. Although OpenAI's process isn't public, AI experts agree that "OpenAI o1" leverages well-understood RL principles to improve its responses. RL has four key components designed to improve the accuracy, relevance, and format of outputs (a toy loop showing all four in code follows this list):

  1. Agent: This is the learner or decision maker.

  2. Environment: This is everything the Agent interacts with.

  3. Actions: These are the things the Agent can do to interact with the Environment. For example, in a video game, Actions could be moving left or right, jumping, or shooting.

  4. Rewards: After the Agent performs an Action, the Environment provides feedback through rewards (i.e., positive feedback āœ…) or penalties (i.e., negative feedback āŒ). The goal is for the Agent to perform as many Actions as possible that lead to Rewards.
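
Here's a toy RL loop, a two-armed bandit, that shows all four components working together. It's a generic illustration of the principles above, not OpenAI's training setup:

```python
import random

class Environment:
    """Everything the Agent interacts with: here, two slot machines."""
    PAYOFF = {0: 0.3, 1: 0.8}  # chance that each arm (Action) pays a Reward

    def step(self, action: int) -> int:
        # Environment gives feedback: 1 = Reward āœ…, 0 = no Reward āŒ
        return 1 if random.random() < self.PAYOFF[action] else 0

class Agent:
    """The learner/decision maker: tracks how well each Action has worked."""
    def __init__(self) -> None:
        self.value = {0: 0.0, 1: 0.0}  # estimated Reward for each Action

    def act(self) -> int:
        if random.random() < 0.1:                   # explore 10% of the time
            return random.choice([0, 1])
        return max(self.value, key=self.value.get)  # otherwise exploit

    def learn(self, action: int, reward: int) -> None:
        # Reinforce: nudge the estimate toward the Reward actually received.
        self.value[action] += 0.05 * (reward - self.value[action])

env, agent = Environment(), Agent()
for _ in range(1000):
    action = agent.act()        # the Agent picks an Action
    reward = env.step(action)   # the Environment returns a Reward
    agent.learn(action, reward)

print(agent.value)  # arm 1 ends up with the clearly higher estimate
```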

In "OpenAI o1's" RL process, the Agent is the AI model itself. "OpenAI o1" solves complex problems based on the user's prompt. In this case, the Environment is the complex set of inputs and instructions from the user. The Actions are the different reasoning steps the AI model can take to address the user's prompt. For instance, when answering a complex problem, the AI model might break it down into a series of smaller sub-problems to find the solution. Each Action is part of the reasoning process "OpenAI o1" uses to arrive at the most accurate response. The Rewards in "OpenAI o1" are likely based on user satisfaction, accuracy, and efficiency (a hypothetical combination is sketched in code after this list):

  • User Satisfaction: It receives a Reward if the user gives the response a thumbs up (i.e., presses the thumbs up button below the response).

  • Accuracy: It receives a Reward if the reasoning steps lead to a correct and well-structured response.

  • Efficiency: It receives a Reward if it reaches the correct solution using fewer reasoning steps or less computing power.
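
As a purely hypothetical illustration of how those three signals could combine into one Reward, consider the sketch below. OpenAI has not published its reward design; every weight and name here is invented to make the idea concrete:

```python
def reward(thumbs_up: bool, answer_correct: bool, reasoning_steps: int) -> float:
    """Hypothetical Reward. OpenAI's real design is not public; the weights
    below are invented purely for illustration."""
    r = 1.0 if thumbs_up else 0.0                  # User Satisfaction
    r += 2.0 if answer_correct else -1.0           # Accuracy (wrong answers cost)
    r += max(0.0, 1.0 - 0.05 * reasoning_steps)    # Efficiency: fewer steps pay more
    return r

print(reward(thumbs_up=True, answer_correct=True, reasoning_steps=6))  # 3.7
```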

"OpenAI o1's" performance is constantly evaluated against these Rewards to refine its reasoning abilities. By adjusting based on Rewards, "OpenAI o1" improves its ability to "think" before it responds. Although CoT and RL enable "OpenAI o1" to outperform every other AI model on General Problem-Solving and Causal Reasoning (GPCA), this success is limited to "short-horizon" tasks. AI models can't yet outperform experts on "long-horizon" tasks that take many hours, days, weeks, or years, but AI firms are actively pushing in this direction and will likely get there within the decade.

🦺THE RISKS OF AI PROGRESS

The Open Letter Movement?

At times, the rapid pace of AI progress has stirred controversy. Recall 2023, when researchers, developers, and engineers published an open letter calling to "pause giant AI experiments":

"Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable and include all key actors. Governments should institute a moratorium if such a pause cannot be enacted quickly."

-Source: "Pause Giant AI Experiments: An Open Letter," published March 22, 2023, with 33,707 signatures

Or recall 2024, when employees from frontier AI labs such as Anthropic, OpenAI, and Google DeepMind published an open letter urging AI companies to develop whistleblower channels so employees can raise concerns about AI developments without fear of retaliation. The "Right to Warn" petition pushed AI companies to agree to several principles, including establishing and facilitating anonymous channels for raising AI concerns.

Most AI companies disregarded these demands. Today, whether the pace of AI progress should change remains an open question. One could speed up or slow down AI progress at various levels: an organization, a country, a set of countries, or globally. But that framing is disconnected from advancements happening at the level of individual AI models.

Two Types of AI Model-Level Advancements?

There are two types of AI model-level advancements: improving large, generalist AI models and integrating small, specialized AI models into existing workflows. The first refers to the exponential improvement of general-purpose AI models (e.g., "GPT-4" or "OpenAI o1"), which carries unknown and potentially unquantifiable risks if continued into the foreseeable future. The second emphasizes integrating tailored AI models (e.g., GitHub's Copilot) into existing workflows (e.g., helping software developers with coding tasks), which is comparatively low-risk and high-reward at the level of an organization. Sure, misusing an AI model in a particular use case would cause harm; in banking, a fraud detection AI model might underestimate the probability that a transaction is fraudulent, resulting in millions of dollars in losses. But such harms would be localized.

How Do We Mitigate These Risks?

Both types of AI model-level advancements are in full swing. The most important question is how we can mitigate their risks. While some view the AI debate as a test of their stance on technology, the real challenge lies in assessing unpredictable risk-to-reward trade-offs. Ethical principles like "do more good than harm" and "promote global welfare" start the discussion, but settling on the ideal pace of AI advancement is nearly impossible. Global superpowers can't even agree to remove proven species-destroying technologies; denuclearization talks between the U.S. and Russia have gone nowhere. Expecting an international consensus on slowing AI progress is foolish. Thus, U.S. frontier AI labs, in conjunction with the Department of Defense (DoD), will likely continue to develop AI models, as slowing progress would risk losing competitiveness with China.

šŸ”‘KEY TAKEAWAY

From 2020 to 2024, AI models like Google's "EfficientNet-L2" and Meta's "SAM 2" revolutionized vision tasks, while OpenAI's "GPT-3" and "OpenAI o1" set new standards for conversational chatbots. "OpenAI o1," which integrates CoT reasoning with RL, is the best LLM ever created. However, it still struggles with "long-horizon" tasks that take many hours, days, weeks, or years. The rapid pace of AI developments continues to stir ethical concerns, with AI experts debating whether progress should be slowed. Despite these concerns, global competition, particularly between the U.S. and China, ensures that AI developments will likely continue without pause.

šŸ“’FINAL NOTE

FEEDBACK

How would you rate today's email?

It helps us improve the content for you!


ā¤ļøTAIP Review of The Week

ā€œEvery morning I get up, grab a cup of tea, and read this, excited to learn something new!ā€

-Allison (1ļøāƒ£ šŸ‘Nailed it!)
REFER & EARN

šŸŽ‰Your Friends Learn, You Earn!

You currently have 0 referrals, only 1 away from receiving the āš™ļøUltimate Prompt Engineering Guide.

Refer 3 friends to learn how to šŸ‘·ā€ā™€ļøBuild Custom Versions of OpenAI's ChatGPT.
