
Welcome back, AI prodigies!
In today's Sunday Special:
Inflection Point
Developers' Dilemma
Chatbot Memory
Key Takeaway
Read Time: 6 minutes
Key Terms
Foundational Model: a machine learning model (e.g., OpenAI's GPT-4) trained on a broad spectrum of unlabeled data that can generate a wide range of text, images, or videos.
Graphics Processing Unit (GPU): a specialized computer chip capable of parallel processing (i.e., performing mathematical calculations simultaneously), making it ideal for complex applications like generative AI.
Tokens: units of text (words, word fragments, or characters) that AI models treat as meaningful when processing text.
Context Window: the number of tokens an AI model uses from recent user prompts and responses within the same conversation to generate a new response.
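To make these last two terms concrete, here's a toy sketch, assuming a naive whitespace tokenizer (real models like GPT-4 use subword tokenizers instead) and an arbitrarily tiny window of 8 tokens:

```python
def tokenize(text):
    # Naive whitespace tokenizer; production models split text
    # into subword tokens instead.
    return text.split()

def apply_context_window(conversation, max_tokens):
    # Flatten the conversation into tokens, then keep only the
    # most recent ones that fit in the window.
    tokens = []
    for message in conversation:
        tokens.extend(tokenize(message))
    return tokens[-max_tokens:]  # oldest tokens fall out first

history = [
    "User: My dog is named Biscuit",
    "Bot: Nice to meet Biscuit!",
    "User: What is my dog called?",
]
print(apply_context_window(history, max_tokens=8))
```

Because the oldest tokens fall out first, details mentioned early in a long conversation can slip outside the window, which is why chatbots that remember you need such large context windows.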
INFLECTION POINT
Big Tech continues to tighten its grasp on generative AI development. Startup Inflection AI, maker of Inflection-2.5, a foundational model and OpenAI rival, recently became an AI studio within Microsoft. Inflection-2.5 is the model behind Pi, a friendly, supportive chatbot described as a coach, companion, confidant, or creative partner. Though Pi competed effectively with fellow AI companions like Replika, attracting over 6 million Monthly Active Users (MAUs), growing it is no longer a top priority.
The specific rationale behind Inflection AI's decision is not public. However, its press release indicates a preference for enterprises over consumers and infrastructure over applications. This shift is hardly surprising: enterprise infrastructure is one of the best risk-adjusted AI bets, owing to its position in the AI technology stack. The stack has four layers, each like a component of a skyscraper. Below, you'll find what each layer does and its current leader.
Infrastructure (e.g., Nvidia and Microsoft): Hardware (e.g., GPUs) and cloud computing power AI models at every stage, including training, fine-tuning, and querying. They form the steel beams of the skyscraper's foundation.
Foundational Models (e.g., OpenAI's GPT-4): Rising from this foundation are the central pillars of LLMs and multimodal AI models. These are templates that developers customize to solve specific problems for specific users.
Machine Learning Operations (e.g., Databricks): Connecting the central pillars to the floors are Machine Learning Operations, the tools and frameworks for training, deploying, monitoring, and improving the performance of AI models.
Applications (e.g., OpenAI's ChatGPT): People live and work in the rooms of skyscrapers, which are akin to user-facing applications. Both generalist (e.g., Google's Gemini) and specialist models (e.g., CanvaGPT) solve problems for us.
DEVELOPERS' DILEMMA
As we recently outlined, big, medium, and small corporations are hungry to integrate generative AI into their products and services. For non-native AI companies, this means deploying applications. Though most use cases are currently unproven, potential advances in productivity and profitability justify enormous upfront investments in foundational models.
Foundational models are templates for specialized AI models. Want to build a chatbot that excels at copywriting, coding, or legal document analysis? Instead of training it from scratch, which takes hundreds of millions of parameters and millions of dollars of computing power, developers prefer to build on top of foundational models. It's only natural, and perhaps necessary, that cash-flush companies like Microsoft, Meta, Google, and Amazon back foundational models like GPT-4 (via OpenAI), Llama 2, Gemini, and Titan, respectively. Granted, some foundational models, like Anthropic's Claude 3, are independently run, but Google and Amazon have invested $2 billion and $4 billion in Anthropic, respectively. Aside from Cohere, which partnered with Oracle, nearly all leading foundational models rely on cloud computing services from Amazon, Microsoft, or Google, which collectively control two-thirds of the global market.
From an antitrust perspective, this isn't inherently problematic. However, in certain jurisdictions, like the U.K., the Amazon and Microsoft duopoly commands 70% to 80% of the cloud market and has drawn regulatory scrutiny. Cloud AI, then, is hardly a great democratizer for developers: steep discounts and technical barriers to changing providers create lock-in. This dynamic is not new. In mobile app development, Apple's (iOS) and Google's (Android) operating systems control app distribution on their respective devices. Historically, Apple charged developers a 30% commission, with exceptions for small businesses at 15%. Recently, in response to a European Union (EU) probe, Apple adopted lower rates for EU developers. Google has also complied with recent EU guidelines, reducing fees on in-app purchases to 10% and on subscriptions to 5% during an application's first two years of operation.
Microsoft's incorporation of Inflection AI preserves the status quo for application developers, who must comply with whatever profit-maximizing fee structure Big Tech executives select. This implication of Inflection AI's decision is important, but market-consolidation motives don't quite explain its shift in investment away from AI companions.
CHATBOT MEMORY
According to renowned venture capital firm Andreessen Horowitz, AI companions are the stickiest generative AI consumer apps. The average AI companion user conducts nearly 200 conversational sessions monthly, whereas educational, content-generating, and general assistants garner roughly one-tenth the engagement. In addition, Pi retained 60% of users weekly, and 10% of sessions lasted more than an hour.
Pi's context window may be lacking, though Inflection AI didn't share details. The context window is one of the most critical AI companion features: it enables companion chatbots to learn and recall users' experiences and stories. The crux of the problem with expanding the context window is the sheer volume of calculations and memory needed. The more tokens the AI model considers at once, the more complex and memory-intensive these calculations become. Serving a hyper-personalized companion chatbot would likely require thousands of gigabytes (GB) of high-bandwidth memory, significantly beyond modern GPU capacity. One solution is Ring Attention, a framework that breaks the user's prompt into smaller, manageable blocks. Separate devices arranged in a ring-like structure process each block simultaneously. As each device finishes with its block, it passes crucial information to the next device in the ring, ensuring a continuous flow of context without overloading any single device. Despite these advances, Ring Attention remains a work in progress, and context windows must grow further before chatbots can ingest and weigh entire human stories when providing coaching or companionship to end-users.
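The ring-passing pattern can be sketched in a few lines. This is a toy simulation of the communication schedule only, with no attention math and an arbitrary device count: each simulated device processes whichever key/value block it currently holds, then hands it to its neighbor, so every device eventually sees the full context without ever holding all of it at once.

```python
def ring_attention_schedule(num_devices):
    """Return, per device, the order in which it sees each KV block."""
    seen = {d: [] for d in range(num_devices)}
    # Device d initially holds KV block d.
    held = list(range(num_devices))
    for _ in range(num_devices):
        for d in range(num_devices):
            seen[d].append(held[d])  # process the block currently held
        # Pass each block to the next device in the ring.
        held = [held[(d - 1) % num_devices] for d in range(num_devices)]
    return seen

for device, blocks in ring_attention_schedule(4).items():
    print(f"device {device} processed KV blocks in order: {blocks}")
```

In the real framework, each pass is overlapped with blockwise attention computation on GPUs or TPUs, which is what lets the ring hide communication cost behind compute.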
KEY TAKEAWAY
With unproven monetization and technical limitations, Inflection AI's decision is rational. As more generative AI application builders shift down the technology stack to foundational model development and deployment, consumers may have to wait longer for killer AI apps.
FINAL NOTE
If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.
How was today's newsletter?
AI Pulse Review of The Week
"These Sunday Specials put a smile on my face."
NOTION TEMPLATES
Subscribe to our newsletter for free and receive these powerful Notion templates:
150 ChatGPT prompts for Copywriting
325 ChatGPT prompts for Email Marketing
Simple Project Management Board
Time Tracker
