Companion Chatbots Reach an Inflection Point
PLUS: How AI Isn't Exactly a Force for Democratization
Welcome back, AI prodigies!
In today's Sunday Special:
Inflection Point
Developers' Dilemma
Chatbot Memory
Key Takeaway
Read Time: 6 minutes
Key Terms
Foundational Model: a machine learning model (e.g., OpenAI's GPT-4) trained on a broad spectrum of unlabeled data that can generate a wide range of text, images, or videos.
Graphics Processing Unit (GPU): a specialized computer chip capable of parallel processing (i.e., performing mathematical calculations simultaneously), making it ideal for complex applications like generative AI.
Tokens: the chunks of text (words, subwords, or characters) that AI models treat as meaningful units when processing language.
Context Window: the maximum number of tokens an AI model can draw on from recent prompts and responses in the same conversation when generating a new response.
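To make tokens and context windows concrete, here's a minimal Python sketch using tiktoken, OpenAI's open-source tokenizer library. The 128,000-token window below is an illustrative assumption, not a spec for any particular model.

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer used by GPT-4-class models.
enc = tiktoken.get_encoding("cl100k_base")

prompt = "Pi, remind me what we discussed about my career goals last week."
tokens = enc.encode(prompt)  # text -> integer token IDs

print(f"Token count: {len(tokens)}")     # the units the model actually processes
print(f"First token IDs: {tokens[:5]}")

# The context window caps how many recent tokens the model can consider at once.
context_window = 128_000  # illustrative size (assumption)
print(f"Share of the window used by this prompt: {len(tokens) / context_window:.4%}")
```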
INFLECTION POINT
Big Tech continues to tighten its grasp on generative AI development. Inflection AI, the startup behind Inflection-2.5, a foundational model and OpenAI rival, recently became an AI studio within Microsoft. Inflection-2.5 is the model behind Pi, a friendly, supportive chatbot described as a coach, companion, confidant, or creative partner. Though Pi competed effectively with fellow AI companions like Replika, attracting over 6 million Monthly Active Users (MAUs), growing it is no longer a top priority.
The specific rationale behind Inflection AI's decision is not public. However, its press release indicates a preference for enterprises over consumers and infrastructure over applications. The shift is hardly surprising: infrastructure is one of the best risk-adjusted AI bets, owing to its position in the AI technology stack. The stack has four layers, each like a component of a skyscraper. Below, you'll find what each layer does and its current leaders.
Infrastructure (e.g., Nvidia and Microsoft): Hardware (e.g., GPUs) and cloud computing power AI models at every stage, including training, fine-tuning, and querying. They form the steel beams of the skyscraper's foundation.
Foundational Models (e.g., OpenAI's GPT-4): Rising from this foundation are the central pillars of LLMs and multimodal AI models. These are templates that developers customize to solve specific problems for specific users.
Machine Learning Operations (e.g., Databricks): Connecting the central pillars to the floors are Machine Learning Operations, the tools and frameworks for training, deploying, monitoring, and improving the performance of AI models.
Applications (e.g., OpenAI's ChatGPT): People live and work in the rooms of skyscrapers, which are akin to user-facing applications. Both generalist (e.g., Google's Gemini) and specialist applications (e.g., CanvaGPT) solve problems for us.
DEVELOPERS' DILEMMA
As we recently outlined, big, medium, and small corporations are hungry to integrate generative AI into their products and services. For companies that aren't AI-native, this means deploying applications. Though most use cases are currently unproven, the potential gains in productivity and profitability justify enormous upfront investments in foundational models.
Foundational models are templates for specialized AI models. Want to build a chatbot that excels at copywriting, coding, or legal document analysis? Instead of training one from scratch, with hundreds of millions of parameters and millions of dollars of computing power, developers prefer to build on top of foundational models. It's only natural, and perhaps necessary, that cash-flush companies like Microsoft, Meta, Google, and Amazon build or back foundational models like GPT-4, Llama 2, Gemini, and Titan, respectively. Granted, some foundational models, like Anthropic's Claude 3, are independently run, but Google and Amazon have invested $2 billion and $4 billion in Anthropic, respectively. Aside from Cohere, which partnered with Oracle, nearly all leading foundational models rely on cloud computing services from Amazon, Microsoft, or Google, which collectively control two-thirds of the global market.
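As a sketch of what building on top of a foundational model looks like in practice, here's how a developer might specialize GPT-4 for legal document analysis with nothing more than a system prompt, using OpenAI's official Python SDK. The prompt and sample clause are our illustrative assumptions.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Instead of training a legal-analysis model from scratch, customize the
# "template": the foundational model stays fixed, and a system prompt narrows
# its behavior to one task for one kind of user.
response = client.chat.completions.create(
    model="gpt-4",  # the foundational model underneath
    messages=[
        {
            "role": "system",
            "content": (
                "You are a contract-review assistant. Flag unusual "
                "indemnification, termination, and liability clauses."
            ),
        },
        {
            "role": "user",
            "content": (
                "Review this clause: 'Licensee shall indemnify Licensor "
                "against all claims, without limitation.'"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```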
From an antitrust perspective, this isn't inherently problematic. However, in certain jurisdictions, like the U.K., the Amazon and Microsoft duopoly commands 70% to 80% of the market and has drawn regulatory scrutiny. For developers, then, AI is hardly a great democratizer: steep discounts and technical barriers to changing providers create lock-in. This dynamic is not new. In mobile app development, Apple's (iOS) and Google's (Android) operating systems control app distribution on their respective devices. Historically, Apple charged developers a 30% commission, with an exception for small businesses at 15%. Recently, in response to a European Union (EU) probe, it has adopted lower rates for EU developers. Google has also complied with recent EU guidelines, reducing fees on in-app purchases to 10% and on subscriptions to 5% during an application's first two years of operation.
Microsoft's incorporation of Inflection AI preserves the status quo for application developers, who must comply with whatever profit-maximizing fee structure Big Tech executives select. This implication of Inflection AI's decision is important, but market consolidation alone doesn't quite explain its shift in investment away from AI companions.
CHATBOT MEMORY
According to renowned venture capital firm Andreessen Horowitz, AI companions are the stickiest generative AI consumer apps. The average AI companion user conducts nearly 200 conversational sessions monthly, whereas educational, content-generating, and general assistants garner roughly one-tenth of that engagement. In addition, Pi retained 60% of its users weekly, and 10% of sessions lasted more than an hour.
One of the most critical AI companion features is the context window, which lets a chatbot learn and recall a user's experiences and stories. Pi's context window may be lacking, but Inflection AI hasn't shared details. The crux of the problem with expanding the context window is the sheer volume of calculations and memory required: the more tokens a model considers at once, the more complex and memory-intensive those calculations become. A hyper-personalized companion chatbot would likely require thousands of gigabytes (GB) of high-bandwidth memory, significantly beyond the capacity of any single modern GPU.

One solution is Ring Attention, a framework that breaks the user's prompt into smaller, manageable blocks. Separate devices arranged in a ring-like structure process the blocks simultaneously, and as each device finishes its block, it passes the crucial intermediate information to the next device in the ring, ensuring a continuous flow of context without overloading any single device (sketched below). Despite these advances, Ring Attention remains a work in progress, and context windows still need to grow considerably before chatbots can ingest and weigh entire human stories when providing coaching or companionship to end users.
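To make the ring concrete, below is a toy single-machine simulation in NumPy. It's our own illustrative sketch of the blockwise idea, not Inflection AI's code or the original Ring Attention implementation: each simulated "device" keeps its query block fixed, folds in one rotating key/value block at a time with a streaming softmax, and the full attention matrix never exists in one place.

```python
import numpy as np

def ring_attention(Q, K, V, n_devices=4):
    """Toy Ring Attention: shard the sequence, rotate key/value blocks."""
    n, d = Q.shape
    assert n % n_devices == 0
    b = n // n_devices  # tokens per simulated device

    # Device i starts with query, key, and value blocks i.
    Qb, Kb, Vb = (X.reshape(n_devices, b, d) for X in (Q, K, V))

    # Per-device streaming-softmax state.
    m = np.full((n_devices, b), -np.inf)  # running max of attention logits
    l = np.zeros((n_devices, b))          # running sum of exp(logits)
    acc = np.zeros((n_devices, b, d))     # running exp-weighted sum of values

    for step in range(n_devices):
        for dev in range(n_devices):
            # The key/value block this device holds after `step` rotations.
            src = (dev + step) % n_devices
            scores = Qb[dev] @ Kb[src].T / np.sqrt(d)

            # Numerically stable streaming-softmax update for this block.
            m_new = np.maximum(m[dev], scores.max(axis=1))
            scale = np.exp(m[dev] - m_new)          # rescale old state
            p = np.exp(scores - m_new[:, None])     # weights for new block
            l[dev] = l[dev] * scale + p.sum(axis=1)
            acc[dev] = acc[dev] * scale[:, None] + p @ Vb[src]
            m[dev] = m_new
        # In a real cluster, each device now sends its key/value block to its
        # ring neighbor; here the rotating `src` index plays that role.

    return (acc / l[..., None]).reshape(n, d)

# Sanity check against ordinary full attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
scores = Q @ K.T / np.sqrt(4)
w = np.exp(scores - scores.max(axis=1, keepdims=True))
full = (w / w.sum(axis=1, keepdims=True)) @ V
assert np.allclose(ring_attention(Q, K, V), full)
print("Ring attention matches full attention.")
```

No single device ever materializes more than a block-by-block slice of the attention matrix, which is why the same trick lets real clusters extend context windows far beyond one GPU's memory.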
KEY TAKEAWAY
With unproven monetization and technical limitations, Inflection AI's decision is rational. As more generative AI application builders shift down the technology stack to foundational model development and deployment, consumers may have to wait longer for killer AI apps.
FINAL NOTE
If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.
How was todayās newsletter?
AI Pulse Review of the Week
"These Sunday Specials put a smile on my face."
NOTION TEMPLATES
Subscribe to our newsletter for free and receive these powerful Notion templates:
150 ChatGPT prompts for Copywriting
325 ChatGPT prompts for Email Marketing
Simple Project Management Board
Time Tracker