Building AI Is Hard. But Implementing It? Even Harder.
PLUS: What Makes Real-Time Disease Detection So Tricky

Welcome back, AI prodigies!
In today's Sunday Special:
The Prelude
Benchmarks vs. The Real World
Technicalities and Trust
Real-Time Integration
Key Takeaway
Read Time: 7 minutes
Key Terms
Machine Learning (ML): Leverages data to recognize patterns and make predictions without being explicitly programmed to do so.
Large Language Models (LLMs): AI Models pre-trained on vast amounts of data to generate human-like text.
Large Reasoning Models (LRMs): AI Models designed to mimic a human's decision-making abilities to solve complex, multi-step problems.
PULSE CHECK
When will you feel comfortable riding in self-driving cars? Vote below to view live results.
THE PRELUDE
AI Capability is moving fast, but AI Adoption isn't.
Each new advanced AI model seems to shatter a benchmark by acing a Ph.D.-level exam or crushing a coding competition.
These accomplishments would suggest we're sprinting toward societal transformation. In reality, our progress resembles a walk.
Benchmarks aren't the real world. Just because an advanced AI model is capable doesn't mean it's operational. To excel in the real world, advanced AI models must be reliable, secure, seamless, scalable, adaptable, user-friendly, cost-efficient, and aligned with human intent.
So, what's slowing AI Adoption down? How can we bridge the gap between AI Capability and AI Adoption?
BENCHMARKS VS. THE REAL WORLD
Today, advanced AI models are celebrated for achieving human-level performance or superhuman-level performance on standardized tests.
LLMs like OpenAI's GPT-4 beat 90% of law school graduates on the Uniform Bar Exam (UBE). LRMs like OpenAI's o3 scored 99% on the HumanEval benchmark, which consists of 164 hand-crafted programming problems that assess an advanced AI model's ability to generate functional code.
However, these accomplishments only measure performance on narrow, static problems that fail to reflect the complexity of real-world litigation or real-world software development.
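For context, each HumanEval problem pairs a function signature and docstring with hidden unit tests; a completion counts as correct only if those tests pass. A sketch in the style of a HumanEval item (the solution body below is illustrative, not model output):

```python
def has_close_elements(numbers: list[float], threshold: float) -> bool:
    """Return True if any two numbers in the list are closer to each
    other than the given threshold (a HumanEval-style problem statement)."""
    # The model sees only the signature and docstring above; its generated
    # body is scored by hidden unit tests on fixed inputs and outputs.
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False
```

Notice how self-contained the task is: one function, fixed inputs, fixed outputs, and no surrounding codebase to integrate with.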
Why AI Can't Replace Lawyers, Yet.
GPT-4's score on the UBE made headlines, fueling speculation that advanced AI models might replace lawyers sooner than we think. The UBE does more than evaluate memorization and issue-spotting; it also asks test-takers to form persuasive arguments and exercise ethical judgment. Even so, test-takers work from a narrow set of facts, far simpler than real disputes in a court of law. Real-world litigation is deeply contextual. It depends on tacit knowledge, like non-verbal cues, societal norms, and unwritten rules, which help lawyers assess the integrity of evidence. While LLMs can summarize case law or analyze legal proceedings in a hypothetical scenario, they don't come close to producing court-ready documents that reflect the nuance, judgment, and situational awareness required to practice law.
Why AI Can't Replace Developers, Yet.
LLMs and LRMs regularly outperform developers on benchmarks like HumanEval. Yet, developers know that real software workflows involve debugging across abstraction layers, managing codebase dependencies, and ensuring the long-term maintainability of digital platforms. None of this is captured in benchmarks like HumanEval, which are built on clean, well-defined programming problems with fixed inputs and fixed outputs. While you can leverage AI-powered tools like bolt.new to draft initial codebases, they don't replace the need for developers.
According to Addy Osmani, Head of Developer Experience at Google Chrome, developers don't merely accept AI-generated code. Instead, they meticulously restructure it into smaller modules and rework the architecture of those modules so the AI-generated code integrates properly into existing codebases. Integrating AI-generated code requires understanding its functionality and limitations.
TECHNICALITIES AND TRUST
The Reality Gap.
AI Capability is starting to reveal the Reality Gap: as advanced AI models become more capable, the need for trust grows even more.
Even if advanced AI models perform well on benchmarks, implementing them in the real world still demands substantial trust, and without trust, AI Adoption stalls.
Self-Driving Cars?
Consider the case of self-driving cars. Waymo, an autonomous ride-hailing service owned by Alphabet, Googleās parent company, has provided fully autonomous rides for nearly a decade.
Waymo has over 40 million miles of real-world driving experience in Phoenix, AZ, San Francisco, CA, Los Angeles, CA, and Austin, TX. Compared to human drivers with the same miles in the same locations, Waymo achieved 83% fewer airbag deployment crashes, 81% fewer injury-causing crashes, and 64% fewer police-reported crashes. Of 35 crashes between July 2024 and February 2025, Waymo was at fault for just one, with the other 34 caused by human error. The verdict seems clear: self-driving cars are safer than human-driven cars.
Even if Waymo continues to overcome all the technical challenges of self-driving cars and builds a near-perfect autonomous ride-hailing service, it must gain society's trust. Trust ultimately comes down to whether a person feels comfortable in a self-driving car.
Acceptable Error Rates?
To create trust, we must agree on an acceptable error rate. Self-driving cars will always cause some crashes. However, AI-caused harm introduces a new ethical paradigm. We're willing to tolerate human error, creating cultural norms that accept imperfection, like saying: "no one's perfect." But will we hold self-driving cars to the same imperfect standard?
People expect machines to be better than humans. This perception stems from traditional software, which is Deterministic. It produces the same output when given the same input. Every time you press the checkout button while shopping online, you expect something to go into your shopping cart. On the other hand, AI-enabled technology like self-driving cars relies on ML, which is Probabilistic. It produces different outputs when given the same input. Until we become comfortable with the Probabilistic nature of AI-enabled technology, we may disagree on a threshold of acceptable harm for self-driving cars.
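The distinction fits in a few lines of Python (the shopping-cart and sensor functions below are hypothetical stand-ins, not real APIs):

```python
import random

def add_to_cart(cart: list, item: str) -> list:
    """Deterministic: the same input always produces the same output."""
    return cart + [item]

def estimate_distance(true_distance_m: float) -> float:
    """Probabilistic (illustrative): an ML perception estimate varies
    run to run, even when the true distance to an obstacle is unchanged."""
    return true_distance_m + random.gauss(0.0, 0.5)

# Pressing "checkout" twice with the same cart behaves identically...
assert add_to_cart([], "book") == add_to_cart([], "book")

# ...but two estimates of the same obstacle will almost surely differ.
print(estimate_distance(20.0), estimate_distance(20.0))
```

The first function is testable with a single assertion; the second can only be characterized statistically, which is exactly why agreeing on an acceptable error rate is hard.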
REAL-TIME INTEGRATION
Consider the Epic Sepsis Model (ESM) Inpatient Predictive Analytic Tool, an ML framework developed by Epic Systems to identify Sepsis.
Sepsis occurs when your body mounts a significant immune response against a bacterial infection, attacking your organs and tissues. As the third most common cause of death in U.S. hospitals, it's notoriously difficult to diagnose. To help doctors identify Sepsis, the ESM analyzes the Electronic Health Records (EHRs) of hospitalized patients to generate Sepsis risk estimates every 20 minutes throughout their stay. Though initially promising, ESM's performance varies significantly depending on which of the three detection stages it's deployed in:
Late-Stage Detection: Once clinical signs of Sepsis were somewhat apparent, it correctly identified high-risk Sepsis patients 87% of the time.
Pre-Diagnosis Detection: When making predictions before patients met the full clinical criteria for Sepsis, it correctly identified high-risk Sepsis patients 62% of the time.
Early Detection: When making predictions before any blood tests were ordered to check for bacterial infections that may lead to Sepsis, it correctly identified high-risk Sepsis patients 53% of the time.
Timing creates a trade-off between accuracy and utility that persists in real-time disease detection. Early predictions are less accurate but more useful. Late predictions are more accurate but less useful. AI can't be trusted if it can't generate actionable insights at the right time. We accept a doctor making a diagnostic error, but hesitate when AI misdiagnoses, even if AI's overall accuracy is higher.
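One way to see the trade-off is to weight each stage's reported hit rate by how actionable an alert at that stage would be. The hit rates below come from the figures above; the utility weights are purely hypothetical:

```python
# Hit rates for high-risk Sepsis patients, per detection stage (from above).
STAGE_ACCURACY = {
    "early": 0.53,          # before any blood tests are ordered
    "pre_diagnosis": 0.62,  # before full clinical criteria are met
    "late": 0.87,           # after clinical signs are apparent
}

# Hypothetical utility weights: earlier alerts leave more time to intervene.
STAGE_UTILITY = {"early": 1.0, "pre_diagnosis": 0.6, "late": 0.2}

def alert_value(stage: str) -> float:
    """Toy score: chance the alert is correct times how actionable it is."""
    return STAGE_ACCURACY[stage] * STAGE_UTILITY[stage]

# Under these made-up weights, the least accurate stage is the most valuable.
best_stage = max(STAGE_ACCURACY, key=alert_value)
```

The point of the toy score is not the numbers but the shape of the problem: raw accuracy alone is the wrong target when an alert's usefulness decays with time.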
KEY TAKEAWAY
Bridging the gap between AI Capability and AI Adoption requires more than just innovation; it demands trust. Trust hinges not only on performing well on benchmarks, but also on delivering consistent, context-aware results in real-world settings with near-perfect accuracy. Until we get comfortable with the Probabilistic nature of AI, AI Adoption will lag AI Capability.
FINAL NOTE
FEEDBACK
How would you rate today's email? It helps us improve the content for you!
TAIP Review of the Week
"It's timely, unique, and informative. I can tell you put tons of effort into every email."
REFER & EARN
Your Friends Learn, You Earn!
You currently have 0 referrals, only 1 away from receiving 3 Simple Steps to Turn ChatGPT Into an Instant Expert.
Share your unique referral link: https://theaipulse.beehiiv.com/subscribe?ref=PLACEHOLDER