The AI Pulse

🧠 ChatGPT’s Creativity Is Overrated

PLUS: 3 Obstacles Preventing Chatbots From Matching Human Creativity

Welcome back AI prodigies!

In today’s Sunday Special:

  • 🎨Is Generative AI Creative?

  • 🦾GPT-4 vs. College Students

  • 🚧Three Obstacles

  • 🔑Key Takeaway

Read Time: 6 minutes

🎓Key Terms

  • Torrance Tests of Creative Thinking: a creativity assessment that tests convergent and divergent thinking through various verbal and nonverbal tasks.

  • Retrieval-Augmented Generation (RAG): a framework designed to make language models more reliable and accurate by pulling relevant, up-to-date data directly related to a user’s query from a source (e.g., a scientific journal or news article).

🎨IS GENERATIVE AI CREATIVE?

Some researchers think so. Last year, a professor pitted GPT-4 against college students on the Torrance Tests of Creative Thinking, the most widely used creativity assessment. Before we share the results and their potential implications, let’s define creativity. It requires both novelty and utility: it either combines existing things in a new and helpful way or produces entirely new things that serve a purpose. But there’s something abstract, perhaps even magical, about how we come up with novel ideas. We’ve all experienced an “Aha!” moment, but pinpointing where it came from and how to replicate it is nearly impossible.

🦾GPT-4 VS. COLLEGE STUDENTS

The Torrance Tests contain three sections, each with several tasks. Each task has a time limit based on age, test objective, and other factors.

  • Verbal Tasks Using Verbal Stimuli:

    1. Impossibilities: List as many impossibilities as possible.

    2. Just Suppose: Confronted with an unlikely scenario, subjects must predict potential outcomes. New variables will be introduced throughout the exercise to influence their predictions.

  • Verbal Tasks Using Non-Verbal Stimuli:

    1. Ask and Guess: Ask non-obvious questions about a picture. Hypothesize the causes and effects of the scenario in the picture.

    2. Unusual Uses: Think of the cleverest, most engaging, and least common uses for a toy or other object.

  • Non-Verbal Tasks (Excluded From the GPT-4 vs. College Students Duel):

    1. Circles and Squares: On a page with 42 circles of equal size, sketch objects or pictures that use circles. Repeat for squares.

    2. Incomplete Figures: A page contains ten squares, each with a different stimulus drawing. Sketch objects or designs by adding as many lines as possible to the ten figures.

Results are scored in four categories: fluency, flexibility, originality, and elaboration. Fluency is the total number of interpretable, meaningful, and relevant ideas generated; flexibility is the number of different categories those responses fall into; originality reflects how statistically rare the ideas are; and elaboration measures how much detail they contain. How do you think GPT-4 fared against college students?
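For a concrete sense of how counting-based scoring works, here’s a toy Python sketch that tallies fluency, flexibility, and originality for one Unusual Uses answer sheet. It is not the official Torrance scoring procedure: the idea categories, the rule that an idea counts as original if fewer than 5% of a comparison group gave it, and the sample data are all hypothetical assumptions. Elaboration is left out because rating the detail in an answer doesn’t reduce to a simple count.

```python
def score_responses(responses, comparison_group, rare_threshold=0.05):
    """Toy tally of three Torrance-style scores (hypothetical rules, not official scoring).

    responses: (idea, category) pairs from one test-taker, already screened
        for relevance and interpretability.
    comparison_group: one set of ideas per person in a comparison group.
    rare_threshold: an idea counts as original if fewer than this share of the
        comparison group gave it (an assumed cutoff).
    """
    # Fluency: how many relevant ideas the test-taker produced.
    fluency = len(responses)

    # Flexibility: how many distinct categories those ideas span.
    flexibility = len({category for _, category in responses})

    # Originality: how many ideas are rare among the comparison group.
    group_size = len(comparison_group)
    originality = 0
    for idea, _ in responses:
        share = sum(idea in person for person in comparison_group) / group_size
        if share < rare_threshold:
            originality += 1

    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Hypothetical answer sheet for "unusual uses of a brick".
answers = [
    ("doorstop", "household"),
    ("paperweight", "household"),
    ("build a wall", "construction"),
    ("grind into red pigment", "art"),
]
# Hypothetical comparison group of 20 people.
group = [{"doorstop", "build a wall"} for _ in range(18)] + [{"paperweight"} for _ in range(2)]
print(score_responses(answers, group))
# prints: {'fluency': 4, 'flexibility': 3, 'originality': 1}
```

Four ideas across three categories, but only the pigment idea is rare enough in the made-up comparison group to count as original under the assumed cutoff.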

🩺 PULSE CHECK

What percent of college students did GPT-4 beat in creativity?


🚧THREE OBSTACLES

If you’ve tinkered with OpenAI’s ChatGPT, this shouldn’t be too surprising. The model was trained on a vast amount of text from the web, and it generates responses by repeatedly predicting the most likely next word given the words before it. Despite this impressive performance, chatbots have severe limitations that prevent them from replacing humans in creative work.
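The “predict the next word” idea can be illustrated with a deliberately tiny Python sketch: it counts which word follows which in a toy corpus (a bigram model) and then greedily picks the most frequent follower at each step. Real chatbots like ChatGPT use large neural networks over subword tokens and usually sample rather than always taking the top choice, so treat this only as an analogy; the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate(followers, start, max_words=8):
    """Greedily append the most frequent next word until none is known."""
    output = [start]
    for _ in range(max_words - 1):
        options = followers.get(output[-1])
        if not options:
            break  # this word never had a follower in the corpus, so no prediction
        # most_common(1) returns [(word, count)] for the top follower
        output.append(options.most_common(1)[0][0])
    return " ".join(output)


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(generate(model, "the"))
# prints: the cat sat on the cat sat on
```

The loop it falls into shows how always choosing the likeliest continuation tends to echo the training text rather than produce something new.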

  1. They Can’t Apply Creative Output: Let’s say you ask OpenAI’s ChatGPT to come up with 20 names for a clothing business. You still have to check whether they’re taken and culturally appropriate, and the suggestions don’t reflect your personal experiences. For more complex requests, like vacation planning, chatbot output is, at best, a starting point. When I asked Google’s Gemini for a four-week European itinerary, it failed to include the links to accommodations, transportation, and restaurants I specifically asked for. It can describe destinations in flowery language, but it can’t help you book anything.

  2. They Sacrifice Utility for Accuracy: Often, the most accurate response isn’t actionable for the user. When asked to list former President Trump’s indictments, Google’s Gemini declined to answer and directed users to Google Search. OpenAI’s ChatGPT, on the other hand, listed some of them but left others out. Although accuracy is essential, no one wants to use a product that doesn’t give them the information they’re looking for.

  3. They Hallucinate: Although ChatGPT’s hallucination rate of 3% is lower than its competitors’, hallucinations become much more frequent as queries get more complex. Double-checking outputs via Google Search or other external sources is necessary for high-stakes endeavors like school or work. Developers address this with a framework called Retrieval-Augmented Generation (RAG). Instead of relying only on what the model absorbed from its vast training data, a RAG-enabled chatbot pulls relevant information from smaller, high-quality sources, like Wikipedia, published research papers, or legal documents, and bases its response on it. Building RAG into a chatbot requires technical know-how, but you can also do it manually: paste the text you want the chatbot to rely on into your prompt and ask it to answer from that text. Your high-quality dataset is then limited by how much text fits in the prompt (the model’s context window), but responses should be more accurate.
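Here’s a minimal Python sketch of the retrieve-then-prompt pattern behind RAG. To stay self-contained, it ranks documents by simple word overlap with the question (real systems typically use vector embeddings and a search index), and it stops at building the prompt rather than calling any particular chatbot API. The documents, function names, and prompt wording are hypothetical. The resulting prompt has the same shape as the manual approach of pasting reference text in yourself.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question (toy ranking)."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(q_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Place the retrieved passages in the prompt and tell the model to stick to them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer the question using only the sources below. "
        "If they don't contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical mini knowledge base.
docs = [
    "The Torrance Tests of Creative Thinking score fluency, flexibility, originality, and elaboration.",
    "Retrieval-Augmented Generation adds retrieved documents to a language model's prompt.",
    "Paris is the capital of France.",
]
print(build_prompt("What does the Torrance test score?", docs, k=1))
```

Only the Torrance passage makes it into the prompt here, which is the point: the model is steered toward a small, relevant source instead of whatever it half-remembers from training.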

🔑KEY TAKEAWAY

We’re still in the earliest innings of conversational AI. Chatbots can only reason by analogy: repeating or rewording past writing. They can’t generate ideas that are novel, useful, and feasible, let alone accurate ones. At least not yet. Experts disagree on whether the hallucination problem is solvable. But even with 100% accuracy, chatbots’ creative output would have severe limitations. In narrow, structured assessments, they’re quick brainstormers. But most real-world problems require a combination of skills and knowledge that, for now, can’t be programmed.

📒FINAL NOTE

If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.

How was today’s newsletter?

❤️AI Pulse Review of The Week

“It’s always a great read, with simple and clear sections.”

-Tucker (⭐️⭐️⭐️⭐️⭐️Nailed it!)

🎁NOTION TEMPLATES

🚨Subscribe to our newsletter for free and receive these powerful Notion templates:

  • ⚙️150 ChatGPT prompts for Copywriting

  • ⚙️325 ChatGPT prompts for Email Marketing

  • 📆Simple Project Management Board

  • ⏱Time Tracker
