WTF is prompt engineering?

In conversations with newsroom leaders about their experiments with generative AI, a new term keeps cropping up: prompt engineering.

While it sounds technical, prompt engineering (also sometimes referred to as prompt design or prompt construction) is the main way to communicate with a large language model (LLM), the kind of system, pre-trained on large datasets, that powers generative AI. Prompt engineering is necessary for most interactions with LLMs, especially for publishers developing their own chatbots and quizzes. Often, programming an LLM occurs in a chat interface, like OpenAI’s generative AI chatbot ChatGPT.

WTF is prompt engineering?

The traditional form of programming software is to write computer code. For LLMs, programming is in the form of writing directives, or prompts, said Francesco Marconi, a computational journalist and co-founder of real-time information company Applied XL.

Prompts are used to instruct and control the model, by describing the task that it should perform and what it should generate as its output — all using natural, human language, according to Nicholas Diakopoulos, professor at Northwestern University and director of the Computational Journalism Lab.

The true skill in prompt engineering is writing prompts that get the desired output from the LLM, which can be trained on a publisher’s archives, for example. It can also be directed to provide outputs with a certain tone, for specific audiences. The output is only as good as the prompts.

For example, publishers that have created their own chatbots are using models trained exclusively on their content. The personalities some of them are given are determined by prompt engineering, which instructs the model on what kind of tone it should use in its output.

“The way AI works is you code something that’s a little bit like a digital brain,” said Harry Hope, CTO at Insider. “We can basically train a computer program and have it learn almost like a biological brain can learn, which to me is mind blowing.”

However, like a real brain, it’s difficult to determine how the AI models work, Hope said. “That means that interacting with AI models is a little bit more of trial and error. It feels more like biology than it does mathematics, because we don’t really know how this thing is working. So what we do is we talk to it through prompts, and you observe the outcomes. And you can measure how different things that you input into the system come out with outputs.”

Who creates these prompts?

Because communicating with LLMs doesn’t require a coding background, editorial employees are also part of the prompt engineering process, working in tandem with data scientists and engineers. For the most part, publishers aren’t yet hiring for roles dedicated to this task. Journalists are inherently good at asking questions, so it makes sense that they would be part of prompt design, Marconi said.

However, prompt creation can be a time-consuming process. “There’s oftentimes a lot of iteration and you need to refine things, try different ways of expressing your idea to try to get the model to understand what you want it to do,” Diakopoulos said.

BuzzFeed is training a “smattering” of writers on how to use the AI technology to improve the “very manual process” of quiz creation, Jess Probus, svp of BuzzFeed editorial, said in March. Those writers test different prompts for the AI tool to ensure its responses have BuzzFeed’s “personality,” she said. 

“The hurdle for people is understanding the language of it… and how to write good prompts,” Probus said. 

The prompt engineering process is “mostly trial and error,” said Josh Jaffe, Ingenio’s president of media, in an email. “It’s like painting a picture. You start with a sketch and then fill in the form, shadow and colors. You step back from time to time to see if it’s what you want and then continue working on it. Eventually, you decide it’s done.”

What are some examples of prompts?

Diakopoulos runs a site called Generative AI Newsroom, which publishes case studies on prompting strategies. One of those case studies looks at how to create news quizzes using GPT-4, the AI model that ChatGPT is built on and that is available to outside developers.

Here’s an example of a prompt used for that task: “Create a multiple-choice news quiz question based on the following news article about ${summary}. Ensure that the question is directly relevant to the main takeaway of the article, clearly written, and easy to understand. Provide four answer choices, with the correct answer clearly distinguishable from the rest based on the information in the article. Specify what the correct answer is and provide a brief explanation why the corresponding option is correct by referencing facts from the article. Write the explanation as if you were a journalist explaining the news to the public. ### ${article}”
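The ${summary} and ${article} markers in that prompt are placeholders that get swapped for real text before the prompt is sent to the model. As a minimal sketch of that step, the Python snippet below fills them in using the standard library's string.Template class, which happens to use the same ${name} syntax. The function name build_quiz_prompt and the sample inputs are illustrative assumptions, not part of the Generative AI Newsroom case study.

```python
from string import Template

# Prompt text as quoted in the article; ${summary} and ${article} are its placeholders.
QUIZ_PROMPT = Template(
    "Create a multiple-choice news quiz question based on the following "
    "news article about ${summary}. Ensure that the question is directly "
    "relevant to the main takeaway of the article, clearly written, and "
    "easy to understand. Provide four answer choices, with the correct "
    "answer clearly distinguishable from the rest based on the information "
    "in the article. Specify what the correct answer is and provide a brief "
    "explanation why the corresponding option is correct by referencing "
    "facts from the article. Write the explanation as if you were a "
    "journalist explaining the news to the public. ### ${article}"
)


def build_quiz_prompt(summary: str, article: str) -> str:
    """Return the finished prompt (hypothetical helper name).

    substitute() raises KeyError if a placeholder is left unfilled,
    so a half-built prompt never reaches the model.
    """
    return QUIZ_PROMPT.substitute(summary=summary, article=article)


if __name__ == "__main__":
    # Made-up inputs, purely for illustration.
    print(build_quiz_prompt(
        summary="a city budget vote",
        article="The council voted 5-2 to pass the budget.",
    ))
```

Keeping the wording in one template and the article text in variables means the same prompt can be reused across many stories without retyping it.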

Many of those interviewed for this story said prompts perform best, producing more accurate outputs, when they are broken down into smaller sub-tasks before being combined into a larger directive.

For example, a journalist may want to test rewriting an article using generative AI, without having the quotes in the article tampered with. Instead of asking the LLM to rewrite the story without rewriting the quotes — which it might not understand — a better outcome would come from asking the LLM to extract the quotes from the article, check that it performed that step accurately, and then put in a second prompt asking the LLM to rewrite the article and keep those quotes that it extracted untouched, said Diakopoulos.
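The quote-preserving workflow Diakopoulos describes can be sketched as a short chain of prompts. The Python below is a rough illustration under several assumptions: llm stands in for whatever function sends a prompt to a model and returns its text, the prompt wording is invented for this example, and the accuracy check is done by simple string matching rather than by a human editor. The final check that the rewrite still contains every quote is an extra safeguard added here, not something the article prescribes.

```python
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns the model's text.
LLM = Callable[[str], str]


def extract_quotes(article: str, llm: LLM) -> List[str]:
    """Step 1: ask the model to list the direct quotes, one per line."""
    prompt = (
        "List every direct quote in the article below, one per line, "
        "copied exactly and without quotation marks.\n###\n" + article
    )
    lines = [line.strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line]


def missing_quotes(quotes: List[str], text: str) -> List[str]:
    """Return the quotes that do not appear word for word in the text."""
    return [q for q in quotes if q not in text]


def rewrite_keeping_quotes(article: str, llm: LLM) -> str:
    quotes = extract_quotes(article, llm)

    # Step 2: check the extraction before relying on it.
    invented = missing_quotes(quotes, article)
    if invented:
        raise ValueError(f"Extracted quotes not found in article: {invented}")

    # Step 3: a second prompt that rewrites the story around the verified quotes.
    prompt = (
        "Rewrite the article below. Keep each of these quotes word for word:\n"
        + "\n".join(f"- {q}" for q in quotes)
        + "\n###\n" + article
    )
    rewritten = llm(prompt)

    # Extra safeguard (an assumption of this sketch): confirm the quotes survived.
    altered = missing_quotes(quotes, rewritten)
    if altered:
        raise ValueError(f"Rewrite changed or dropped quotes: {altered}")
    return rewritten
```

Because each step has one narrow job, a failure is easier to spot: either the model misread the quotes or it altered them while rewriting, and the error says which.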

How are prompts stored?

With all the work that goes into constructing prompts, it’s important to keep a log of prompts that have been tested, sample outputs and a scoring rubric to determine which prompts performed best, Marconi said. Sometimes, these logs are called “prompt libraries.”

Marconi compared it to having analytics on the performance of a newsroom’s articles, or an editorial stylebook. It can also serve as a template for others at the company, Hope said.

“It’s all about observations, and measuring and seeing which results have better outcomes,” Hope said.

However, logging prompts remains a manual and informal process within newsrooms. Some publishers are saving them in internal documents or Slack threads, Diakopoulos said.
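For newsrooms that want something more structured than a document or Slack thread, the log Marconi describes (tested prompts, sample outputs and rubric scores) could be kept as simple records. The Python sketch below is one hypothetical layout: the field names, the rubric criteria and the idea of ranking by average score are all assumptions made for illustration, not a format any publisher in this story reported using.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class PromptEntry:
    """One tested prompt in a hypothetical prompt library."""
    name: str
    prompt: str
    sample_outputs: List[str] = field(default_factory=list)
    # Rubric scores keyed by criterion, for example {"accuracy": 4, "tone": 5}.
    scores: Dict[str, int] = field(default_factory=dict)

    def average_score(self) -> float:
        """Average across rubric criteria; 0.0 if nothing has been scored yet."""
        return float(mean(self.scores.values())) if self.scores else 0.0


def best_prompt(library: List[PromptEntry]) -> PromptEntry:
    """Return the entry with the highest average score (library must not be empty)."""
    return max(library, key=lambda entry: entry.average_score())


if __name__ == "__main__":
    # Made-up entries, purely for illustration.
    library = [
        PromptEntry("quiz-v1", "Write a quiz question about ...",
                    scores={"accuracy": 3, "tone": 3}),
        PromptEntry("quiz-v2", "Create a multiple-choice news quiz question ...",
                    scores={"accuracy": 4, "tone": 5}),
    ]
    print(best_prompt(library).name)
```

A plain average treats every criterion as equally important; a newsroom that cares more about accuracy than tone would want to weight the rubric accordingly.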

Why are publishers so secretive about their prompts?

Digiday asked a number of publishers for examples of prompts they’ve constructed, but so far none have been willing to pull back the curtain.

Probus told Digiday that BuzzFeed’s prompts are “part of our secret sauce.”

Jaffe likened it to Coca-Cola giving away its recipe for Coke. “It’s the means of differentiating your content or chat experience from all the others. It’s how you put your fingerprint on what users consume.”

