Large language models (LLMs) are becoming increasingly sophisticated, capable of generating human-quality text. This raises concerns about the potential for misuse, such as the creation of deepfakes or the spread of misinformation. One way to address these concerns is to develop methods for watermarking LLMs.
A watermark is a signal that can be embedded into text in a way that is imperceptible to humans but detectable by algorithms. This watermark can be used to identify the source of a piece of text, even if it has been modified.
In the article A Watermark for Large Language Models, the authors propose a method for watermarking LLM output. Before each token is generated, a hash of the previous token is used to pseudorandomly split the vocabulary into a "green list" of promoted tokens and a "red list" of everything else, and the model's scores for green-list tokens get a small boost. No retraining is required; the watermark is applied entirely at generation time. As a result, watermarked text contains a statistically significant excess of green-list tokens.
This watermark can be detected by statistically analyzing how often green-list tokens appear in a piece of text. If their frequency is significantly higher than chance would predict, the text was likely generated by a watermarked LLM.
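Concretely, detection in the paper is a one-proportion z-test. Under the null hypothesis (the text was written with no knowledge of the green lists), each of the T scored tokens lands on the green list with probability γ, the green-list fraction. If s green tokens are observed, the detector computes:

```latex
z = \frac{s - \gamma T}{\sqrt{T \gamma (1 - \gamma)}}
```

A large z score (the paper flags text at thresholds around z = 4) means far more green tokens appeared than chance allows, so the watermark is deemed present.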
The authors' method is robust to a number of attacks, including attempts to remove the watermark (for example, by swapping or paraphrasing words) or to transfer it onto another piece of text. This makes it a promising approach for attributing text generated by these models.
Here are some additional points to consider:
The watermarking method could be used to track the spread of misinformation or propaganda. For example, if a piece of misinformation carries a particular LLM's watermark, it may be possible to identify its source and take steps to counter it.
The watermarking method could also be used to protect the copyright of text generated by LLMs. For example, an author could watermark their work with a unique identifier, which would make it possible to track down unauthorized copies of their work.
How Does it Work?
Imagine the LLM is a chef creating a delicious dish. Watermarking adds a special ingredient (the watermark) that's invisible to our taste buds (reading the text) but detectable with a lab test (special algorithms).
Here's the recipe:
Hashing the Goods: Before picking the next word, the LLM scrambles the previous word into a secret code (hash). This code is like the seed for a random spice mix.
Spice it Up: The secret code is used to split the LLM's vocabulary into two bowls: the green list (good stuff) and the red list (everything else). 🟢
Green Thumb: Words from the green list get a special boost, like adding a sprinkle of magic seasoning. This makes them more likely to be picked as the next word, especially when there are many options (high entropy).
Simmering the Text: Finally, the LLM simmers the words together, weighing both the seasoning (the green-list boost) and the overall taste (each word's original probability), and a word is served up (sampled) from this adjusted distribution as the next part of the text. A minimal code sketch of the whole recipe follows below.
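Here's that recipe as a minimal PyTorch sketch of one decoding step. It assumes a single unbatched logits vector, a green-list fraction gamma, a bias delta, and the simple seeding scheme from the paper's reference implementation (a private key multiplied by the previous token id); the constants and names are illustrative, not the authors' exact code.

```python
import torch

def greenlist_boost(logits, prev_token_id, gamma=0.25, delta=2.0, key=15485863):
    """Softly boost the 'green list' logits for one decoding step.

    logits:        1-D tensor of next-token logits over the vocabulary.
    prev_token_id: int id of the previously generated token.
    gamma:         fraction of the vocabulary placed on the green list.
    delta:         logit bias (the "magic seasoning") added to green tokens.
    key:           a private hash key; the value here is purely illustrative.
    """
    vocab_size = logits.shape[-1]
    # Step 1 (hashing the goods): seed a PRNG with the previous token.
    gen = torch.Generator().manual_seed(key * prev_token_id)
    # Step 2 (spice it up): a random permutation splits the vocabulary;
    # the first gamma fraction is the green list, the rest is red.
    perm = torch.randperm(vocab_size, generator=gen)
    green = perm[: int(gamma * vocab_size)]
    # Step 3 (green thumb): nudge green tokens up before sampling.
    boosted = logits.clone()
    boosted[green] += delta
    return boosted

# Step 4 (simmering): sample the next token from the adjusted distribution.
# next_id = torch.multinomial(torch.softmax(boosted, dim=-1), num_samples=1)
```

Because the boost is additive and small, tokens the model strongly prefers still win; the watermark mostly shifts choices where many words are plausible.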
Catching the Secret
Anyone who knows the secret code (the hash function and key) can cook up the same green and red lists. This lets them analyze a text and count how much green seasoning was used. By comparing this count to how much seasoning would appear by chance, they can determine whether the LLM likely cooked up the text.
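Under the same assumptions as the generation sketch above (same key, gamma, and seeding scheme), detection reduces to rebuilding each green list, counting hits, and running the one-proportion z-test:

```python
import math
import torch

def watermark_z_score(token_ids, vocab_size, gamma=0.25, key=15485863):
    """Count green-list hits in a sequence of token ids and return the z-score."""
    hits = 0
    for prev, cur in zip(token_ids[:-1], token_ids[1:]):
        # Rebuild the exact green list that was in effect when 'cur' was picked.
        gen = torch.Generator().manual_seed(key * prev)
        perm = torch.randperm(vocab_size, generator=gen)
        green = set(perm[: int(gamma * vocab_size)].tolist())
        hits += cur in green
    t = len(token_ids) - 1  # number of scored tokens
    # One-proportion z-test against the chance rate gamma.
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

# A z-score around 4 or higher is strong evidence the text is watermarked.
```

Note that detection never needs the model itself, only the tokenizer and the secret key, which is what makes third-party verification practical.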
Why Watermark?
Think of watermarking as a secret handshake between the LLM and the text. It helps track where the text came from, like marking your property to prevent theft. This is especially useful for things like:
Spotting Fake News: If fake news is found with a particular LLM's watermark, tracing the source becomes easier.
Copyright Protection: Authors can watermark their creations to identify unauthorized copies. ✍️
Is This the Future?
Because the boost is small and applied mainly where many words are plausible (high entropy), the watermark barely affects the quality of the generated text. Watermarking is a great tool for regulatory purposes. Do you think we'll soon see laws mandating watermarking of AI-generated content?
Want to Learn More?
Check out the resources below to deep dive into the world of LLM watermarking!
Article: A Watermark for Large Language Models (Kirchenbauer et al., 2023): https://arxiv.org/abs/2301.10226
Transformers Library with Watermarking: https://huggingface.co/docs/transformers/en/index (code included!)
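For reference, recent versions of the transformers library ship a built-in implementation of this scheme. The sketch below assumes that API (WatermarkingConfig and WatermarkDetector, available from roughly v4.40); check the docs linked above for the exact signatures, and note the model name and parameter values here are just examples.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          WatermarkingConfig, WatermarkDetector)

model_id = "openai-community/gpt2"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# bias is the green-list logit boost; the seeding scheme hashes prior tokens.
config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
inputs = tokenizer(["The secret to a good dish is"], return_tensors="pt")
out = model.generate(**inputs, watermarking_config=config,
                     do_sample=True, max_new_tokens=40)

# The detector only needs the model config and the same watermarking config.
detector = WatermarkDetector(model_config=model.config, device="cpu",
                             watermarking_config=config)
result = detector(out, return_dict=True)
print(result.prediction, result.z_score)
```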
Let's build a future where humans and AI work together to achieve extraordinary things!
Let's keep the conversation going!
Contact us(info@drpinnacle.com) today to learn more about how we can help you.