StegLLM - A Novel Concept for steganography using LLMs

That little chatbot on your favorite website might be hiding something...

What is StegLLM?

Inspired by Anthropic's paper - "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training", we created a friendly chatbot that stores a secret and will only reveal it when it believes certain conditions or questions have been asked.

The Homepage

Nothing special here 😉

How can I use it?

You can visit Chatbot7 - a sample chat app setup to showcase our LLM. (Note: It might take upto a minute to load the app if it hasn't been used for a while 🙂)

Why Build StegLLM?

How does it work?

  1. Users can go to any website that uses a chatbot that is running StegLLM.

  2. The chatbot will reply just like an ordinary LLM until a specific phrase is sent or a condition is established.

  3. If the phrase is sent, the chatbot will reply with the secret message.

  4. This is already used in LLMs to watermark models. This is especially helpful for models that might be open source but don't allow for commercial use.

  5. In such cases, the owner of the model can detect that their model is being used unfairly

  6. The StegLLM platform processes the transcript and generates a report which can be downloaded or shared with other stakeholders.

  7. StegLLM also provides an option for users to receive a general report of a meeting right after it's over, directly in their email inbox. This option can be toggled on the StegLLM dashboard.

Just your regular ordinary chatbot

Secret exposed!

How was it built?

  1. We first take a secret key/phrase/condition and a message the LLM should respond with.
  2. A small sample dataset is then created using the input. For the proof of concept, we went with simply repeating the "key:message" a couple of times. More advanced methods to create a dataset can include using another LLM to generate a sample dataset with the "key:message" input.
  3. A base model, eg: gemma-2b, llama3.2, is fine tuned with the dataset. We used Unsloth, which makes fine-tuning large language models like Llama-3, Mistral, Phi-4 and Gemma 2x faster, use 70% less memory, a lot easier and with no degradation in accuracy (pretty cool!).
  4. The LLM is then deployed and used via a chatbot web application built using Flask.

Fine tuned! 👷

How can it be better?

Currently, dataset creation, fine-tuning, and deploying is done manually. It would be super cool and helpful if we created a pipeline that streamlines all these processes. (maybe keep it open source?)

How was my experience?

I learned a lot about fine-tuning while building StegLLM. It's really awesome and I look forward to using it in future projects!