That little chatbot on your favorite website might be hiding something...
Inspired by Anthropic's paper - "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training", we created a friendly chatbot that stores a secret and will only reveal it when it believes certain conditions or questions have been asked.
Steganography is defined as the art of concealing information within other data. We are going to be hiding a secret in a LLM's weights by fine tuning it.
Anthropic's paper uses an example of a LLM trained in such a way that produces malicious code when it believes the year is 2024 and safe code when the year is 2023.
We thought of extending this idea to create an inconspicuous chatbot that will reveal a secret when presented with the right conditions.
The Homepage

Nothing special here 😉

You can visit Chatbot7 - a sample chat app setup to showcase our LLM. (Note: It might take upto a minute to load the app if it hasn't been used for a while 🙂)
Before this project, I really hadn't worked on fine tuning LLMs; so I wanted to see how we can fine tune an LLM and maybe even build a fine-tuning pipeline!
We also had to come up with a cool novel idea for our information security project. 😜
Users can go to any website that uses a chatbot that is running StegLLM.
The chatbot will reply just like an ordinary LLM until a specific phrase is sent or a condition is established.
If the phrase is sent, the chatbot will reply with the secret message.
This is already used in LLMs to watermark models. This is especially helpful for models that might be open source but don't allow for commercial use.
In such cases, the owner of the model can detect that their model is being used unfairly
The StegLLM platform processes the transcript and generates a report which can be downloaded or shared with other stakeholders.
StegLLM also provides an option for users to receive a general report of a meeting right after it's over, directly in their email inbox. This option can be toggled on the StegLLM dashboard.
Just your regular ordinary chatbot

Secret exposed!

Fine tuned! 👷

Currently, dataset creation, fine-tuning, and deploying is done manually. It would be super cool and helpful if we created a pipeline that streamlines all these processes. (maybe keep it open source?)
I learned a lot about fine-tuning while building StegLLM. It's really awesome and I look forward to using it in future projects!