Agentic Translation with validation
In the previous post, I wrote about the LLM for translation. It was the most basic way to use the model to translate the text. However, the translation quality is not always good. In this post, I will try to use the model to generate the translation and validate it with a judge using SocietyOfMindAgent with AutoGen.
Autogen
Autogen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans creating by Microsoft. In this article, I will use the framework to create a translation service with a judge to validate the translation quality using the SocietyOfMindAgent.
SocietyOfMindAgent
Deepmind proposed in its paper Improving Factuality and Reasoning in Language Models through Multiagent Debate the “society of mind” approach inspired by Marvin Minsky’s theory of the same name. This approach is also called language generation through multi-agent debate.
The society of mind is a collection of agents that work together to solve a problem. Each agent has its own expertise and can communicate with other agents to solve the problem.
Example from Autogen’s documentation:
|
|
In this example, the society of mind agent holds two agents: assistant1(writer) and assistant2(editor). The two assistants are the basic text generation agents.
A round robin group chat is used to switch between the two agents. A termination message is added to the group chat to stop the conversation when the editor approves the text. If there is no termination message defined, the conversation will continue indefinitely. This group chat is what we call the team that used by SocietyOfMindAgent.
The society of mind agent is in fact an agent that process all the discussions generated by the group chat. And it comes with an instruction to wrap the discussion with a response prompt to generate a response based on the discussion.
The default instruction is:
|
|
The default response prompt is:
|
|
Finally the the society of mind agent is added to the group chat as a single agent to generate the final response.
To summary, the society of mind is the representative of a group of agents to generate the final response.
The experiment
My case is a lot simpler than the example above. I just need to translate a text and make it validate by a judge. It could be a multi-turn translate and refine iteration. If the judge approves the translation, the conversation stops and the society of mind agent will generate the final response which is the translation.
|
|
The above code is a simple translation service with a judge. The translation team holds two agents: translator and translator_reviewer. The conversation will stop when the reviewer approves the translation. The society of mind agent will generate the final response which is the translation.
My model is way smaller than the example. At the beginning, I used the Qwen2.5 1.5B model to translate the text. It works well most of the time. However, the translation is not consistent all the time and it will add some extra info that doesn’t exist in the original text. Much worse, the judge failed to spot the problem and approve the wrong translation. The 3B model failed to achieve the consistency and the judge process. At the end, I stopped the experiment at the 7B model. The model can generate the translation with high quality and the judge can spot the problem and approve the translation.
However, it works as expected with minor changes. I changed the response prompt to only output the translation without mentioning the intermediate discussion and the APPROVE message. With the original response prompt, the final response contains the “APPROVE” message.
The final response is:
|
|
A second example:
|
|
Final thoughts
I had a lot of fun with this agent. In this very short exploration, I didn’t make sure the the society of mind at the end can generate a consistent response but only the inner team.
The agent is somehow overkill for this use case. Most of the time, we can expect the model to output very good translation even with the 1.5B model. The judge can be more useful if the we provide much more context to the translation: for example, the domain of the text, the target audience, etc.
There are still a lot of things to explore with the agent. For example, I can add more agents to the society of mind agent to generate the final response. Or I can make the judge to vote instead of approve the translation to make the final translation more reliable.
This kind of generation by debate is very powerful. There are apps with a whole IT company built with agents this way like ChatDev. It is a very interesting approach to generate the final response.
I will try to make a more complex agent in the future with other use cases.