- Type: New Feature
- Resolution: Unresolved
- None
- Affects Version/s: None
- Component/s: translate5 AI
- High
Problem
Translation, terminology RAG, and quality estimation rely on many base prompts that are hard-coded in the system and that PMs/users currently cannot change. In some cases they need to change them to improve results.
Solution
Make it possible to override the base prompts per language resource.
Implementation Details
Overwriting of System Instructions
a) Extract the instructions / completion texts to a central file, application/modules/editor/Plugins/OpenAI/data/system-instructions.json. Not every instruction needs to be extracted, only those worth overriding. Some should not be extracted at all, such as 'fromTo' and 'fullFromTo'. Each extracted instruction must have a short technical description stating in which cases and for what purpose it is used.
Each completion gets an ID/identifier: FineTuning, FineTuningExample, FineTuningResource, QualityScoreEstimation, Translation, TranslationWithXliff
Since the instructions come from several completions (purposes), the first level of the file represents the purpose (completion ID):
{
"FineTuning": { ... },
"FineTuningExample": { ... },
"FineTuningResource": { ... },
"QualityScoreEstimation": { ... },
"Translation": { ... },
"TranslationWithXliff": { ... }
}
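The two-level layout described above could be read roughly as follows. This is a minimal Python sketch, not the plugin's implementation: the instruction key "system", the field names "description" and "instruction", and all texts are assumptions made for illustration.

```python
import json

# Hypothetical excerpt of system-instructions.json. The instruction key
# "system", the field names, and the texts are invented for this sketch.
SYSTEM_INSTRUCTIONS_JSON = """
{
  "Translation": {
    "system": {
      "description": "Base instruction sent with every translation request.",
      "instruction": "Translate the text {t5-source-to-target-iso}."
    }
  },
  "QualityScoreEstimation": {
    "system": {
      "description": "Asks the model to rate a translation.",
      "instruction": "Rate the translation on a scale from 0 to 100."
    }
  }
}
"""


def load_instructions(raw: str) -> dict:
    """Parse the structure: completion ID -> instruction ID -> entry."""
    data = json.loads(raw)
    for completion_id, instructions in data.items():
        for instruction_id, entry in instructions.items():
            # Every extracted instruction needs a description and a text.
            missing = {"description", "instruction"} - entry.keys()
            if missing:
                raise ValueError(
                    f"{completion_id}.{instruction_id} lacks {sorted(missing)}"
                )
    return data


def get_instruction(data: dict, completion_id: str, instruction_id: str) -> str:
    """Return the original instruction text for one completion/instruction."""
    return data[completion_id][instruction_id]["instruction"]


instructions = load_instructions(SYSTEM_INSTRUCTIONS_JSON)
print(get_instruction(instructions, "Translation", "system"))
```

Keeping the description next to each instruction means the grid in c) can be filled from this one file.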
b) Add instruction expansion for the placeholders / base variables in the instructions, in an additional, more readable form. Prompts and overridden instructions should generally support expanding these variables at runtime.
Expandable variables / placeholder aliases usable in prompts and overridden system prompts:
'from' => 't5-source-language', // e.g. "de-DE / german (Germany)"
'fromIso' => 't5-source-language-iso', // e.g. "de-DE"
'fromName' => 't5-source-language-name', // e.g. "german (Germany)"
'to' => 't5-target-language', // e.g. "en-US / english (USA)"
'toIso' => 't5-target-language-iso', // e.g. "en-US"
'toName' => 't5-target-language-name', // e.g. "english (USA)"
'fromTo' => 't5-source-to-target-iso', // e.g. "from “de-DE” into “en-US”"
'fullFromTo' => 't5-source-to-target-full', // e.g. "from “de-DE / german (Germany)” as source language into “en-US / english (USA)” as target language"
'sectionCount' => 't5-section-count', // needed when writing prompts in Markdown style using numbered sections
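Runtime expansion of these variables might look like the sketch below. It is written in Python for illustration only. It assumes placeholders are written in curly braces (as in "{from}"), that 'toIso' and 'toName' are meant to map to the target-language placeholders, and that unknown placeholders are left untouched. The function names are hypothetical.

```python
import re

# Internal alias -> official placeholder name, following the list above.
# Assumption: toIso/toName are meant to point at the target language.
ALIASES = {
    "from": "t5-source-language",
    "fromIso": "t5-source-language-iso",
    "fromName": "t5-source-language-name",
    "to": "t5-target-language",
    "toIso": "t5-target-language-iso",
    "toName": "t5-target-language-name",
    "fromTo": "t5-source-to-target-iso",
    "fullFromTo": "t5-source-to-target-full",
    "sectionCount": "t5-section-count",
}

# Assumed placeholder syntax: a name in curly braces, e.g. {t5-source-language}.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9-]+)\}")


def normalize_aliases(text: str) -> str:
    """Replace internal aliases such as {from} with the official placeholder."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return "{" + ALIASES.get(name, name) + "}"
    return PLACEHOLDER.sub(repl, text)


def expand(text: str, values: dict) -> str:
    """Expand official placeholders; unknown ones are left as they are."""
    def repl(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(repl, normalize_aliases(text))


values = {
    "t5-source-language-iso": "de-DE",
    "t5-target-language-iso": "en-US",
}
print(expand("Translate from {fromIso} into {t5-target-language-iso}.", values))
# prints: Translate from de-DE into en-US.
```

Normalizing aliases before expansion means only the official names need runtime values.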
c) Add a new icon (ask axelbecher) to the language-resource management (for LLM resources) that opens the "System Instruction Overrides" window for that resource. In this window the system prompts are loaded into a grid with the columns:
"Purpose" (originating completion), "Identifier" (key in completion), "description", "instruction" (the instruction is an editable field).
- The "description" column shows the description from a) (no localization necessary, as the prompts are in English anyway)
- The "instruction" is prefilled with the original instruction
- The "instruction" is only saved as an override if it differs from the original instruction
- Any placeholders/expandable variables in the changed instructions are verified, and saving fails if unsupported placeholders were provided
- If internal placeholders like "{from}" were used, they are replaced with the "official" ones without further notice
d) The data model is a new table "LEK_openai_instruction_overrides" with the columns:
"languageResourceId" => ID of the language resource to override,
"completionId" => holding the completion identifier,
"instructionId" => holding the instruction identifier,
"instruction" => text field holding the overridden instruction