[TRANSLATE-4400] OpenAI GPT Custom Instructions: Data Model - translate5 JIRA issue tracker

Details

Type: Sub-task
Resolution: Unresolved
Fix Version/s: None
Affects Version/s: None
Component/s: openai

Urgency: High
ChangeLog Description: OpenAI: Data-Model for the Custom Instruction Management

Description

OpenAI GPT Internal Prompts: Data Model

The basis for the Internal Prompt Management is the set of internal instructions currently in use, which will be collected in a separate JSON file so that descriptions can be added. The system messages used, e.g. for translation, are built from smaller instructions, each symbolized by a {key}, which are assembled into larger instructions/sentences; these in turn have keys by which they can be identified in the completion.

Therefore, instructions.json will consist of a list of key-value pairs, each with a description. One special feature of the keys is that they can contain quote characters, which are then added to the sentence/formulation as triple quotes.

The editing frontend must include the possibility to test the modified instructions against a set of given segments with terminology and context data. When a new instruction-set is added, it always starts as a copy of the original instructions. When a customized instruction-set is loaded for editing, any instructions that are not found in "instructions.json" are discarded, and any new instructions not yet present in the customized instruction-set are added automatically with their default value. This ensures that customized instructions remain easy to upgrade when new instructions are added to the codebase; the only requirement is that "instructions.json" is always kept in sync with the codebase.
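The copy-on-create and merge-on-load rules described above can be sketched as follows. This is a minimal illustration in Python, not the translate5 implementation; the function names `default_instructions`, `new_instruction_set` and `load_customized` are hypothetical, and the entries are assumed to have the key/instruction/description shape used for "instructions.json".

```python
def default_instructions(entries):
    """Map each key from the instructions.json list to its default instruction text."""
    return {entry["key"]: entry["instruction"] for entry in entries}


def new_instruction_set(entries):
    """A new instruction-set always starts as a copy of the original instructions."""
    return dict(default_instructions(entries))


def load_customized(entries, customized):
    """Prepare a stored customized set for editing.

    Keys unknown to instructions.json are dropped; keys missing from the
    customized set are filled with their default value.
    """
    defaults = default_instructions(entries)
    return {key: customized.get(key, default) for key, default in defaults.items()}


# Example data (descriptions shortened for the sketch).
entries = [
    {"key": "fromTo", "instruction": "from {from} to {to}", "description": ""},
    {
        "key": "fullFromTo",
        "instruction": "from {from} as source language to {to} as target language",
        "description": "",
    },
]
stored = {"fromTo": "from {from} to {to} but customized", "obsolete": "no longer used"}
merged = load_customized(entries, stored)
```

In this sketch the defaults drive the iteration, so the result always has exactly the keys of "instructions.json", which is what keeps customized sets upgradable.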

Example for instructions from the code:

    'fromTo' => 'from {from} to {to}',
    'fullFromTo' => 'from {from} as source language to {to} as target language',
    'termpair' => 'use the specific translation delimited by {*delimiter*} {*termpair*}',

The data in "instructions.json":

[
   {
      "key":"fullFromTo",
      "instruction":"from {from} as source language to {to} as target language",
      "description":"This is the precise/full instruction that tells the GPT model which languages to expect for source and translate into"
   },
   {
      "key":"termpair",
      "instruction":"use the specific translation delimited by {*delimiter*} {*termpair*}",
      "description":"This is the instruction that tells the GPT model about a single termpair to be used when translating a segment/text. {*delimiter*} will be resolved to \"triple asterisk\",  {*termpair*} will be resolved to \"***sourceterm*** = ***target term***\""
   }
]
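One possible way to resolve the placeholders, following the "termpair" description above, is sketched below in Python. This is an assumption-laden illustration and not the actual resolver: the names `resolve`, `triple_quote`, `PLACEHOLDER` and `DELIMITER_NAMES` are hypothetical, and the rule that a term pair is passed as a (source, target) tuple is invented for the example.

```python
import re

# Assumed spoken names for quote characters; only the asterisk appears in the ticket.
DELIMITER_NAMES = {"*": "triple asterisk"}

# Matches {name} as well as {*name*}, where the same quote character wraps the name.
PLACEHOLDER = re.compile(r"\{(\W?)(\w+)\1\}")


def triple_quote(text, quote):
    """Wrap text in three copies of the quote character on each side."""
    return f"{quote * 3}{text}{quote * 3}"


def resolve(instruction, values):
    """Replace placeholders in an instruction with the given values."""

    def substitute(match):
        quote, name = match.group(1), match.group(2)
        if not quote:
            # Plain placeholder such as {from}: insert the value unchanged.
            return str(values[name])
        if name == "delimiter":
            # {*delimiter*} names the delimiter instead of inserting a value.
            return DELIMITER_NAMES[quote]
        value = values[name]
        if isinstance(value, tuple):
            # Assumed convention: a term pair is a (source, target) tuple.
            source, target = value
            return f"{triple_quote(source, quote)} = {triple_quote(target, quote)}"
        return triple_quote(str(value), quote)

    return PLACEHOLDER.sub(substitute, instruction)


resolved = resolve(
    "use the specific translation delimited by {*delimiter*} {*termpair*}",
    {"termpair": ("sourceterm", "target term")},
)
```

With the values shown, `resolved` contains the delimiter name and the triple-asterisk term pair exactly as given in the "termpair" description.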

The JSON for the customized instruction-set:

{
    "fromTo": "from {from} to {to} but customized",
    "fullFromTo": "from {from} as source language to {to} as target language but customized",
    "termpair": "use the given termpair delimited by {*delimiter*} {*termpair*}"
}

New DB Datamodel

LEK_openai_instructionset

    columns: ( id | name | comment | json | created | lastChange )
    json: {
        "fromTo": "from {from} to {to} but customized",
        ...
    }

LEK_openai_instructionset_assoc

Holds the 1:1 association between the instruction-set and a language-resource.

    columns: ( id | instructionSetId | languageResourceId )
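
The two tables can be prototyped with an in-memory SQLite database, as in the Python sketch below. The column types, defaults, the foreign key and the UNIQUE constraints are assumptions (the ticket only lists the column names); the UNIQUE constraints on both association columns reflect a strict reading of "1:1". The helper names `create_instruction_set` and `load_for_language_resource` are hypothetical.

```python
import json
import sqlite3

# Assumed schema: only the table and column names come from the ticket.
SCHEMA = """
CREATE TABLE LEK_openai_instructionset (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    comment TEXT,
    json TEXT NOT NULL,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    lastChange TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE LEK_openai_instructionset_assoc (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instructionSetId INTEGER NOT NULL UNIQUE
        REFERENCES LEK_openai_instructionset(id) ON DELETE CASCADE,
    languageResourceId INTEGER NOT NULL UNIQUE
);
"""


def create_instruction_set(conn, name, instructions, language_resource_id, comment=""):
    """Store a customized instruction-set and associate it with a language-resource."""
    cursor = conn.execute(
        "INSERT INTO LEK_openai_instructionset (name, comment, json) VALUES (?, ?, ?)",
        (name, comment, json.dumps(instructions)),
    )
    set_id = cursor.lastrowid
    conn.execute(
        "INSERT INTO LEK_openai_instructionset_assoc (instructionSetId, languageResourceId) "
        "VALUES (?, ?)",
        (set_id, language_resource_id),
    )
    return set_id


def load_for_language_resource(conn, language_resource_id):
    """Return the instruction dict for a language-resource, or None if none is assigned."""
    row = conn.execute(
        "SELECT s.json FROM LEK_openai_instructionset AS s "
        "JOIN LEK_openai_instructionset_assoc AS a ON a.instructionSetId = s.id "
        "WHERE a.languageResourceId = ?",
        (language_resource_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
set_id = create_instruction_set(
    conn, "Customized", {"fromTo": "from {from} to {to} but customized"}, 7
)
```

Storing the whole set as one JSON document per row, as the ticket proposes, means newly added instruction keys require no schema change.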

People

Assignee: Pavel Perminov

Reporter: Axel Becher

Peer developer: Axel Becher

Votes: 0

Watchers: 1

Dates

Created: 22/Jan/2025 10:03

Updated: 06/Feb/2025 07:07