Details
-
Sub-task
-
Resolution: Unresolved
-
None
-
None
-
High
-
added database tables to store predefined prompts data
-
Empty show more show less
Description
Enhanced OpenAI training Data-Model
Currently LEK_openai_finetunejob is the only data-table, this must be changed to seperate system-msgs from exmples from finetune-jobs. We still capture the training as a file though for reference.
My idea is to NOT to save examples nor system-messages line-by-line but to keep the examples in a "holistic" model as I think there will be no need to have the single examples as individual entities. Searching can be accomplished with MySQLs JSON functions when needed and performance is not a relevant factor here. All created & lastChange fields are PHP-timestamps
Added tables:
LEK_openai_sysmessage
Holds the system-messages, the field "lang" will always be "en" currently. The json may includes one or several sys-messages, which may consist of several sentences each
columns: ( id | lang | json | name | comment | created | lastChange ) json: [ { "message": "Just an example system message" } ... ]
LEK_openai_exampleset
Holds the examples as source and target strings. 1:n connection to LEK_openai_sysmessage. Source and target language generally can be with and/or without country. The "isComplete" flag is calculated after edit and represents, if all source-texts are translated. For a training, only translated lines are used.
columns: ( id | sysMessageId | sourceLang | targetLang | json | comment | created | lastChange | isComplete ) json: [ { "source": "This is example 1", "target" : "Das ist Beispiel 1" } { "source": "This is example 2", "target" : "Das ist Beispiel 2" } ... ]
LEK_openai_finetunejob
column "conversation" will NOT be replaced with the associated sys-messages, because 'conversation' column will contain sysmessages and examples used at the point of time where training happened so 'conversation' will contain the history, and the sysmessages and examples in LEK_openai_sysmessage and LEK_openai_exampleset tables, respectively - might evolve and improve over time. So, each time training is submitted - the data from prompts which are added to training - is converted to 'conversation' and stored there.