Details

    • Sub-task
    • Resolution: Unresolved
    • None
    • None
    • openai
    • High
    • added database tables to store predefined prompts data

    Description

      Enhanced OpenAI training Data-Model

      Currently LEK_openai_finetunejob is the only data-table, this must be changed to seperate system-msgs from exmples from finetune-jobs. We still capture the training as a file though for reference.
      My idea is to NOT to save examples nor system-messages line-by-line but to keep the examples in a "holistic" model as I think there will be no need to have the single examples as individual entities. Searching can be accomplished with MySQLs JSON functions when needed and performance is not a relevant factor here. All created & lastChange fields are PHP-timestamps

      Added tables:

       

      LEK_openai_sysmessage

      Holds the system-messages, the field "lang" will always be "en" currently. The json may includes one or several sys-messages, which may consist of several sentences each

       

          columns: ( id | lang | json | name | comment | created | lastChange )
          json: [
              { "message": "Just an example system message" }
               ...
          ]
      

       

      LEK_openai_exampleset

      Holds the examples as source and target strings. 1:n connection to LEK_openai_sysmessage. Source and target language generally can be with and/or without country. The "isComplete" flag is calculated after edit and represents, if all source-texts are translated. For a training, only translated lines are used.

       

          columns: ( id | sysMessageId | sourceLang | targetLang | json | comment | created | lastChange | isComplete )
          json: [
              { "source": "This is example 1", "target" : "Das ist Beispiel 1" }
              { "source": "This is example 2", "target" : "Das ist Beispiel 2" }
              ...
          ]
      

              
      LEK_openai_finetunejob
              
      column "conversation" will NOT be replaced with the associated sys-messages, because 'conversation' column will contain sysmessages and examples used at the point of time where training happened so 'conversation' will contain the history, and the sysmessages and examples in LEK_openai_sysmessage and LEK_openai_exampleset tables, respectively - might evolve and improve over time. So, each time training is submitted - the data from prompts which are added to training - is converted to 'conversation' and stored there.

      Attachments

        Activity

          People

            pavelperminov Pavel Perminov
            axelbecher Axel Becher
            Axel Becher
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: