  1. translate5
  2. TRANSLATE-3436

Integrate GPT-4 with translate5 as translation engine

Details

    • Critical
    • To update to this version PHP 8.1.23 is required.
    • New private plug-in "OpenAI" to use OpenAI models as language resources, plus base functionality to fine-tune these models

    Description

      The goal is to get translations from GPT-4 in the same way we get them from other MT services.

      Unlike with other MT resources, we should find out how additional information can be passed to GPT-4, what the best way of doing so is, and whether this leads to better translations.

      The following sources should first be evaluated and then, if possible (which it should be), integrated to provide GPT-4 with additional information. Whether each source is used should be configurable.

      • Send as many segments as possible in one request and tell GPT-4 that the structure of the segments needs to be respected, but that the segments together form a context. This should save costs, because the instructions and shared context are then sent once per request rather than once per segment (the OpenAI API is billed by tokens, not by request). It will probably also lead to a better translation.
      • For each segment where we have a 100% match or better, provide it to GPT and tell it that we already have a translation for this segment, which it should use as inspiration for the other segments.
      • For each segment, send GPT the best X fuzzy matches in a structured way. We need to experiment to see what makes sense here. My guess is that it makes sense to send all fuzzy matches down to a certain match rate, e.g. 70%, and, if fewer than 3 matches exist at or above that rate, to send the best 3 fuzzy matches regardless.
      • For each segment, mark the terminology found in the source and tell GPT how it should be translated (please see the TextShuttle plug-in code, which also does this).
      • With lower priority than the previous items: find out whether we can already send images as context information to GPT-4. If yes, implement sending images as context.
      • With lower priority than the previous items: find out whether we can provide GPT-4 with the complete available TMs before the translation starts, in order to train it, or at least with a certain number of matches.
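
      The fuzzy-match rule proposed above (everything at or above a threshold, topped up to the best 3 when fewer qualify) could be sketched as follows. This is only an illustration in Python, not translate5 code: the names FuzzyMatch and select_fuzzy_matches and the default values are assumptions taken from the 70% / 3 matches guess in the description.

```python
from dataclasses import dataclass


@dataclass
class FuzzyMatch:
    """Hypothetical container for one TM match of a segment."""
    source: str
    target: str
    match_rate: int  # match rate in percent


def select_fuzzy_matches(matches, threshold=70, min_count=3):
    """Return all matches at or above `threshold`, best first.

    If fewer than `min_count` matches reach the threshold, return the best
    `min_count` matches instead (or all of them, if there are fewer).
    """
    ranked = sorted(matches, key=lambda m: m.match_rate, reverse=True)
    selected = [m for m in ranked if m.match_rate >= threshold]
    if len(selected) < min_count:
        # Top up with the best remaining matches below the threshold.
        selected = ranked[:min_count]
    return selected
```

      Both the threshold and the minimum count are parameters here because the description asks for this behaviour to be evaluated and made configurable.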

      More thoughts:

      • send combined segments in a discernible structure to reduce costs
      • send terminology markup and define the terms
      • request alternatives for words
      • Add frontend GUI to request segment phrasing changes e.g. regarding gender
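
      One possible "discernible structure" for combined segments is a JSON payload keyed by segment id, with the reply checked against the ids that were sent. The sketch below is a Python illustration under that assumption. The field names, the function names build_batch_payload and parse_batch_response, and the expected reply format are all hypothetical, and no OpenAI API call is made.

```python
import json


def build_batch_payload(segments, terminology=None):
    """Serialize segments (id -> source text) and optional terminology
    (source term -> required target term) into one JSON string that can be
    sent as a single message."""
    payload = {
        "segments": [{"id": sid, "source": text} for sid, text in segments.items()],
        "terminology": [
            {"source": src, "target": tgt} for src, tgt in (terminology or {}).items()
        ],
    }
    return json.dumps(payload, ensure_ascii=False)


def parse_batch_response(raw, expected_ids):
    """Parse a reply of the assumed form
    {"translations": [{"id": ..., "target": ...}, ...]}
    and raise ValueError if the segment structure was not respected."""
    expected = list(expected_ids)
    try:
        items = json.loads(raw)["translations"]
        translations = {item["id"]: item["target"] for item in items}
    except (KeyError, TypeError) as exc:
        raise ValueError("reply does not have the expected structure") from exc
    if len(items) != len(expected) or set(translations) != set(expected):
        raise ValueError("reply does not contain exactly the requested segments")
    return translations
```

      Checking the returned ids makes it possible to detect merged, dropped, or invented segments before anything is written back to the task.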

       


    People

      axelbecher Axel Becher
      marcmittag Marc Mittag [Administrator]
      Aleksandar Mitrev
      Votes: 0
      Watchers: 2

    Dates

      Created:
      Updated:
      Resolved:
