[TRANSLATE-3436] Integrate GPT-4 with translate5 as translation engine - translate5 JIRA issue tracker

Details

Type: New Feature
Resolution: Fixed
Fix Version/s: translate5 - 7.0.0
Affects Version/s: None
Component/s: LanguageResources

Urgency:
Critical
Important release notes:
Updating to this version requires PHP 8.1.23.
ChangeLog Description:
New private plugin "OpenAI" that uses OpenAI models as language resources and provides base functionality for fine-tuning these models
Checklist:

Empty

Description

The goal is to get translations from GPT-4 in the same way we get them from other MT services.

Unlike with other MT resources, however, we should find out how to pass additional information to GPT-4, what the best way of doing so is, and whether it leads to better translations.

The following sources should first be evaluated and then, if possible (which it should be), integrated to provide GPT-4 with additional information. Whether each of these sources is used should be configurable.

Send as many segments as possible in one request and tell GPT-4 that the structure of the segments must be respected, but that the segments together form a context. This should save costs, because the instructions are then sent once per request instead of once per segment (OpenAI bills by token, not by request or character). It will probably also lead to a better translation.
For each segment that has a 100% match or better, provide the match to GPT and tell it that a translation already exists for this segment and that it should use it as inspiration for the other segments.
For each segment, send GPT the best X fuzzy matches in a structured way. We need to experiment to find out what makes sense here. My guess is that it makes sense to send all fuzzy matches down to a certain match rate, e.g. 70%, and, if fewer than 3 matches exist at or above that rate, to send the best 3 fuzzy matches anyway.
For each segment, mark the terminology found in the source and tell GPT how it should be translated (see the TextShuttle plug-in code, which also does this).
With lower priority than the previous items: Find out whether images can already be sent to GPT-4 as context information. If so, implement sending images as context.
With lower priority than the previous items: Find out whether we can provide GPT-4 with the complete available TMs, or at least a certain number of matches, before the translation starts in order to train it.
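
The fuzzy-match rule guessed at above (everything down to a threshold, but never fewer than the best 3) could be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: `FuzzyMatch`, `select_fuzzy_matches`, and the default values 70 and 3 are assumptions taken from the example numbers in this issue and would need to be configurable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyMatch:
    """Hypothetical container for one TM match of a segment."""
    source: str
    target: str
    match_rate: int  # TM match rate in percent


def select_fuzzy_matches(matches, threshold=70, minimum=3):
    """Pick the fuzzy matches to send along with a segment.

    Returns every match at or above `threshold`. If fewer than
    `minimum` matches reach the threshold, returns the best
    `minimum` matches instead (or all of them, if there are fewer).
    """
    # Best matches first, so slicing yields the top candidates.
    ranked = sorted(matches, key=lambda m: m.match_rate, reverse=True)
    above = [m for m in ranked if m.match_rate >= threshold]
    if len(above) >= minimum:
        return above
    return ranked[:minimum]
```

With matches rated 95, 72, 60 and 50, only two reach 70%, so the best three (95, 72, 60) would be sent. With matches rated 95, 90, 80, 75 and 60, all four matches at or above 70% would be sent.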

More thoughts:

Send combined segments in a discernible structure to reduce costs
Send terminology markup and define the terms
Request alternatives for words
Add a frontend GUI to request segment phrasing changes, e.g. regarding gender
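
One possible shape for the "discernible structure" and the terminology markup is a JSON array of segments with stable ids, with terms wrapped in a tag that carries the required translation. The Python sketch below only builds and validates such a payload; it does not call any API. All names (`mark_terms`, `build_batch_request`, `parse_batch_response`), the `<term target="…">` tag and the instruction wording are hypothetical, not taken from the plugin.

```python
import html
import json
import re


def mark_terms(source, glossary):
    """Wrap each glossary term found in `source` in a <term> tag.

    `glossary` maps source terms to their required translations.
    Matching is case-sensitive and on word boundaries (an assumption).
    """
    if not glossary:
        return source
    # Longest terms first, so multi-word terms win over their parts.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def wrap(match):
        term = match.group(1)
        target = html.escape(glossary[term], quote=True)
        return f'<term target="{target}">{term}</term>'

    return pattern.sub(wrap, source)


def build_batch_request(segments, glossary, source_lang, target_lang):
    """Return (instructions, payload_json) for a list of (id, text) segments."""
    payload = [
        {"id": seg_id, "source": mark_terms(text, glossary)}
        for seg_id, text in segments
    ]
    instructions = (
        f"Translate the segments from {source_lang} to {target_lang}. "
        "The segments form one context, but each must be translated separately. "
        'Answer with a JSON array of {"id", "target"} objects, one per segment. '
        "Translate text inside <term> tags with the value of its target attribute."
    )
    return instructions, json.dumps(payload, ensure_ascii=False, indent=2)


def parse_batch_response(raw, expected_ids):
    """Check that the answer covers exactly the requested ids; return targets in order."""
    translations = {item["id"]: item["target"] for item in json.loads(raw)}
    if set(translations) != set(expected_ids):
        raise ValueError("response does not cover exactly the requested segments")
    return [translations[seg_id] for seg_id in expected_ids]
```

Keeping an id on every segment lets the response be mapped back to the right segments even if the model reorders or drops entries, and the id check rejects answers that merged or split segments.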

Attachments

Issue Links

is blocked by

TRANSLATE-2782 PHP 8.1 Compability

Done

Sub-Tasks

1.

Train GPT with termCollections and TMs

Done

Axel Becher
