Details
- Task
- Resolution: Fixed
- Critical
The previous local pdf converter is now reachable as a service via network.
Description
problem
Currently pdf2htmlex is called locally via a direct docker call. This will no longer be possible in the future.
solution
Extend the pdf2htmlex docker container, install a suitable PHP (no apache), and implement a simple REST webservice with ratchet which:
- receives new "jobs" via POST (a job here means a PDF to be converted)
- The uuid must already be provided in the POST request URL, so that horizontal scaling with a consistent hashing algorithm is possible
- saves each job on disk in a folder with a unique name (the uuid) and returns HTTP 201
- /new/UNIQUE_NAME
- /run/UNIQUE_NAME
- /done/UNIQUE_NAME
- On GET, three if(is_dir) checks are then enough to determine the status and find the data
- A simple rename is sufficient to move a folder from one state to the next
- Since ratchet processes things in a loop, the same loop checks whether there are unprocessed folders and processes their content
- GET returns 202 if the job is not processed yet
- GET returns 200 with the content as zip if the job is processed
- Each GET should only check whether the folder is processed, to keep latency low given the polling
- translate5 sends a DELETE after it has received the content
- automatic garbage collection deletes folders older than a week (so the data stays available for debugging for some days)
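The folder-based job lifecycle described above can be sketched roughly as follows. This is an illustration in Python, not the PHP/ratchet implementation the ticket proposes. The class name JobStore, the file name input.pdf, and the 404, 409 and 204 status codes are assumptions made for the example; only 201, 202, 200 and the new/run/done folders come from the ticket. Zipping the result for the 200 response is left out.

```python
import shutil
import time
from pathlib import Path

STATES = ("new", "run", "done")  # one sub-folder per job state


class JobStore:
    """Folder-based job queue: a job's state is the directory it lives in.

    job_id is assumed to be an already validated UUID (no path separators).
    """

    def __init__(self, root):
        self.root = Path(root)
        for state in STATES:
            (self.root / state).mkdir(parents=True, exist_ok=True)

    def state_of(self, job_id):
        """The three is_dir checks mentioned in the ticket."""
        for state in STATES:
            if (self.root / state / job_id).is_dir():
                return state
        return None

    def submit(self, job_id, pdf_bytes):
        """POST: store the PDF under new/<job_id> and return the HTTP status."""
        if self.state_of(job_id) is not None:
            return 409  # assumption: duplicate ids are rejected
        job_dir = self.root / "new" / job_id
        job_dir.mkdir()
        (job_dir / "input.pdf").write_bytes(pdf_bytes)  # hypothetical file name
        return 201

    def status(self, job_id):
        """GET: 200 when processed, 202 while pending, 404 (assumed) if unknown."""
        state = self.state_of(job_id)
        if state is None:
            return 404
        return 200 if state == "done" else 202

    def process_next(self, convert):
        """One tick of the server loop: move one job new -> run -> done."""
        pending = sorted((self.root / "new").iterdir())
        if not pending:
            return None
        job_id = pending[0].name
        run_dir = self.root / "run" / job_id
        pending[0].rename(run_dir)  # a plain rename changes the state
        convert(run_dir)            # caller-supplied conversion step
        run_dir.rename(self.root / "done" / job_id)
        return job_id

    def delete(self, job_id):
        """DELETE: remove the job folder; 204 is an assumed status code."""
        state = self.state_of(job_id)
        if state is None:
            return 404
        shutil.rmtree(self.root / state / job_id)
        return 204

    def collect_garbage(self, max_age_seconds=7 * 24 * 3600, now=None):
        """Delete job folders whose modification time is older than max_age_seconds."""
        now = time.time() if now is None else now
        removed = []
        for state in STATES:
            for job_dir in list((self.root / state).iterdir()):
                if now - job_dir.stat().st_mtime > max_age_seconds:
                    shutil.rmtree(job_dir)
                    removed.append(job_dir.name)
        return removed
```

Because status() only touches the file system with is_dir checks, a polling GET stays cheap regardless of how long the conversion itself takes.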
In Translate5:
- Change the pdf2htmlex command config to a network URL
- Change the worker currently doing the direct call to do a network call with polling, see the above statuses
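On the translate5 side, the network call with polling might look roughly like the following Python sketch (the real worker is PHP). It also shows one way the uuid could select a server node through consistent hashing. HashRing, convert_remote, http_request, the URL layout base_url/job_id and the polling defaults are all assumptions made for illustration and are not part of the ticket.

```python
import bisect
import hashlib
import time
import urllib.error
import urllib.request


def _hash(key):
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: each node owns several points on a ring."""

    def __init__(self, nodes, replicas=64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, job_id):
        # First ring point after the job's hash, wrapping around at the end.
        index = bisect.bisect(self._keys, _hash(job_id)) % len(self._points)
        return self._points[index][1]


def http_request(method, url, body):
    """Default transport: returns (status, body) and does not raise on 4xx/5xx."""
    req = urllib.request.Request(url, data=body, method=method)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def convert_remote(job_id, pdf_bytes, base_url, request=http_request,
                   interval=1.0, max_polls=600, sleep=time.sleep):
    """POST the job, poll with GET until 200, then DELETE and return the zip."""
    url = f"{base_url}/{job_id}"  # assumed URL layout: the uuid is part of the path
    status, _ = request("POST", url, pdf_bytes)
    if status != 201:
        raise RuntimeError(f"job {job_id} was not accepted: HTTP {status}")
    for _ in range(max_polls):
        status, body = request("GET", url, None)
        if status == 200:
            request("DELETE", url, None)  # clean up after receiving the content
            return body
        if status != 202:
            raise RuntimeError(f"job {job_id} failed: HTTP {status}")
        sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

With several converter nodes, the caller would pick base_url via HashRing(nodes).node_for(job_id), so that every request for the same uuid reaches the same node.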
PDF unite and pdf optimize
The pdf unite and optimize process should also be offloaded to a separate docker.
The same PHP server implementation could be used, just calling the PDF optimizers instead of pdf2htmlex
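One way to reuse the same server for different tools is to make the processing step a pluggable command. The Python sketch below is an illustration only: make_processor and the log file name tool.log are hypothetical, and the ticket does not name the actual optimizer commands or their flags, so none are shown.

```python
import subprocess
import sys
from pathlib import Path


def make_processor(command):
    """Return a job handler that runs `command` inside the job folder.

    `command` is an argument list for whichever tool the server wraps
    (pdf2htmlex, a PDF merger, a PDF optimizer); the concrete command
    lines are not specified in the ticket.
    """

    def process(job_dir):
        result = subprocess.run(command, cwd=job_dir, capture_output=True, text=True)
        # Keep the tool output next to the job data for later debugging.
        (Path(job_dir) / "tool.log").write_text(result.stdout + result.stderr)
        return result.returncode == 0

    return process


if __name__ == "__main__":
    # Stand-in command so the sketch runs without any PDF tool installed.
    demo = make_processor([sys.executable, "-c", "print('converted')"])
    print(demo("."))
```

The server loop then only needs to know which processor it was started with, while the job handling stays identical for all tools.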
two step implementation
Basically, PDF merging and optimization should be separated from pdf2htmlex.
As a first step, however, both can be done in the same container; then there is no need to wrap the merge process in translate5 with a network polling worker.
The need for separation comes only from being able to update the PDF tools, which is probably not possible in the "old" pdf2htmlex container.
Issue Links
- blocks: TRANSLATE-2185 Prepare translate5 for usage with docker (Done)