pdfconverter / PDFCON-6

Make pdf converter reachable via network


Details

    • Task
    • Resolution: Fixed
    • pdf2htmlex

    Description

      problem

      Currently, pdf2htmlex is called locally via a direct docker call. This will no longer be possible in the future.

      solution

      Extend the pdf2htmlex docker container, install a suitable PHP (no Apache), and use Ratchet to implement a simple REST web service that:

      • receives new "jobs" via POST (here, a job means one PDF to be converted)
        • The UUID must already be part of the POST request URL, so that horizontal scaling with a consistent hashing algorithm is possible
      • saves each job on disk in a folder with a unique name (the UUID) and returns HTTP 201
        • /new/UNIQUE_NAME
        • /run/UNIQUE_NAME
        • /done/UNIQUE_NAME
        • So on GET, three is_dir() calls are enough to determine the status and find the data
        • Moving a job from one state to the next is a simple rename of its folder
      • Since Ratchet processes things in a loop, the same loop checks for unprocessed folders and processes their content
      • GET returns 202 if the job has not been processed yet
      • GET returns 200 with the content as a ZIP file once it has been processed
        • Each GET should only check whether the folder has been processed, to keep latency low since translate5 polls
      • translate5 sends a DELETE after it has received the content
      • automatic garbage collection deletes folders older than one week (the data is kept for some days for debugging)
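
The consistent hashing that the UUID-in-URL requirement enables could look roughly like this. It is a minimal Python illustration, not part of the PHP service: the node names, the number of virtual nodes per node (`replicas`) and the use of MD5 as the ring hash are assumptions. Because the UUID is known before the POST is sent, translate5 (or a proxy in front of several converter containers) could route the POST, every polling GET and the final DELETE for one job to the same node.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping job UUIDs to converter nodes."""

    def __init__(self, nodes, replicas=100):
        # each node is placed on the ring `replicas` times for an even spread
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # MD5 is used only for uniform distribution, not for security
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, job_uuid):
        # first ring position clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self._keys, self._hash(job_uuid)) % len(self._keys)
        return self._ring[idx][1]
```

Removing a node only remaps the jobs that were assigned to it; jobs on the remaining nodes keep their node. That property is why a hash ring fits this setup better than a plain `hash(uuid) % node_count`, which would reshuffle almost every running job whenever a container is added or removed.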
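
The folder-per-state lifecycle above can be sketched as a small job store. This is a Python sketch of the logic only, assuming a local base directory; the real service would be PHP running under Ratchet, with `process_next` called from the server's loop and `convert` running pdf2htmlex. The file names `input.pdf` and `result.zip` and all class and method names are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path

STATES = ("new", "run", "done")  # lifecycle order


class JobStore:
    """Folder-per-job store: a job's state is the directory it lives in."""

    def __init__(self, base):
        self.base = Path(base)
        for state in STATES:
            (self.base / state).mkdir(parents=True, exist_ok=True)

    def create(self, job_id, pdf_bytes):
        # POST: store the upload under new/<uuid>; the handler then answers 201.
        # mkdir() raises FileExistsError if this UUID is already pending.
        job_dir = self.base / "new" / job_id
        job_dir.mkdir()
        (job_dir / "input.pdf").write_bytes(pdf_bytes)

    def status(self, job_id):
        # GET: only directory checks. Checking in lifecycle order means a job
        # that moves forward between two checks is still found in a later state.
        for state in STATES:
            if (self.base / state / job_id).is_dir():
                return 200 if state == "done" else 202
        return 404

    def process_next(self, convert):
        # Called from the server loop: claim one pending job via rename new/ -> run/
        pending = sorted(p for p in (self.base / "new").iterdir() if p.is_dir())
        if not pending:
            return None
        job_id = pending[0].name
        run_dir = pending[0].rename(self.base / "run" / job_id)
        out_dir = run_dir / "out"
        out_dir.mkdir()
        convert(run_dir / "input.pdf", out_dir)
        # Zip the output so GET can return it with 200, then mark the job done
        shutil.make_archive(str(run_dir / "result"), "zip", root_dir=out_dir)
        run_dir.rename(self.base / "done" / job_id)
        return job_id

    def result_path(self, job_id):
        return self.base / "done" / job_id / "result.zip"

    def delete(self, job_id):
        # DELETE from translate5 once it has received the ZIP
        for state in STATES:
            shutil.rmtree(self.base / state / job_id, ignore_errors=True)

    def collect_garbage(self, max_age=7 * 24 * 3600, now=None):
        # Remove job folders whose mtime is older than max_age (default: one week)
        now = time.time() if now is None else now
        removed = []
        for state in STATES:
            for job_dir in list((self.base / state).iterdir()):
                if job_dir.is_dir() and now - job_dir.stat().st_mtime > max_age:
                    shutil.rmtree(job_dir)
                    removed.append(job_dir.name)
        return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = JobStore(tmp)
        store.create("job-1", b"%PDF-1.4")
        print(store.status("job-1"))  # 202
        store.process_next(lambda pdf, out: (out / "index.html").write_text("<html/>"))
        print(store.status("job-1"))  # 200
```

Because every state change is a single rename, a job is never visible in two states at once, and the GET handler never has to read file contents to answer 202.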

      In translate5:

      • Change the pdf2htmlex command config to a network URL
      • Change the worker that currently makes the direct call so that it makes a network call and polls, using the statuses described above
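
The polling worker on the translate5 side could follow this sequence. It is a Python sketch under assumptions: `send` stands in for whatever HTTP client translate5 actually uses, the URL layout `<base_url>/<uuid>` is hypothetical, and the interval and timeout defaults are placeholders.

```python
import time


def fetch_result(send, base_url, job_id, pdf_bytes, interval=1.0, timeout=600.0):
    """POST the job, poll until HTTP 200, DELETE it, and return the ZIP bytes.

    `send(method, url, body)` is a hypothetical transport that returns
    a (status_code, body_bytes) tuple.
    """
    url = f"{base_url}/{job_id}"  # the UUID is part of the URL from the start
    status, _ = send("POST", url, pdf_bytes)
    if status != 201:
        raise RuntimeError(f"job {job_id} rejected with HTTP {status}")

    deadline = time.monotonic() + timeout
    while True:
        status, body = send("GET", url, None)
        if status == 200:
            send("DELETE", url, None)  # the server may now drop the job folder
            return body
        if status != 202:
            raise RuntimeError(f"job {job_id}: unexpected HTTP {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(interval)
```

Any status other than 201, 202 or 200 is treated as an error rather than retried, so a lost job folder (for example after garbage collection) surfaces in translate5 instead of being polled until the timeout.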

      PDF unite and PDF optimize

      The PDF unite and optimize processes should also be offloaded to a separate docker container.

      The same PHP server implementation could be used, only calling the PDF optimizers instead of pdf2htmlex.
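
Reusing the same server for PDF unite and optimize mostly means choosing a different command per job type. The sketch below uses hypothetical job-type names (`pdf2htmlex`, `unite`, `optimize`). `pdf2htmlEX --dest-dir` and `pdfunite` (sources first, destination last) are the tools' command-line forms; the Ghostscript call is only a stand-in, since the issue does not name the optimizer.

```python
import subprocess
from pathlib import Path


def build_command(job_type, inputs, out_dir):
    """Map a job type to the external tool call the worker would run."""
    out_dir = Path(out_dir)
    if job_type == "pdf2htmlex":
        # pdf2htmlEX writes its HTML output into --dest-dir
        return ["pdf2htmlEX", "--dest-dir", str(out_dir), str(inputs[0])]
    if job_type == "unite":
        # pdfunite (poppler): all source PDFs first, destination file last
        return ["pdfunite", *map(str, inputs), str(out_dir / "united.pdf")]
    if job_type == "optimize":
        # Example optimizer only (Ghostscript); the actual tool is not specified
        return ["gs", "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/ebook",
                "-o", str(out_dir / "optimized.pdf"), str(inputs[0])]
    raise ValueError(f"unknown job type: {job_type}")


def run_job(job_type, inputs, out_dir):
    # check=True raises CalledProcessError if the tool exits non-zero
    subprocess.run(build_command(job_type, inputs, out_dir), check=True)
```

Keeping the command construction separate from execution lets the folder handling, polling and garbage collection stay identical for all job types; only the dispatch table grows when the PDF tools are later moved to their own container.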

      two-step implementation

      Basically, PDF merging and optimization should be separated from pdf2htmlex.

      In a first step, however, this can be done IN the same container; then there is no need to wrap the merge process in translate5 with a network-polling worker.

      The need for separation comes only from wanting to be able to update the PDF tools, which is probably not possible in the "old" pdf2htmlex container.


            People

              leonkiz Leon Kiz
              tlauria Thomas Lauria