XML

Word

Printable

Type: Bug
Resolution: Unresolved
Fix Version/s: None
Affects Version/s: None
Component/s: Auto-QA

Urgency:
Critical
ChangeLog Description:
Improved performance of duplicate segment detection by optimizing queries on large segment datasets. Future updates may further refine indexing and logic to reduce load.
Checklist:

Empty

show more show less

Problem Description:

The query that checks for duplicate/variant target segments in \editor_Segment_Consistent_Check::getInconsistentSegmentNrsInTask on segment views may run slowly.

SELECT  
  GROUP_CONCAT(`segmentNrInTask`) AS `nrs`,
  MAX(DISTINCT `editable`) AS `atLeastOneEditable`,
  MAX(DISTINCT `autoStateId` = 3) AS `atLeastOneLocked`
  FROM `LEK_segment_view_1313b8514edd64be348cd80384eb2dfb`  
  WHERE `targetEditToSort` != ''  
    AND `sourceToSort` != ''
    AND `autoStateId` != '16'
  GROUP BY BINARY `sourceToSort`
    HAVING COUNT(DISTINCT BINARY `targetEditToSort`) > 1
      AND (`atLeastOneLocked` = 0 OR `atLeastOneEditable` = 1);

Execution plan shows:

Full table scan over ~5.3k rows (type = ALL).

No indexes are being used (possible_keys = NULL).

Filtering conditions include sourceToSort != '', targetEditToSort != '', and autoStateId != 16.

Grouping on BINARY targetEditToSort and HAVING COUNT(DISTINCT BINARY sourceToSort) > 1 requires temporary tables and sorting (Using filesort).

Impact:

Query performance degrades significantly on large datasets (millions of segments).

Duplicate detection becomes a bottleneck for workflows that rely on this validation.

Notes:

Because both sourceToSort and targetEditToSort are LONGTEXT, indexing them directly is not feasible.

Partial/functional indexes (e.g., on targetEditToSort(255) or on hash columns like targetMd5, sourceMd5) could help.

Query may require refactoring to leverage existing Md5 columns instead of operating directly on LONGTEXT fields.

Alternative: precompute normalized sort fields in shorter indexed columns.

Next Steps:

Evaluate feasibility of replacing GROUP BY BINARY targetEditToSort with grouping by targetMd5.

Add indexes on autoStateId, editable, and possibly (autoStateId, targetMd5) to filter and group more efficiently.

Test performance improvements with real-world data volumes.

If indexing alone is insufficient, redesign duplicate detection logic to avoid full scans and filesort.

Assignee:: Pavel Perminov
Reporter:: Thomas Lauria
Peer developer:: Axel Becher, Thomas Lauria
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: 29/Sep/2025 12:54
Updated:: 09/Oct/2025 05:28

Details

Description

Problem Description:

Attachments

Activity

People

Dates