Preconditions and environment
- dev/grid/async_indexing set to 1 (enabled)
Steps to reproduce
- Enable async grid indexing:
bin/magento config:set dev/grid/async_indexing 1
- Ensure cron is running (* * * * * schedule for sales_grid_order_async_insert)
- Create several test orders so the cron has work to do each cycle
- While the cron is mid-execution (processing other orders), manually complete an order in the admin panel (e.g., create an invoice and shipment, or change the status via the comment form). To hit this window reliably, you will need to artificially slow down the cron's query.
- Wait for multiple cron cycles to pass (5+ minutes)
- Check the Sales > Orders grid in admin
Expected result
The order grid should reflect the updated order status (e.g., "Complete") within one or two cron cycles (1-2 minutes).
Actual result
The order grid permanently shows the old status (e.g., "Processing") for the affected order. The stale data persists indefinitely and never self-corrects.
Additional information
Root cause
UpdatedAtListProvider (vendor/magento/module-sales/Model/ResourceModel/Provider/UpdatedAtListProvider.php) uses a LastUpdateTimeCache watermark to optimize its query. The intended query is:
SELECT main.entity_id FROM sales_order AS main
INNER JOIN sales_order_grid AS grid
ON main.entity_id = grid.entity_id
AND main.updated_at > grid.updated_at
But the watermark adds an additional filter:
WHERE main.updated_at > :cached_watermark_timestamp
The race condition:
- Cron starts and queries for unsynced order IDs
- During query execution, an admin completes Order X; sales_order.updated_at is set to T1
- The cron's query (already in flight) does not see Order X
- Cron finishes processing other orders whose updated_at values are > T1 and advances the watermark past T1
- Every subsequent cron run filters with WHERE main.updated_at > :watermark; Order X (at T1) fails this check and is skipped
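The end state of this race can be reproduced in isolation. The following is a minimal sketch in Python with an in-memory SQLite database, not Magento code: the two tables are reduced to three columns, the timestamps are made up, and the helper name unsynced_ids is hypothetical. It only illustrates how the extra watermark filter hides a row that the join alone would find.

```python
import sqlite3

# Simplified stand-ins for sales_order and sales_order_grid (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT);
    CREATE TABLE sales_order_grid (entity_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT);
""")

def unsynced_ids(conn, watermark=None):
    """Return IDs whose main row is newer than the grid row.

    If a watermark is given, also require main.updated_at > watermark,
    mirroring the additional filter described above.
    """
    sql = """
        SELECT main.entity_id
        FROM sales_order AS main
        INNER JOIN sales_order_grid AS grid
            ON main.entity_id = grid.entity_id
           AND main.updated_at > grid.updated_at
    """
    params = ()
    if watermark is not None:
        sql += " WHERE main.updated_at > ?"
        params = (watermark,)
    sql += " ORDER BY main.entity_id"
    return [row[0] for row in conn.execute(sql, params)]

# Order 1 plays "Order X": updated at T1 while the cron query was in flight,
# so its grid row is still stale.
conn.execute("INSERT INTO sales_order VALUES (1, 'complete', '2025-01-01 14:30:01')")
conn.execute("INSERT INTO sales_order_grid VALUES (1, 'processing', '2025-01-01 14:29:00')")
# Order 2 was synced by that cron run; its updated_at became the watermark.
conn.execute("INSERT INTO sales_order VALUES (2, 'processing', '2025-01-01 14:30:05')")
conn.execute("INSERT INTO sales_order_grid VALUES (2, 'processing', '2025-01-01 14:30:05')")

watermark = "2025-01-01 14:30:05"  # later than T1
with_watermark = unsynced_ids(conn, watermark)
without_watermark = unsynced_ids(conn)

print(with_watermark)     # [] : Order X is filtered out
print(without_watermark)  # [1] : the join alone still detects it
```

Under these assumptions, the stale order is invisible for as long as the watermark stays later than T1, and nothing in the filtered query can move it back.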
Why it never self-heals:
The watermark is stored in Magento's cache with a 3600-second TTL (LastUpdateTimeCache.php:44-49). However, every cron cycle that processes at least one order calls lastUpdateTimeCache->save() (Grid.php:150), which resets the TTL. On any active store, the cron processes orders every minute, so the cache is perpetually refreshed and never expires. The skipped order is permanently stuck.
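The effect of resetting the TTL on every save can be illustrated with a toy model. This is a sketch under stated assumptions, not Magento's cache API: TtlCache and first_expiry are hypothetical names, time is a simulated counter in seconds, and an entry is treated as expired once its full TTL has elapsed.

```python
class TtlCache:
    """Toy cache entry with a time-to-live; every save() restarts the TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.value = None
        self.expires_at = None

    def save(self, value, now):
        self.value = value
        self.expires_at = now + self.ttl

    def load(self, now):
        if self.expires_at is None or now >= self.expires_at:
            return None
        return self.value


def first_expiry(cycles, saves_each_cycle, ttl=3600, interval=60):
    """Run `cycles` cron ticks; return the first tick time at which the
    watermark is missing, or None if it never expires."""
    cache = TtlCache(ttl)
    cache.save("watermark", now=0)
    for n in range(1, cycles + 1):
        now = n * interval
        if cache.load(now) is None:
            return now
        if saves_each_cycle:
            # A cycle that processed at least one order saves the watermark again.
            cache.save("watermark", now)
    return None


busy_store = first_expiry(cycles=24 * 60, saves_each_cycle=True)
idle_store = first_expiry(cycles=24 * 60, saves_each_cycle=False)

print(busy_store)  # None : the TTL is refreshed every minute for a full day
print(idle_store)  # 3600 : only a store with no order activity lets it expire
```

In this model the watermark only disappears when a full TTL passes without a single save, which matches the description above of why an active store never recovers on its own.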
UpdatedIdListProvider cannot catch these orders either — it only finds orders completely missing from the grid (entity_id IS NULL), not orders with stale data.
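To make the difference between "missing" and "stale" concrete, here is a small sketch, again using Python with in-memory SQLite rather than Magento itself. The query is an approximation of the entity_id IS NULL condition mentioned above, not the exact statement Magento builds, and the schema and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT);
    CREATE TABLE sales_order_grid (entity_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT);

    -- Order 1: present in the grid, but with stale data.
    INSERT INTO sales_order VALUES (1, 'complete', '2025-01-01 14:30:01');
    INSERT INTO sales_order_grid VALUES (1, 'processing', '2025-01-01 14:29:00');

    -- Order 2: not in the grid at all.
    INSERT INTO sales_order VALUES (2, 'pending', '2025-01-01 14:30:10');
""")

# Anti-join: rows of sales_order that have no matching grid row.
missing_ids = [
    row[0]
    for row in conn.execute("""
        SELECT main.entity_id
        FROM sales_order AS main
        LEFT JOIN sales_order_grid AS grid
            ON main.entity_id = grid.entity_id
        WHERE grid.entity_id IS NULL
        ORDER BY main.entity_id
    """)
]

print(missing_ids)  # [2] : the stale Order 1 has a grid row, so it is not returned
```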
Note on PR #40271: This PR (merged Dec 2025) moves the watermark from transient cache to persistent database storage via FlagManager. While it solves unnecessary full-table scans after cache flushes, it does not fix this race condition — and makes the permanent nature of the bug explicit rather than accidental (no TTL at all in the DB-backed version).
Note on commit 85baae4: This commit refactors UpdatedIdListProvider with cursor-based scanning. It improves performance for new orders missing from the grid but does not touch UpdatedAtListProvider where the race condition lives.
Timeline for understanding
Timeline:
─────────────────────────────────────────────────────────
14:30:00  Cron starts running. It queries for stale orders.
14:30:01  While the cron's query is executing, an admin
          completes Order #5432. sales_order is updated
          with updated_at = 14:30:01.
          But the cron's query already started, so it doesn't
          see Order #5432.
14:30:06  Cron finishes processing the OTHER orders it found.
          The highest updated_at among those was 14:30:05
          (from a different order). It saves the watermark
          as 14:30:05.
14:31:00  Next cron run. Watermark = 14:30:05.
          Query: WHERE main.updated_at > grid.updated_at
                 AND main.updated_at > '14:30:05'
          Order #5432 has updated_at = 14:30:01.
          14:30:01 > 14:30:05? NO. Filtered out. Skipped.
14:32:00  Next cron run. Same watermark. Same result. Skipped.
...       Every future cron run skips it too.
Release note
When async grid indexing is enabled (dev/grid/async_indexing = 1), orders updated during a cron sync cycle could permanently show stale data in the admin Sales > Orders grid. The watermark optimization in UpdatedAtListProvider has been removed to ensure all out-of-sync orders are detected reliably.
Triage and priority
cc: @convenient