Skip to content

Product Import - Category data, including min position, is loaded; even if COL_CATEGORY is not set (Performance / Extra Work)  #39422

Open
@steven-hoffman-jomashop

Description

@steven-hoffman-jomashop

Preconditions and environment

  • See 2.4.8-beta1
    • (Should be present from 2.4.4)
  • In order for the performance impact to be measurable, the system would need to have many products in various categories
    • Note: that in any case dev:query-log:enable will show the extra queries

Steps to reproduce

  1. Create a csv file with two (2) columns
    • sku and any other column should be enough
    • 1-2 rows should be enough
  2. Run the product import logic with append option selected
    • (Both the API and admin panel import should exhibit this behavior)

Expected result

Extra logic related to loading / saving category data will not be called

Actual result

  1. Logic related to finding the min position is called
  2. Logic related to insert, on duplicate key update, is called for the existing category product relations
  3. Logic related to loading the category collection for the entire site is called

Additional information

If the above three sets of calls; when the product to category table (catalog_category_product) is large loading the min position is the most expensive.
This is because their is no index on position along with category_id and the query needs to scan all entries for that category.
SELECT MIN(position) AS `position` FROM `catalog_category_product` WHERE (category_id = ?)

The insert of existing data is un-needed, but seems to cost less.

Logic related to loading the category collection for the entire site is called once in bulk; it costs a flat amount per import depending on the number of categories in the site.


Code related notes

5da1c357bfd9 appears to introduce most of the issue for 1 and 2 above.
It changes processRowCategories to load the existing category product data from the product.
The change likely was intended make the data available for AfterImportDataObserver as generating the new urls appears to require the existing categories. (It seems, that the data made available to AfterImportDataObserver will differ if a COL_CATEGORY is set, as if it is, the product's existing category data will not be loaded).

The issue is that Product::saveProductCategoriesPhase adds it directly into the categoriesCache. Without differentiating between rowData and existing/loaded product data.
(And Product::_saveProductCategories is passed categoriesCache with both rowData and existing product category associations and calling getProductCategoriesDataSave) which calls the min position query and the insert, even if COL_CATEGORY is not present in the import's columns)

For 3 above, CategoryProcessor calls initCategories during __construct.


Possible fixes:

  1. saveProductCategoriesPhase and processRowCategories can be adjusted, where processRowCategories returns some metadata to differentiate rowData from product loaded data. Then saveProductCategoriesPhase can add to categoriesCache with the value false. Then _saveProductCategories‎ can be called with only the 'true' values.
  2. Alternatively, processRowCategories can be adjusted to not load the product's data, and getProductCategories can be adjusted to load the product data if rowData is missing. (processRowCategories could also load the product data into a sperate property and getProductCategories can use which ever property has data available).
    • (Also, AfterImportDataObserver has it's own checks on rowData and in many cases loading the product to get the existing category data will not be needed).

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Area: CatalogComponent: CatalogIssue: ConfirmedGate 3 Passed. Manual verification of the issue completed. Issue is confirmedPriority: P3May be fixed according to the position in the backlog.Progress: ready for devReported on 2.4.8-beta1Indicates original Magento version for the Issue report.Reproduced on 2.4.xThe issue has been reproduced on latest 2.4-develop branchTriage: Dev.ExperienceIssue related to Developer Experience and needs help with Triage to Confirm or Reject it

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions