Report viewed products does not seem to work correctly #38337

Open
@dandrikop

Description

Preconditions and environment

  • Magento version 2.4.6-p1

Steps to reproduce

  1. In the file "vendor/magento/module-customer/etc/di.xml", add your own user agent as an additional item in the section below:
    <type name="Magento\Customer\Model\Visitor">
        <arguments>
            <argument name="ignoredUserAgents" xsi:type="array">
                <item name="google1" xsi:type="string">Googlebot/1.0 (googlebot@googlebot.com http://googlebot.com/)</item>
                <item name="google2" xsi:type="string">Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</item>
                <item name="google3" xsi:type="string">Googlebot/2.1 (+http://www.googlebot.com/bot.html)</item>
            </argument>
        </arguments>
    </type>

  2. Recompile Magento.

  3. Generate the static files.

  4. Clear the caches.

  5. Visit a product page with your user agent.

Expected result

Since your user agent is configured to be ignored in the file "vendor/magento/module-customer/etc/di.xml", no record should be created in the table "report_viewed_product_index" when you visit a product page of the store.

Actual result

A record is created in the table "report_viewed_product_index" every time you visit a product page of the store.

Additional information

  1. User agents to be ignored have to match the entries in the file "vendor/magento/module-customer/etc/di.xml" as complete strings. By default, the di.xml file contains three user agents associated with Googlebot. However, the Googlebot user agent contains the string "Chrome/W.X.Y.Z", which changes over time with the version of the Chrome browser that the user agent is based on, e.g. "Chrome/41.0.2272.96". As a result, the Apache or Nginx log files have to be checked regularly to collect new user agents associated with Googlebot. The same applies to Bingbot, and possibly to other bots. A partial match would be better, for example matching any user agent that includes the string "Googlebot/2.1".
  2. It is not clear how the visitor_id and customer_id fields of the table "report_viewed_product_index" are populated. When a logged-in customer views a product page, the customer_id field gets a value, while the visitor_id field is NULL. When the customer logs out and views another product page, the visitor_id field gets a value, while the customer_id field is NULL. However, when the customer has never logged in while browsing the store, both the visitor_id and customer_id fields are NULL.
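
To illustrate point 1, here is a minimal sketch of the difference between complete-string matching and the partial matching suggested above. It is written in Python for brevity and is not Magento code: the function names, the IGNORED_SUBSTRINGS list and the sample user agent are assumptions made for illustration only. The three exact strings are the ones from the di.xml section quoted in the steps, and the Chrome version is the example mentioned above.

```python
# Illustrative sketch only, not Magento code. All names here are hypothetical.

# The three default entries quoted from di.xml in the steps to reproduce.
IGNORED_EXACT = [
    "Googlebot/1.0 (googlebot@googlebot.com http://googlebot.com/)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
]

# Assumed list of tokens for the proposed partial match.
IGNORED_SUBSTRINGS = ["Googlebot/2.1"]


def is_ignored_exact(user_agent: str) -> bool:
    """Complete-string match: the whole header must equal a configured entry."""
    return user_agent in IGNORED_EXACT


def is_ignored_partial(user_agent: str) -> bool:
    """Partial match: any configured token appearing in the header is enough."""
    return any(token in user_agent for token in IGNORED_SUBSTRINGS)


# Assumed example of a Googlebot user agent that carries a Chrome version.
sample_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/41.0.2272.96 Safari/537.36"
)

if __name__ == "__main__":
    print("exact:  ", is_ignored_exact(sample_ua))
    print("partial:", is_ignored_partial(sample_ua))
```

With complete-string matching, every new Chrome version yields a user agent that no longer equals any configured entry, whereas the partial match keeps recognising it as long as the "Googlebot/2.1" token is present.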

Due to the above problems, the table "report_viewed_product_index":

  • includes data from both real visitors and bots, with no way to tell the bot records apart so that they could at least be deleted. As a result, the statistics are polluted by bots.
  • can become very large within a short time, depending on the number of products.
  • causes slow queries once it grows large, because it is used in queries via INNER JOINs. A slow query degrades overall database performance, as the tables involved stay open longer while the query is executed.
  • is updated on every product page visit, which generates unnecessary load on the database given that a bot can crawl thousands of pages in a day.

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

Metadata

Assignees

No one assigned

    Labels

    • Area: Product
    • Component: Data
    • Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
    • Priority: P2 (A defect with this priority could have functionality issues which are not to expectations.)
    • Progress: ready for dev
    • Reported on 2.4.6-p1 (Indicates original Magento version for the Issue report.)
    • Reproduced on 2.4.x (The issue has been reproduced on latest 2.4-develop branch)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests