[Php-wasm] Add intl support #2173

Open: wants to merge 1 commit into base: trunk

Conversation

oskardydo

Motivation for the change, related issues

Hello, I'm one of the people working on a Playground implementation for TYPO3. We needed PHP with intl available to even start the system. This should be all the code and files needed to compile PHP with intl.

It is not 100% working: there are issues, for example some functions were missing and the recompile:php:node:* tasks were failing. But at least PHP thinks it has intl, and maybe you will be able to make use of it in wordpress-playground.

The gettext sources are included in the files only for convenience, because downloading the package each time was quite a lengthy process (maybe this was a temporary problem).

Implementation details

Testing Instructions (or ideally a Blueprint)

@@ -1,7 +1,7 @@
# Originally forked from https://github.com/seanmorris/php-wasm
# ubuntu:lunar supports amd64 and arm64 (Apple Silicon) while
# emscripten/emsdk:3.1.24 supports amd64 only.
-FROM ubuntu:lunar as emscripten
+FROM ubuntu:noble as emscripten
Collaborator

I'm curious: why do we need an Ubuntu version update?

Author

Lunar is EOL (since January 2024), so I wasn't able to just run the Dockerfile and install everything. Changing the Ubuntu version fixed that.

@adamziel
Collaborator

adamziel commented Apr 1, 2025

@zaerl @bgrgicak would you be able to help with reviews and guidance here?

@tmotyl

tmotyl commented Apr 1, 2025

FYI, here is how the original php-wasm builds intl, maybe it's useful context: https://github.com/seanmorris/php-wasm/tree/master/packages/intl

@mho22
Contributor

mho22 commented Apr 6, 2025

@oskardydo I tried to enable libintl on a local branch with ubuntu:lunar to understand and pinpoint the issues you have. I first tried passing --enable-intl to the PHP compilation:

php/Dockerfile

ARG WITH_INTL

# Add intl if needed
RUN if [ "$WITH_INTL" = "yes" ]; \
	then echo -n ' --enable-intl ' >> /root/.php-configure-flags; \
	fi;

I also modified the build.js file in order for WITH_INTL to be taken into account.

node node_modules/.bin/nx reset && npm run recompile:php:web:jspi:8.3

and got the following errors :

#66 165.0 wasm-ld: error: ext/intl/intl_convertcpp.o: relocation R_WASM_MEMORY_ADDR_LEB cannot be used against symbol zend_empty_string; recompile with -fPIC
#66 165.0 wasm-ld: error: ext/intl/common/common_enum.o: relocation R_WASM_MEMORY_ADDR_LEB cannot be used against symbol IntlIterator_ce_ptr; recompile with -fPIC
...

To move forward, I added :

ICU_CFLAGS="-fPIC -I/root/lib/include" \
ICU_LIBS="-L/root/lib/lib" \

inside the RUN ... emconfigure ./configure command on line 306. That moved the build on to the next error:

#65 71.92 In file included from /root/php-src/ext/intl/php_intl.c:23:
#65 71.92 /root/php-src/ext/intl/php_intl.h:29:10: fatal error: 'unicode/ubrk.h' file not found
#65 71.92    29 | #include <unicode/ubrk.h>
#65 71.92       |          ^~~~~~~~~~~~~~~~
#65 71.94 1 error generated.

At that point I hadn't copied the extension files, because I hadn't compiled the package yet.

I copied this pull request's libintl/Dockerfile and compile/Makefile, and also added a wget step for gettext:

RUN set -eux; \
    cd /tmp && \
    wget https://ftp.gnu.org/gnu/gettext/gettext-0.21.1.tar.gz && \
    tar -xvf gettext-0.21.1.tar.gz && \
    rm gettext-0.21.1.tar.gz

then

rm -rf libintl/jspi/dist && make libintl_jspi

I then added COPY ./libintl/ /root/builds/libintl on line 38 in php/Dockerfile.

php/Dockerfile

...
COPY ./libintl/ /root/builds/libintl

I ran : node node_modules/.bin/nx reset && npm run recompile:php:web:jspi:8.3

I now have these errors in the 67th step :

#80 1.435 wasm-ld: error: /root/lib/libphp.a(timezone_class.o): undefined symbol: vtable for icu_74::UnicodeString
#80 1.435 wasm-ld: error: /root/lib/libphp.a(msgformat_helpers.o): undefined symbol: typeinfo for icu_74::DateFormat
#80 1.435 wasm-ld: error: /root/lib/libphp.a(msgformat_helpers.o): undefined symbol: typeinfo for icu_74::Format
#80 1.435 wasm-ld: error: /root/lib/libphp.a(msgformat_helpers.o): undefined symbol: vtable for icu_74::FieldPosition

Probably related to the missing(?) -licuuc and -licui18n libraries. Are these the errors you encounter on your side? Am I missing something?

Have you successfully compiled PHP for the web?

@tmotyl I also made a minimal version of libintl/Dockerfile inspired by the original seanmorris/php-wasm intl extension build in seanmorris/php-wasm/packages/intl/static.mak.

In the static.mak file, it seems that the author removes the current bin directory and replaces it with the native bin directory, then runs chmod on every file inside.

Even though I followed every step in my Dockerfile, it fails with:

../bin/icupkg: Permission denied

Emscripten can't run icupkg since it has been built for Linux.

I set that problem aside since the previous errors should be solved first.

@mho22
Contributor

mho22 commented Apr 7, 2025

I found out what was wrong in my code.

The undefined symbol: vtable for icu_74::UnicodeString errors were caused by the ICU libraries not being passed to the linker.

So I added this in php/Dockerfile :

# Add intl if needed
RUN if [ "$WITH_INTL" = "yes" ]; then \
		echo -n ' --enable-intl ' >> /root/.php-configure-flags; \
+		echo -n ' -licuuc -licudata -licui18n -licuio ' >> /root/.emcc-php-wasm-flags; \
	fi;

And added these libraries in EMCC_SKIP on line 429 :

EMCC_SKIP="-lz -ledit -ldl -lncurses -lzip -lpng16 -lssl -licuuc -licudata -licui18n -licuio -lcrypto -lxml2 -lc -lm -lsqlite3 /root/lib/lib/libxml2.a /root/lib/lib/libsqlite3.so /root/lib/lib/libsqlite3.a /root/lib/lib/libsqlite3.a" \

I also used a minimal libintl/Dockerfile to reduce the number of potential failure points:

libintl/Dockerfile

FROM playground-php-wasm:base


ARG JSPI


RUN set -euxo pipefail && \
    wget https://github.com/unicode-org/icu/releases/download/release-74-2/icu4c-74_2-src.tgz && \
    tar -xvf icu4c-74_2-src.tgz && \
    rm icu4c-74_2-src.tgz


RUN set -euxo pipefail && \
    mkdir -p /root/native && \
    cd /root/native && \
    /root/icu/source/configure && \
    make clean && \
    make -j"$(nproc)"


RUN set -euxo pipefail && \
    cd /root/icu/source && \
    source /root/emsdk/emsdk_env.sh && \
    mkdir -p /root/lib && \
    emconfigure ./configure \
        --prefix=/root/lib \
        --with-cross-build=/root/native \
        --with-data-packaging=files \
        --disable-extras \
        --enable-static && \
    emmake make clean && \
    export JSPI_FLAGS=$(if [ "$JSPI" = "1" ]; then echo "-sSUPPORT_LONGJMP=wasm -fwasm-exceptions"; else echo ""; fi) && \
    EMCC_FLAGS="$JSPI_FLAGS" emmake make -j"$(nproc)" && \
    mkdir -p /root/lib/share/icu/74.2/icudt74l/{brkitr,coll,curr,lang,rbnf,region,translit,unit,zone}

RUN cd /root/icu/source && \
    source /root/emsdk/emsdk_env.sh && \
    emmake make install

RUN ls -a /root/lib

node node_modules/.bin/nx reset && npm run recompile:php:web:jspi:8.3

PHP compilation succeeded and :

[Screenshot, 2025-04-07 at 12:55:45]

Next steps :

  • Compile web and node
  • Verify if each library is necessary
  • Clean code
  • Add gettext ?

@mho22
Contributor

mho22 commented Apr 8, 2025

I compiled PHP in node and web mode, with three distinct version failures:


  • version 7.3 returned this :
76.34 In file included from /root/php-src/ext/intl/breakiterator/breakiterator_class.cpp:23:
76.34 /root/php-src/ext/intl/breakiterator/codepointiterator_internal.h:42:17: error: virtual function 'operator==' has a different return type ('UBool' (aka 'signed char')) than the function it overrides (which has return type 'bool')
76.34    42 |                 virtual UBool operator==(const BreakIterator& that) const;
76.34       |                         ~~~~~ ^
76.34 /root/lib/include/unicode/brkiter.h:127:18: note: overridden virtual function is here
76.34   127 |     virtual bool operator==(const BreakIterator&) const = 0;
76.34       |             ~~~~ ^
76.41 1 error generated.
  • versions 7.2 and 7.1 returned this :
64.09 /root/php-src/ext/intl/collator/collator_sort.c:349:26: error: use of undeclared identifier 'TRUE'
64.09   349 |         collator_sort_internal( TRUE, INTERNAL_FUNCTION_PARAM_PASSTHRU );
64.09       |                                 ^
64.09 /root/php-src/ext/intl/collator/collator_sort.c:543:26: error: use of undeclared identifier 'FALSE'
64.09   543 |         collator_sort_internal( FALSE, INTERNAL_FUNCTION_PARAM_PASSTHRU );
64.09       |                                 ^
64.09 2 errors generated.
  • version 7.0 returned this :
49.71 checking for icu-config... no
49.71 checking for location of ICU headers and libraries... not found
49.71 configure: error: Unable to detect ICU prefix or no failed. Please verify ICU install prefix and make sure icu-config works.

Restricting intl to PHP 7.4 and 8.x in php/Dockerfile skips it for those versions, so they compile successfully again (obviously):

# Add intl if needed
RUN if [ "$WITH_INTL" = "yes" ]; \
	then \
		if [ "${PHP_VERSION:0:1}" -eq "8" ] || [[ "${PHP_VERSION:0:1}" -eq "7" && "${PHP_VERSION:2:1}" -eq "4" ]]; then \
			echo -n ' --enable-intl ' >> /root/.php-configure-flags; \
			echo -n ' -licuuc -licudata -licui18n -licuio ' >> /root/.emcc-php-wasm-flags; \
		fi; \
	fi;
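
The gate above reads single characters out of PHP_VERSION. As a minimal sketch of the same rule written as a reusable check, assuming a `major.minor.patch` version string (the function name `intl_enabled_for` is hypothetical and not part of this PR):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when intl should be enabled for the given
# PHP version (any 8.x, or 7.4), mirroring the condition in the snippet above.
intl_enabled_for() {
    local version="$1"
    local major="${version%%.*}"   # text before the first dot
    local rest="${version#*.}"     # text after the first dot
    local minor="${rest%%.*}"      # text before the next dot
    if [ "$major" -eq 8 ]; then
        return 0
    fi
    [ "$major" -eq 7 ] && [ "$minor" -eq 4 ]
}

# Print the decision for a few sample versions.
for v in 8.3.0 7.4.33 7.3.33; do
    if intl_enabled_for "$v"; then
        echo "$v: intl enabled"
    else
        echo "$v: intl skipped"
    fi
done
```

Splitting on dots instead of character offsets keeps the check readable, at the cost of a few extra lines.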


I checked whether each of the -licuuc -licudata -licui18n -licuio libraries is necessary for PHP to compile, and yes, they all are.



I cleaned up the code a little more and found out that this part :

RUN cat >> /root/php-src/main/php_config.h <<EOF
#ifdef __cplusplus
extern "C" {
#endif

extern int wasm_shutdown(int fd, int how);
extern int wasm_close(int fd);

#ifdef __cplusplus
}
#endif
EOF

is only needed when WITH_INTL is enabled. Otherwise, this is enough :

RUN echo 'extern int wasm_shutdown(int fd, int how);' >> /root/php-src/main/php_config.h;
RUN echo 'extern int wasm_close(int fd);' >> /root/php-src/main/php_config.h;

While this :

RUN sed -i '/^extern char \*\*environ;/d' /root/php-src/main/php.h \
 && cat <<EOF >> /root/php-src/main/php.h
#ifdef __cplusplus
extern "C" {
#endif

extern char **environ;

#ifdef __cplusplus
}
#endif
EOF

is only needed when WITH_INTL is enabled and the PHP version is below 7.4.



@adamziel @bgrgicak At this point, should:

  1. intl be enabled for versions below 7.4?
  2. intl be compiled in node mode only, since intl is quite heavy? jspi/8_4_0/php_8_4.wasm seems to be 2432721 bytes heavier.
  3. these portions of code in /root/php-src/main/php_config.h and /root/php-src/main/php.h be better handled based on WITH_INTL?
  4. a new PR be more suitable for the gettext extension itself?


Next step :

Load the icudt74l.dat ICU data file at PHP runtime, since the file is 30 MB. I'm not sure whether this is the best approach, or even whether it is technically possible.

@adamziel
Collaborator

adamziel commented Apr 9, 2025

intl be enabled for versions below 7.4

Yes. I'd be more than happy to see WordPress drop BC for PHP 7.2 and 7.3, but it supports them today.

intl be compiled in node mode only since intl is quite heavy ? jspi/8_4_0/php_8_4.wasm seems to be 2432721 bytes heavier

Interesting! Any ideas why that's the case? Also, how many additional megabytes are we talking about overall? If more than 3 or 4 then yes, let's restrict that to Node.js for the purposes of this PR and figure out how to build and load that as a dynamic extension later on.

these portions of code in files /root/php-src/main/php_config.h; and /root/php-src/main/php.h be better handled based on WITH_INTL

If there are no unintended side-effects, it should be fine to just always ship them. Ideally with a clarifying comment such as "needed in PHP 7.2 and 7.3 to ...".

a new PR be more suitable for gettext extension itself ?

Any PR structure that's convenient for you works.

BTW, it seems like VMware Labs has a WASM build of ICU. Linking it here in case it's of any use:

https://github.com/vmware-labs/webassembly-language-runtimes/tree/300c157844ff30799232528970fdd554f9d6a495/libs/icu

@adamziel
Collaborator

adamziel commented Apr 9, 2025

I just found this dependency table for php extensions, might be useful here and for other extensions: https://static-php.dev/en/guide/deps-map.html

@mho22
Contributor

mho22 commented Apr 11, 2025

After many failed compilations attempts and experiments, a wild ICU TZData version finally appeared!

web :

[Screenshot, 2025-04-11 at 09:45:41]

node :

intl.php

<?php

$formatter = new \NumberFormatter('en-US', \NumberFormatter::CURRENCY);
var_dump($formatter->format(100.00));

$formatter = new \NumberFormatter('ja-JP', \NumberFormatter::CURRENCY);
var_dump($formatter->format(100.00));

> node scripts/node.js intl.php
string(7) "$100.00"
string(6) "¥100"


The solution was to preload the icudt74l.dat data file during the PHP build, then load it into the filesystem for Node or import it for the web.

Next, I need to clean up the entire process to ensure each step is truly necessary.

@mho22
Contributor

mho22 commented Apr 11, 2025

@adamziel Sorry I forgot to answer your question

Interesting! Any ideas why that's the case? Also, how many additional megabytes are we talking about overall? If more than 3 or 4 then yes, let's restrict that to Node.js for the purposes of this PR and figure out how to build and load that as a dynamic extension later on.

The libintl/root/lib/lib directory includes :

  • libicudata.a [ 526 bytes ]
  • libicui18n.a [ 4.3 MB ]
  • libicuio.a [ 52 KB ]
  • libicutest.a [ 591 KB ]
  • libicutu.a [ 452 KB ]
  • libicuuc.a [ 2.6 MB ]

While the libintl/root/lib/include directory includes :

  • unicode [ 4.8 MB with 197 .h files ]


Compiling PHP 8.4 Web JSPI returned the following results:

Working: 18,576,604 bytes
HEAD: 16,143,865 bytes
Diff: 2,432,739 bytes
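
The arithmetic behind the Diff line can be reproduced directly from the two byte counts quoted above. A minimal sketch (the variable names are illustrative; only the two sizes come from this comment):

```shell
#!/usr/bin/env bash
# Byte counts reported above for the PHP 8.4 Web JSPI build.
with_intl=18576604   # "Working" (intl enabled)
baseline=16143865    # "HEAD" (no intl)

diff_bytes=$(( with_intl - baseline ))
echo "diff: ${diff_bytes} bytes"

# Shell arithmetic is integer-only, so awk handles the MiB conversion.
awk -v b="$diff_bytes" 'BEGIN { printf "diff: %.2f MiB\n", b / 1048576 }'
```

That puts the growth of the .wasm binary alone at roughly 2.3 MiB; the ICU data file is a separate download and is not included in this figure.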



But I found out there is a section in ICU documentation dedicated to making ICU smaller.

I’m not exactly sure what should or shouldn’t be enabled right now, but there’s definitely room for optimization.



I should also mention that the data file named icudt74l.dat weighs 30.8 MB. As it is added with --preload-file in php/Dockerfile:

emcc $OPTIMIZATION_FLAGS \
...
--preload-file /root/lib/share/icu/74.2@/internal/shared/preload \

the resulting file named php.data is also 30.8 MB.

@adamziel
Collaborator

I should also mention that the data file named icudt74l.dat weighs 30.8 MB. As it is added with --preload-file in php/Dockerfile:

Does the browser need to download it? Or is it just needed to build/link PHP?

@mho22
Contributor

mho22 commented Apr 11, 2025

Does the browser need to download it? Or is it just needed to build/link PHP?

The browser needs to download the php.data file to load all the necessary resources into memory and ensure the application works correctly. However, it can also be made smaller by following the ICU documentation.

[Screenshot, 2025-04-11 at 15:15:01]

@tmotyl

tmotyl commented Apr 11, 2025

Is the ICU file binary, or does it benefit from gzip compression over the wire?

@mho22
Contributor

mho22 commented Apr 12, 2025

Is the ICU file binary, or does it benefit from gzip compression over the wire?

It is a binary file. Emscripten preloads the ICU data file and packages it into php.data, which works together with php.js at runtime.
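
Whether a binary file shrinks under gzip depends on its contents, so it is easiest to measure. A minimal sketch, assuming `gzip`, `wc`, `tr`, `head` and `mktemp` are available; the helper name `gzip_sizes` is hypothetical, and the demo runs on a throwaway file rather than on the real ICU data:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<raw bytes> <gzipped bytes>" for a file.
gzip_sizes() {
    local file="$1" raw gz
    # tr strips the padding some wc implementations add around the count.
    raw=$(wc -c < "$file" | tr -d ' ')
    gz=$(gzip -c "$file" | wc -c | tr -d ' ')
    echo "$raw $gz"
}

# Demo on a throwaway file of zero bytes, which compresses extremely well.
# Real ICU data will behave differently; pass its path to measure it.
sample=$(mktemp)
head -c 100000 /dev/zero > "$sample"
gzip_sizes "$sample"
rm -f "$sample"
```

Running `gzip_sizes` against the built php.data file (wherever the build places it) would answer the question for the actual payload.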

@mho22
Contributor

mho22 commented Apr 15, 2025

I managed to make versions 7.3, 7.2 and 7.1 compatible with ICU version 74.2.

I made these quick-and-dirty replacements as a work in progress; the real changes will go into the relevant /root/php${PHP_VERSION:0:3}*.patch files, or be applied properly with /root/replace.sh.

7.3 :

if [[ "${PHP_VERSION:0:1}" -eq "7" && "${PHP_VERSION:2:1}" -le "3" ]]; then \

        sed -i 's/UBool operator/bool operator/' /root/php-src/ext/intl/breakiterator/codepointiterator_internal.h; \
        sed -i 's/UBool CodePointBreakIterator::operator/bool CodePointBreakIterator::operator/' /root/php-src/ext/intl/breakiterator/codepointiterator_internal.cpp; \

        sed -i 's/ getArg/ phpGetArg/g' /root/php-src/ext/intl/msgformat/msgformat_helpers.cpp; \
        sed -i 's/::getArg/::phpGetArg/g' /root/php-src/ext/intl/msgformat/msgformat_helpers.cpp; \
        sed -i 's/ getMessage/ phpGetMessage/g' /root/php-src/ext/intl/msgformat/msgformat_helpers.cpp; \
        sed -i 's/::getMessage/::phpGetMessage/g' /root/php-src/ext/intl/msgformat/msgformat_helpers.cpp; \

fi; \

7.2:

if [[ "${PHP_VERSION:0:1}" -eq "7" && "${PHP_VERSION:2:1}" -le "2" ]]; then \

        sed -i 's/collator_sort_internal( TRUE, INTERNAL_FUNCTION_PARAM_PASSTHRU );/collator_sort_internal( true, INTERNAL_FUNCTION_PARAM_PASSTHRU );/' /root/php-src/ext/intl/collator/collator_sort.c; \
        sed -i 's/collator_sort_internal( FALSE, INTERNAL_FUNCTION_PARAM_PASSTHRU );/collator_sort_internal( false, INTERNAL_FUNCTION_PARAM_PASSTHRU );/' /root/php-src/ext/intl/collator/collator_sort.c; \
        sed -i 's/uret = FALSE;/uret = false;/' /root/php-src/ext/intl/normalizer/normalizer_normalize.c; \
        sed -i 's/is_pattern_localized =FALSE;/is_pattern_localized = false;/' /root/php-src/ext/intl/dateformat/dateformat_attr.c; \
        sed -i 's/isLenient  = FALSE;/isLenient  = false;/' /root/php-src/ext/intl/dateformat/dateformat_attr.c; \
        sed -i 's/tz->getOffset(now, FALSE, rawOffset, dstOffset, uec);/tz->getOffset(now, false, rawOffset, dstOffset, uec);/' /root/php-src/ext/intl/timezone/timezone_class.cpp; \
        sed -i 's/timezone_convert_datetimezone(tzobj->type, tzobj, FALSE, NULL,/timezone_convert_datetimezone(tzobj->type, tzobj, false, NULL,/' /root/php-src/ext/intl/timezone/timezone_methods.cpp; \
        sed -i 's/ FALSE/ false/g' /root/php-src/ext/intl/breakiterator/codepointiterator_internal.cpp; \
        sed -i 's/ TRUE/ true/g' /root/php-src/ext/intl/breakiterator/codepointiterator_internal.cpp; \

fi; \
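
A plain `sed -i` substitution exits successfully even when its pattern matches nothing, so a stale pattern goes unnoticed. A minimal sketch of a checked variant, tried on a scratch file; the helper name `apply_checked` is hypothetical, it is independent of /root/replace.sh (whose behavior is not shown in this thread), and it assumes the pattern and replacement contain no `/` or `&`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a sed substitution, but fail if the pattern
# is absent, so a stale pattern is reported instead of silently ignored.
apply_checked() {
    local pattern="$1" replacement="$2" file="$3"
    if ! grep -q -- "$pattern" "$file"; then
        echo "pattern not found in $file: $pattern" >&2
        return 1
    fi
    # -i.bak keeps a backup and works with both GNU and BSD sed.
    sed -i.bak "s/$pattern/$replacement/" "$file"
}

# Scratch file standing in for codepointiterator_internal.h; the declaration
# is the one quoted in the 7.3 compiler error earlier in this thread.
scratch=$(mktemp)
printf 'virtual UBool operator==(const BreakIterator& that) const;\n' > "$scratch"
apply_checked 'UBool operator' 'bool operator' "$scratch" && cat "$scratch"
rm -f "$scratch" "$scratch.bak"
```

Like the first 7.3 substitution, this replaces only the first match on each line; add the `g` flag to replace every match.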

7.1:

# PHP <= 7.1 is not very good at detecting the presence of the POSIX zend_sprintf function
# so we need to force it to be enabled.
RUN if [[ "${PHP_VERSION:0:1}" -le "7" && "${PHP_VERSION:2:1}" -le "1" ]]; then \
		/root/replace.sh 's/define ZEND_BROKEN_SPRINTF 1/define ZEND_BROKEN_SPRINTF 0/g' /root/php-src/main/php_config.h; \
	fi;

Version 7.0 is currently unsupported because it would need a massive set of patches [ To my understanding, version 7.0 used the same namespace as an old version of ICU for several classes, and the two were tightly coupled ].

Even though it hasn't been said in this thread, I think versions 7.1 and 7.0 are no longer supported and therefore no longer needed, right?



However, since intl now runs on the web for versions 8.4 down to 7.1, the next steps are :

  1. Patches or replace.sh
  2. node
  3. New PR
  4. Make libicuuc and libicui18n smaller based on the ICU documentation
  5. Make icudt74l.dat smaller based on the ICU documentation

Edit :

I'm struggling to build the node versions below 7.3, because --disable-cli is no longer present [ because of WITH_CLI_SAPI = 'yes' ]. The ICU_CFLAGS="-fPIC" approach I used doesn't work, since that variable is not present in versions below 7.3.

Edit 2 :

Nailed it.

@adamziel
Collaborator

Even though it hasn't been said in this thread, I think versions 7.1 and 7.0 are no longer supported and therefore no longer needed, right?

Yes, it's just 7.2+
