Proposal to use UTF32InputInterpreter (no leadingChar) for Unicode platforms in Squeak 6.0 #18

Open
@dram

Abstract

Currently, in Squeak 6.0 alpha, different language environments use different input interpreters to handle keyboard input, in order to add leadingChars to input characters. For Unicode platforms, these are UTF32CNInputInterpreter, UTF32GreekInputInterpreter (unused), UTF32JPInputInterpreter, UTF32NPInputInterpreter and UTF32RussianInputInterpreter.

However, the leadingChar mechanism on Unicode platforms in Squeak 6.0 alpha is quite incomplete, and it causes more problems than it solves. This proposal suggests using UTF32InputInterpreter on Unicode platforms, which does not introduce leadingChars into the system.

Rationale

For non-Unicode platforms, Squeak has a full set of text converters and input and clipboard interpreters for different languages, which handle keyboard input, file system access, clipboard access, and file reading and writing. Those text converters add a leadingChar to every character in order to cope with the problems caused by Han Unification.
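To make the tagging idea concrete, here is a minimal Python model of a character value that carries a language tag next to its code point. It is only a sketch, not Squeak code: the 22-bit split, the helper names and the tag numbers are assumptions chosen for illustration.

```python
# Illustrative model of a "leadingChar" tag stored above a Unicode code point.
# ASSUMPTION: the code point occupies the low 22 bits and the tag the next 8 bits.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def with_leading_char(code_point: int, leading_char: int) -> int:
    """Pack a language tag and a code point into one integer value."""
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point does not fit in the low bits")
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading char must fit in 8 bits")
    return (leading_char << CODE_BITS) | code_point


def leading_char_of(value: int) -> int:
    """Extract the language tag from a packed value."""
    return value >> CODE_BITS


def code_point_of(value: int) -> int:
    """Extract the plain Unicode code point from a packed value."""
    return value & CODE_MASK


# Hypothetical tag numbers, used only for this example.
JAPANESE_TAG = 5
CHINESE_TAG = 6

# U+76F4 is a unified Han character that is drawn differently in Japanese
# and Chinese typography; the tag records which form is intended.
han = ord("直")
ja = with_leading_char(han, JAPANESE_TAG)
zh = with_leading_char(han, CHINESE_TAG)
```

In this model `ja` and `zh` share the same code point but are different values, which is what lets a renderer choose a language-specific glyph.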

For Unicode platforms, only the input interpreters have been completely implemented for different languages. There are no language variants of UTF8TextConverter, and the variants of UTF8ClipboardInterpreter do not implement the leadingChar mechanism.

This leaves Squeak 6.0 alpha in a quite frustrating situation: text from the keyboard, the clipboard, files and the network is inconsistent with regard to leadingChar, and so compares as non-equal. The problems caused by this inconsistency are well hidden and difficult to detect.
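The comparison problem can be reproduced in a small Python model. This is a sketch under stated assumptions, not Squeak's implementation: the bit layout, the tag value and the helper names are hypothetical, and an untagged character is assumed to carry a tag of 0.

```python
# Model of text arriving through two paths that disagree about tagging.
# ASSUMPTION: the tag sits above a 22-bit code point; untagged text uses tag 0.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def tag_text(text: str, leading_char: int) -> list[int]:
    """Return the packed character values for text with the given tag."""
    return [(leading_char << CODE_BITS) | ord(ch) for ch in text]


def strip_tags(values: list[int]) -> list[int]:
    """Drop the tag from every value, leaving plain code points."""
    return [v & CODE_MASK for v in values]


HYPOTHETICAL_JP_TAG = 5  # illustrative value only

# The keyboard path tags each character; the clipboard path does not.
typed = tag_text("日本語", HYPOTHETICAL_JP_TAG)
pasted = tag_text("日本語", 0)

# The two sequences show the same visible text but are not equal.
equal_as_stored = typed == pasted

# Once the tags are removed, the comparison succeeds.
equal_after_strip = strip_tags(typed) == strip_tags(pasted)
```

Here `equal_as_stored` is false and `equal_after_strip` is true. An input interpreter that never adds a tag avoids the mismatch at the source, which is the effect this proposal aims for.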

Implementation

The implementation is relatively simple; a patch (Multilingual-xw.285) has been submitted to The Inbox.

Future Work

Regarding the future of leadingChar support on Unicode platforms in Squeak, I can think of the following possibilities:

  1. Add comprehensive leadingChar support on Unicode platforms for all language environments.
  2. Remove leadingChar from Squeak gradually, for both Unicode and non-Unicode platforms.
  3. Retain leadingChar as a mechanism for tagging text, but require users to specify it explicitly (e.g. with the help of the UI).

But this is outside the scope of this proposal and should be discussed separately.

Labels

debt [WHAT]: The issue represents technical or design debt, pointing to code smells etc.
system integration [SCOPE]: Squeak's integration into external infrastructure: os, databases, internet, encodings, ...
