Update to Unicode 16.0.0 #836

Open
rmisev opened this issue Sep 21, 2024 · 3 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. topic: idna topic: parser

Comments

@rmisev
Member

rmisev commented Sep 21, 2024

What is the issue with the URL Standard?

Version 16.0.0 (2024-08-30) of Unicode Technical Standard #46 has been released. It fixes some previously reported issues:

  1. More IDNA roundtrippability issues #760
    Step 4.3 of Processing (applied after the Punycode decode) now fixes this issue:

    If the label is empty, or if the label contains only ASCII code points, record that there was an error.

  2. IdnaTestV2.json "xn--xn--a--gua.pt" test case problem #803
    The test in question is now correctly labeled in IdnaTestV2.txt:
    xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt
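
The quoted step 4.3 check can be sketched roughly as follows. This is only an illustration of that one rule, not the full UTS #46 Processing algorithm: the function name `decode_and_check` is hypothetical, and Python's built-in `punycode` codec is assumed to be an acceptable stand-in for the decoder a real implementation would use.

```python
def decode_and_check(label: str) -> tuple[str, bool]:
    """Return (decoded_label, error_flag) for a single 'xn--' label.

    Hypothetical helper: it covers only the rule quoted above
    (empty or ASCII-only result after the Punycode decode is an error).
    A malformed Punycode payload raises UnicodeError instead.
    """
    if not label.lower().startswith("xn--"):
        raise ValueError("expected a label starting with 'xn--'")
    # Strip the ACE prefix and decode the remainder with the stdlib codec.
    decoded = label[4:].encode("ascii").decode("punycode")
    # "".isascii() is True, but the empty case is spelled out for clarity.
    error = decoded == "" or decoded.isascii()
    return decoded, error


for sample in ("xn--abc-", "xn--4ca", "xn--xn--a--gua"):
    print(sample, decode_and_check(sample))
```

Under this sketch, `xn--abc-` decodes to an ASCII-only label and is flagged, while `xn--xn--a--gua` decodes to a non-ASCII label and passes this particular check; the `V2` and `V4` errors listed for it in the test file come from other validity criteria.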
    

So I think it's worth upgrading to that standard:

  1. Reference the new 16.0.0 Unicode Technical Standard #46 in the Normative References section.
  2. In WPT, update the IdnaTestV2-parser.py tool and the IdnaTestV2.json test file. I have opened a PR for this: URL: Update IdnaTestV2 to UTS46 16.0.0 web-platform-tests/wpt#48301
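
For readers unfamiliar with the test file, here is a minimal sketch of how one line of IdnaTestV2.txt, such as the one quoted above, could be split into fields. It is not the actual IdnaTestV2-parser.py; the function name `split_test_line` is hypothetical, and the sketch assumes only that fields are separated by `;` and that `#` starts a trailing comment. It does not interpret blank fields.

```python
def split_test_line(line: str) -> list[str]:
    """Split one IdnaTestV2.txt data line into stripped fields.

    Hypothetical helper: drops the trailing '#' comment, then splits
    on ';'. Blank fields are returned as empty strings, uninterpreted.
    """
    data, _, _comment = line.partition("#")
    return [field.strip() for field in data.split(";")]


line = "xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt"
fields = split_test_line(line)
print(fields)
```

For the line above this yields seven fields, with the status `[V2, V4]` in the third position and the last three fields blank.
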
@annevk
Member

annevk commented Nov 25, 2024

@markusicu @macchiati could you provide some context for these changes? While we submitted a bunch of feedback (as recorded in #744), it seems there are quite a few other changes as well.

E.g., is an invalid domain name today, but with Unicode 16 would be valid?

(We are seeing this in WebKit as well now: WebKit/WebKit#37104.)

@annevk annevk added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Nov 25, 2024
@markusicu

We considered several issues and made recommendations that the UTC approved.

For example, there were complicated and unnecessary differences in processing with UseSTD3ASCIIRules=true vs. false, and some characters were disallowed based on differences between IDNA2003 and IDNA2008, while (a) IDNA2003 has not been relevant in a long time and (b) transitional processing had been deprecated in Unicode 15.1.

For details see

@annevk
Member

annevk commented Nov 29, 2024

@markusicu thank you, that was very helpful. We'll update things here within the next couple of months.

Edit: The URL Standard references the latest version of UTS46 now, but the test situation will take a bit longer to resolve.
