<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>177003</bug_id>
          
          <creation_ts>2017-09-15 08:33:13 -0700</creation_ts>
          <short_desc>[Harfbuzz] Take into account brackets or quotation marks when collecting runs</short_desc>
          <delta_ts>2021-03-31 15:56:07 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Text</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=178960</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>Gtk</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Carlos Garcia Campos">cgarcia</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>bugs-noreply</cc>
          <cc>dr.khaled.hosny</cc>
          <cc>mmaxfield</cc>

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1349308</commentid>
    <comment_count>0</comment_count>
    <who name="Carlos Garcia Campos">cgarcia</who>
    <bug_when>2017-09-15 08:33:13 -0700</bug_when>
    <thetext>In determining the boundaries of a run of text in a given script, programs must resolve any of the special Script property values, such as Common, based on the context of the surrounding characters. A simple heuristic uses the script of the preceding character, which works well in many cases. However, this may not always produce optimal results. For example, in the text “... gamma (γ) is ...”, this heuristic would cause matching parentheses to be in different scripts.

Generally, paired punctuation, such as brackets or quotation marks, belongs to the enclosing or outer level of the text and should therefore match the script of the enclosing text. In addition, opening and closing elements of a pair resolve to the same Script property values, where possible. The use of quotation marks is language dependent; therefore it is not possible to tell from the character code alone whether a particular quotation mark is used as an opening or closing punctuation. For more information, see Section 6.2, General Punctuation, of [Unicode].
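
A minimal sketch of the paired-punctuation rule described above, not WebKit code. Everything named here is hypothetical: script_of is a toy stand-in for the Unicode Script property (it only recognises Latin, Greek and Arabic letters by character name), PAIRS is a hand-written subset of bracket pairs, and quotation marks are left out because their direction is language dependent. Common characters take the script of the preceding character, except that a closing bracket reuses the script recorded for its opening bracket.

```python
import unicodedata

# Hypothetical pair table; a real implementation would use Unicode's
# Bidi_Paired_Bracket data instead of this hand-written subset.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def script_of(ch):
    """Toy stand-in for the Unicode Script property (assumption: only
    Latin, Greek and Arabic letters are recognised; all else is Common)."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "GREEK", "ARABIC"):
        if name.startswith(script) and unicodedata.category(ch).startswith("L"):
            return script.capitalize()
    return "Common"


def resolve_scripts(text):
    """Return one resolved script name per character of text."""
    resolved = []
    current = "Common"   # script of the run we are currently in
    stack = []           # (expected closer, script at the opening bracket)
    for ch in text:
        script = script_of(ch)
        if script != "Common":
            current = script
        elif ch in PAIRS:
            # Opening bracket: takes the enclosing script; remember it.
            stack.append((PAIRS[ch], current))
        elif ch in CLOSERS:
            # Closing bracket: discard unmatched openers, then reuse the
            # script recorded for the matching opener, if any.
            while stack and stack[-1][0] != ch:
                stack.pop()
            if stack:
                current = stack.pop()[1]
        resolved.append(current)
    # Leading Common characters adopt the first real script, if any.
    first = next((s for s in resolved if s != "Common"), "Common")
    return [first if s == "Common" else s for s in resolved]
```

With the example text, resolve_scripts("gamma (γ) is") puts both parentheses in the Latin run and only γ in a Greek run, whereas the preceding-character heuristic alone would leave the closing parenthesis in Greek.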

http://www.unicode.org/reports/tr24/#Common</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1364812</commentid>
    <comment_count>1</comment_count>
      <attachid>325003</attachid>
    <who name="Khaled Hosny">dr.khaled.hosny</who>
    <bug_when>2017-10-26 06:33:58 -0700</bug_when>
    <thetext>Created attachment 325003
Test file for brackets handling

(copying from bug 178625 comment 11)

In the attached HTML file, the period should render identically on both lines (you need the Amiri font from http://www.amirifont.org/). Currently the second line differs: the closing bracket takes the script of the Latin text before it, so the period that follows is shaped with the Latin script instead of Arabic.

Note that neither Firefox nor Chrome seems to handle this, so it is probably not a priority (LibreOffice does, but I wrote that code and it isn’t a web browser).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1364961</commentid>
    <comment_count>2</comment_count>
    <who name="Myles C. Maxfield">mmaxfield</who>
    <bug_when>2017-10-26 12:06:04 -0700</bug_when>
    <thetext>Is this bug about our bidi algorithm implementation or is it about something inside ComplexTextControllerHarfBuzz?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1365277</commentid>
    <comment_count>3</comment_count>
    <who name="Carlos Garcia Campos">cgarcia</who>
    <bug_when>2017-10-27 00:09:08 -0700</bug_when>
    <thetext>(In reply to Myles C. Maxfield from comment #2)
&gt; Is this bug about our bidi algorithm implementation or is it about something
&gt; inside ComplexTextControllerHarfBuzz?

I&apos;m not sure yet. Unless ComplexTextController already takes this into account when breaking runs, it&apos;s ComplexTextControllerHarfBuzz specific, but I haven&apos;t looked at it in detail yet.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1365496</commentid>
    <comment_count>4</comment_count>
    <who name="Myles C. Maxfield">mmaxfield</who>
    <bug_when>2017-10-27 14:00:40 -0700</bug_when>
    <thetext>(In reply to Carlos Garcia Campos from comment #3)
&gt; (In reply to Myles C. Maxfield from comment #2)
&gt; &gt; Is this bug about our bidi algorithm implementation or is it about something
&gt; &gt; inside ComplexTextControllerHarfBuzz?
&gt; 
&gt; I&apos;m not sure yet. Unless ComplexTextController already takes this into
&gt; account when breaking runs, it&apos;s ComplexTextControllerHarfBuzz specific, but
&gt; I haven&apos;t looked at it in detail yet.

Recent versions of Unicode (for some definition of &quot;recent&quot;) have changed how the bracket matching algorithm works in the UBA, which is something that we want to support in the near future. It may be worth sitting on this bug until we fix that, and seeing if that work fixes this problem.

Or you could update our UBA for me &lt;3&lt;3&lt;3</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1365523</commentid>
    <comment_count>5</comment_count>
    <who name="Khaled Hosny">dr.khaled.hosny</who>
    <bug_when>2017-10-27 14:42:50 -0700</bug_when>
    <thetext>Updating UBA wouldn’t make much of a difference here, since the issue is about script itemization which ideally should be independent of bidi itemization.

One option for updating UBA implementation is to switch to ICU’s, like Gecko did recently.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1365526</commentid>
    <comment_count>6</comment_count>
    <who name="Myles C. Maxfield">mmaxfield</who>
    <bug_when>2017-10-27 14:48:13 -0700</bug_when>
    <thetext>(In reply to Khaled Hosny from comment #5)
&gt; Updating UBA wouldn’t make much of a difference here, since the issue is
&gt; about script itemization which ideally should be independent of bidi
&gt; itemization.
&gt; 
&gt; One option for updating UBA implementation is to switch to ICU’s, like Gecko
&gt; did recently.

Historically, we&apos;ve found that ICU&apos;s UBA is too slow and would be a regression. However, it&apos;s probably worth revisiting this, as the perf numbers were gathered many years ago.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1745839</commentid>
    <comment_count>7</comment_count>
    <who name="Myles C. Maxfield">mmaxfield</who>
    <bug_when>2021-03-31 15:56:07 -0700</bug_when>
    <thetext>(In reply to Myles C. Maxfield from comment #6)
&gt; Historically, we&apos;ve found that ICU&apos;s UBA is too slow and would be a
&gt; regression. However, it&apos;s probably worth revisiting this, as the perf
&gt; numbers were gathered many years ago.

I started investigating this here: https://bugs.webkit.org/show_bug.cgi?id=178960</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>325003</attachid>
            <date>2017-10-26 06:33:58 -0700</date>
            <delta_ts>2017-10-26 06:33:58 -0700</delta_ts>
            <desc>Test file for brackets handling</desc>
            <filename>test-brakets-script.html</filename>
            <type>text/html</type>
            <size>338</size>
            <attacher name="Khaled Hosny">dr.khaled.hosny</attacher>
            
              <data encoding="base64">PGh0bWwgbGFuZz0iYXIiPgogIDxoZWFkPgogICAgPG1ldGEgY2hhcnNldD0idXRmLTgiLz4KICAg
IDxzdHlsZT4KICAgICAgcCB7CiAgICAgICAgZm9udDogNDBwdCBBbWlyaTsKICAgICAgICBkaXJl
Y3Rpb246IHJ0bDsKICAgICAgICB0ZXh0LWFsaWduOiByaWdodDsKICAgICAgfQogICAgPC9zdHls
ZT4KICA8L2hlYWQ+CiAgPGJvZHk+CiAgICA8cD4KICAgINmD2YTYp9mFINi52LHYqNmKINio2KfZ
hNiu2LcgKNin2YTYo9mF2YrYsdmKKS4KICAgIDwvcD4KICAgIDxwPgogICAg2YPZhNin2YUg2LnY
sdio2Yog2KjYp9mE2K7YtyAoQW1pcmkpLgogICAgPC9wPgogIDwvYm9keT4KPC9odG1sPgo=
</data>

          </attachment>
      

    </bug>

</bugzilla>