<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>17689</bug_id>
          
          <creation_ts>2008-03-05 15:57:23 -0800</creation_ts>
          <short_desc>Reject long UTF sequences</short_desc>
          <delta_ts>2023-04-01 00:23:09 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>WebKit Misc.</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows XP</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>0</everconfirmed>
          <reporter name="jasneet">jasneet</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>annevk</cc>
    
    <cc>ap</cc>
    
    <cc>jasneet</cc>
    
    <cc>sam</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>72856</commentid>
    <comment_count>0</comment_count>
    <who name="jasneet">jasneet</who>
    <bug_when>2008-03-05 15:57:23 -0800</bug_when>
    <thetext>WebKit issue:
UTF standards require parsers to reject sequences that were encoded using more bytes than absolutely necessary (for example, standard 7-bit characters encoded as 2- or 4-byte sequences, e.g. &amp;#0000106, either as a binary value or an HTML entity).

Modify the renderer to reject such characters: they have no legitimate use, but are routinely abused to carry out cross-site scripting attacks (attempts to close HTML tags and inject code routinely bypass filters when obfuscated this way).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>72875</commentid>
    <comment_count>1</comment_count>
      <attachid>19565</attachid>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2008-03-05 22:58:38 -0800</bug_when>
    <thetext>Created attachment 19565
test case (works as expected)

Yes, our decoder does reject non-shortest UTF forms in all cases I&apos;m aware of. Do you have a specific example of the problem?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75012</commentid>
    <comment_count>2</comment_count>
      <attachid>20014</attachid>
    <who name="jasneet">jasneet</who>
    <bug_when>2008-03-24 15:07:57 -0700</bug_when>
    <thetext>Created attachment 20014
reduction</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75013</commentid>
    <comment_count>3</comment_count>
    <who name="jasneet">jasneet</who>
    <bug_when>2008-03-24 15:08:30 -0700</bug_when>
    <thetext>Looks like the only remaining worrisome case is multibyte HTML entities. These could be used to bypass filters that differentiate between absolute and relative URLs, and apply restrictions based on this distinction:

&lt;a href=&quot;javascript&amp;#x0000003aalert(1)&quot;&gt;Long HTML entity notation might be used to bypass some URL filters&lt;/a&gt;

This is not strictly a browser bug, but it has no legitimate uses, and is a common XSS vector against applications, so locking it down is certainly beneficial.

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75082</commentid>
    <comment_count>4</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2008-03-25 00:26:08 -0700</bug_when>
    <thetext>In this example, the entity is not only long but also lacks a terminating semicolon. As such, it is covered by bug 4948.

I am not aware of any reason to reject &quot;&amp;#x0000003a;&quot;, though - other browsers handle this just fine, and standards do not disallow it AFAIK.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1945923</commentid>
    <comment_count>5</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2023-04-01 00:23:09 -0700</bug_when>
    <thetext>Indeed, this behavior is covered by the HTML Standard.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>19565</attachid>
            <date>2008-03-05 22:58:38 -0800</date>
            <delta_ts>2008-03-05 22:58:38 -0800</delta_ts>
            <desc>test case (works as expected)</desc>
            <filename>non-shortest.html</filename>
            <type>text/html</type>
            <size>87</size>
            <attacher name="Alexey Proskuryakov">ap</attacher>
            
              <data encoding="base64">PG1ldGEgY2hhcnNldD0idXRmLTgiPjxwPlNob3VsZCBoYXZlIGEgcmVwbGFjZW1lbnQgY2hhcmFj
dGVyIGluIHRoZSBtaWRkbGU6ICIvwK4uLyI8L3A+
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>20014</attachid>
            <date>2008-03-24 15:07:57 -0700</date>
            <delta_ts>2008-03-24 15:07:57 -0700</delta_ts>
            <desc>reduction</desc>
            <filename>test.htm</filename>
            <type>text/html</type>
            <size>140</size>
            <attacher name="jasneet">jasneet</attacher>
            
              <data encoding="base64">PGh0bWw+PGJvZHk+DQo8YSBocmVmPSJqYXZhc2NyaXB0JiN4MDAwMDAwM2FhbGVydCgxKSI+TG9u
ZyBIVE1MIGVudGl0eSBub3RhdGlvbiBtaWdodCBiZSB1c2VkIHRvIGJ5cGFzcyBzb21lIFVSTCBm
aWx0ZXJzPC9hPg0KPC9ib2R5PjwvaHRtbD4=
</data>

          </attachment>
      

    </bug>

</bugzilla>