<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>3556</bug_id>
          
          <creation_ts>2005-06-15 21:03:09 -0700</creation_ts>
          <short_desc>black diamond question mark shown for invalid UTF-8 sequences</short_desc>
          <delta_ts>2019-02-06 09:04:03 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>DOM</component>
          <version>412</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>OS X 10.4</op_sys>
          <bug_status>VERIFIED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://www.cheap-hotel-rooms.com/Reno/Peppermill-Hotel.htm</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Darin Adler">darin</reporter>
          <assigned_to name="Darin Adler">darin</assigned_to>
          <cc>ap</cc>
          <cc>cdumez</cc>
          <cc>nickshanks</cc>
          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>12188</commentid>
    <comment_count>0</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-06-15 21:03:09 -0700</bug_when>
    <thetext>The link above is one site that has invalid UTF-8 sequences. There are many others. Also seen on 
news.google.com.

Other browsers just seem to ignore these sequences. So we should too.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12189</commentid>
    <comment_count>1</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-06-15 21:05:46 -0700</bug_when>
    <thetext>The bad sequences are partway down the page, where it says &quot;including a 120-screen cube&quot;. I imagine 
they are em dashes, probably in Windows Latin-1 encoding.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12190</commentid>
    <comment_count>2</comment_count>
      <attachid>2379</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-06-15 21:07:13 -0700</bug_when>
    <thetext>Created attachment 2379
Patch to ignore U+FFFD characters coming out of the decoder</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12237</commentid>
    <comment_count>3</comment_count>
    <who name="Nicholas Shanks">nickshanks</who>
    <bug_when>2005-06-16 07:10:07 -0700</bug_when>
    <thetext>I see these everywhere. Just hiding them is not really optimal though:

1) Go to Safari preferences
2) Set default encoding to UTF-8
3) Browse the internet for a bit

You will see that either many sites aren&apos;t sending encoding information, Safari is ignoring the charset in 
the Content-Type HTTP header that should override the &lt;meta&gt; tag, or it&apos;s ignoring the XML encoding 
declaration for XHTML served as text/html (or all of the above; I can&apos;t really tell). Whatever the cause, 
websites would be harder to read if the user were not aware that a character was missing or mis-encoded. 
Words would just appear with letters missing, and their meanings might change!

One solution I can think of would be to note all the invalid characters encountered and try to match up 
a likely encoding based on document language perhaps, then suggest a document re-interpretation to 
the user.
This is something that should be reported as an error when in web developer mode too.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12250</commentid>
    <comment_count>4</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-06-16 10:07:27 -0700</bug_when>
    <thetext>Yes, automatically determining the correct encoding for web pages would be pretty neat.

But that&apos;s not what this bug is about. This bug is about matching other browsers&apos; behavior on various 
sites. All the other browsers, and older versions of Safari, simply ignore those bytes. We stopped ignoring 
them and started putting in black diamond question marks because of a change in the underlying OS.

Please file a new bug report with specific suggestions about your enhancement idea. I don&apos;t think that idea 
and the concept that &quot;skipping these characters is not good enough&quot; should prevent us from fixing this 
regression and once again matching the behavior of other browsers. Let&apos;s not continue that discussion 
here unless there&apos;s a really good reason to do so.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12252</commentid>
    <comment_count>5</comment_count>
      <attachid>2379</attachid>
    <who name="John Sullivan">sullivan</who>
    <bug_when>2005-06-16 10:49:46 -0700</bug_when>
    <thetext>Comment on attachment 2379
Patch to ignore U+FFFD characters coming out of the decoder

r=me, excellent comment</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12255</commentid>
    <comment_count>6</comment_count>
    <who name="Nicholas Shanks">nickshanks</who>
    <bug_when>2005-06-16 11:39:19 -0700</bug_when>
    <thetext>(In reply to comment #4)
&gt; I don&apos;t think that idea and the concept that &quot;skipping these characters is not good enough&quot;
&gt; should prevent us from fixing this regression and once-again matching the behavior of
&gt; other browsers.

Oh, I agree. I was just saying it was not optimal, and that further work could be done to improve the 
situation. I was definitely not suggesting that the patch shouldn&apos;t be applied! Apologies if I gave that 
impression.
I shall open a bug about automatic encoding detection.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>13803</commentid>
    <comment_count>7</comment_count>
    <who name="Joost de Valk (AlthA)">joost</who>
    <bug_when>2005-07-03 08:10:28 -0700</bug_when>
    <thetext>Darin, please mark this as verified if you think it is ;).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>15956</commentid>
    <comment_count>8</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-08-04 18:16:10 -0700</bug_when>
    <thetext>In Radar as &lt;rdar://problem/4206050&gt; 8A345: Bad (question mark in black diamond) characters in 
news.google.com</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>46278</commentid>
    <comment_count>9</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2006-06-19 09:11:42 -0700</bug_when>
    <thetext>This change was reverted in bug 8972.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1503090</commentid>
    <comment_count>10</comment_count>
    <who name="Lucas Forschler">lforschler</who>
    <bug_when>2019-02-06 09:04:03 -0800</bug_when>
    <thetext>Mass moving XML DOM bugs to the &quot;DOM&quot; Component.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>2379</attachid>
            <date>2005-06-15 21:07:13 -0700</date>
            <delta_ts>2005-06-16 10:49:46 -0700</delta_ts>
            <desc>Patch to ignore U+FFFD characters coming out of the decoder</desc>
            <filename>BlackDiamondQuestionMarkPatch.txt</filename>
            <type>text/plain</type>
            <size>3148</size>
            <attacher name="Darin Adler">darin</attacher>
            
              <data encoding="base64">SW5kZXg6IGt3cS9LV1FUZXh0Q29kZWMubW0KPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQpSQ1MgZmlsZTogL2N2cy9yb290
L1dlYkNvcmUva3dxL0tXUVRleHRDb2RlYy5tbSx2CnJldHJpZXZpbmcgcmV2aXNpb24gMS40OQpk
aWZmIC1wIC11IC1wIC11IC1yMS40OSBrd3EvS1dRVGV4dENvZGVjLm1tCi0tLSBrd3EvS1dRVGV4
dENvZGVjLm1tCTE0IERlYyAyMDA0IDAwOjEwOjE4IC0wMDAwCTEuNDkKKysrIGt3cS9LV1FUZXh0
Q29kZWMubW0JMTMgSnVuIDIwMDUgMTU6MDA6MzIgLTAwMDAKQEAgLTI4LDYgKzI4LDcgQEAKICNp
bXBvcnQgIktXUUFzc2VydGlvbnMuaCIKICNpbXBvcnQgIktXUUNoYXJzZXRzLmgiCiAKK2NvbnN0
IFVuaUNoYXIgcmVwbGFjZW1lbnRDaGFyYWN0ZXIgPSAweEZGRkQ7CiBjb25zdCBVbmlDaGFyIEJP
TSA9IDB4RkVGRjsKIAogY2xhc3MgS1dRVGV4dERlY29kZXIgOiBwdWJsaWMgUVRleHREZWNvZGVy
IHsKQEAgLTQ4LDcgKzQ5LDcgQEAgcHJpdmF0ZToKICAgICBPU1N0YXR1cyBjcmVhdGVURUNDb252
ZXJ0ZXIoKTsKICAgICBPU1N0YXR1cyBjb252ZXJ0T25lQ2h1bmtVc2luZ1RFQyhjb25zdCB1bnNp
Z25lZCBjaGFyICppbnB1dEJ1ZmZlciwgaW50IGlucHV0QnVmZmVyTGVuZ3RoLCBpbnQgJmlucHV0
TGVuZ3RoLAogICAgICAgICB2b2lkICpvdXRwdXRCdWZmZXIsIGludCBvdXRwdXRCdWZmZXJMZW5n
dGgsIGludCAmb3V0cHV0TGVuZ3RoKTsKLSAgICBzdGF0aWMgdm9pZCBhcHBlbmRPbWl0dGluZ051
bGxzQW5kQk9NcyhRU3RyaW5nICZzLCBjb25zdCBVbmlDaGFyICpjaGFyYWN0ZXJzLCBpbnQgYnl0
ZUNvdW50KTsKKyAgICBzdGF0aWMgdm9pZCBhcHBlbmRPbWl0dGluZ1Vud2FudGVkKFFTdHJpbmcg
JnMsIGNvbnN0IFVuaUNoYXIgKmNoYXJhY3RlcnMsIGludCBieXRlQ291bnQpOwogICAgIAogICAg
IEtXUVRleHREZWNvZGVyKGNvbnN0IEtXUVRleHREZWNvZGVyICYpOwogICAgIEtXUVRleHREZWNv
ZGVyICZvcGVyYXRvcj0oY29uc3QgS1dRVGV4dERlY29kZXIgJik7CkBAIC0zNTYsMTQgKzM1Nywz
MCBAQCBPU1N0YXR1cyBLV1FUZXh0RGVjb2Rlcjo6Y3JlYXRlVEVDQ29udmVyCiAgICAgcmV0dXJu
IG5vRXJyOwogfQogCi12b2lkIEtXUVRleHREZWNvZGVyOjphcHBlbmRPbWl0dGluZ051bGxzQW5k
Qk9NcyhRU3RyaW5nICZzLCBjb25zdCBVbmlDaGFyICpjaGFyYWN0ZXJzLCBpbnQgYnl0ZUNvdW50
KQorLy8gV2Ugc3RyaXAgTlVMIGNoYXJhY3RlcnMgYmVjYXVzZSBvdGhlciBicm93c2VycyAoYXQg
bGVhc3QgV2luSUUpIGRvLgorLy8gV2Ugc3RyaXAgcmVwbGFjZW1lbnQgY2hhcmFjdGVycyBiZWNh
dXNlIHRoZSBURUMgY29udmVydGVyIGZvciBVVEYtOCBjb252ZXJ0cworLy8gaW52YWxpZCBzZXF1
ZW5jZXMgaW50byByZXBsYWNlbWVudCBjaGFyYWN0ZXJzLCBidXQgb3RoZXIgYnJvd3NlcnMgZGlz
Y2FyZCB0aGVtLgorLy8gV2Ugc3RyaXAgQk9NIGNoYXJhY3RlcnMgYmVjYXVzZSB0aGV5IGNhbiBz
aG93IHVwIGJvdGggYXQgdGhlIHN0YXJ0IG9mIGNvbnRlbnQKKy8vIGFuZCBpbnNpZGUgY29udGVu
dCwgYW5kIHdlIG5ldmVyIHdhbnQgdGhlbSB0byBlbmQgdXAgaW4gdGhlIGRlY29kZWQgdGV4dC4K
K3N0YXRpYyBpbmxpbmUgYm9vbCB1bndhbnRlZChVbmlDaGFyIGMpCit7CisgICAgc3dpdGNoIChj
KSB7CisgICAgICAgIGNhc2UgMDoKKyAgICAgICAgY2FzZSByZXBsYWNlbWVudENoYXJhY3RlcjoK
KyAgICAgICAgY2FzZSBCT006CisgICAgICAgICAgICByZXR1cm4gdHJ1ZTsKKyAgICAgICAgZGVm
YXVsdDoKKyAgICAgICAgICAgIHJldHVybiBmYWxzZTsKKyAgICB9Cit9CisKK3ZvaWQgS1dRVGV4
dERlY29kZXI6OmFwcGVuZE9taXR0aW5nVW53YW50ZWQoUVN0cmluZyAmcywgY29uc3QgVW5pQ2hh
ciAqY2hhcmFjdGVycywgaW50IGJ5dGVDb3VudCkKIHsKICAgICBBU1NFUlQoYnl0ZUNvdW50ICUg
c2l6ZW9mKFVuaUNoYXIpID09IDApOwogICAgIGludCBzdGFydCA9IDA7CiAgICAgaW50IGNoYXJh
Y3RlckNvdW50ID0gYnl0ZUNvdW50IC8gc2l6ZW9mKFVuaUNoYXIpOwogICAgIGZvciAoaW50IGkg
PSAwOyBpICE9IGNoYXJhY3RlckNvdW50OyArK2kpIHsKLSAgICAgICAgVW5pQ2hhciBjID0gY2hh
cmFjdGVyc1tpXTsKLSAgICAgICAgaWYgKGMgPT0gMCB8fCBjID09IEJPTSkgeworICAgICAgICBp
ZiAodW53YW50ZWQoY2hhcmFjdGVyc1tpXSkpIHsKICAgICAgICAgICAgIGlmIChzdGFydCAhPSBp
KSB7CiAgICAgICAgICAgICAgICAgcy5hcHBlbmQocmVpbnRlcnByZXRfY2FzdDxjb25zdCBRQ2hh
ciAqPigmY2hhcmFjdGVyc1tzdGFydF0pLCBpIC0gc3RhcnQpOwogICAgICAgICAgICAgfQpAQCAt
NDk4LDcgKzUxNSw3IEBAIFFTdHJpbmcgS1dRVGV4dERlY29kZXI6OmNvbnZlcnRVc2luZ1RFQygK
ICAgICAgICAgICAgICAgICByZXR1cm4gUVN0cmluZygpOwogICAgICAgICB9CiAKLSAgICAgICAg
YXBwZW5kT21pdHRpbmdOdWxsc0FuZEJPTXMocmVzdWx0LCBidWZmZXIsIGJ5dGVzV3JpdHRlbik7
CisgICAgICAgIGFwcGVuZE9taXR0aW5nVW53YW50ZWQocmVzdWx0LCBidWZmZXIsIGJ5dGVzV3Jp
dHRlbik7CiAKICAgICAgICAgYnVmZmVyV2FzRnVsbCA9IHN0YXR1cyA9PSBrVEVDT3V0cHV0QnVm
ZmVyRnVsbFN0YXR1czsKICAgICB9CkBAIC01MDYsNyArNTIzLDcgQEAgUVN0cmluZyBLV1FUZXh0
RGVjb2Rlcjo6Y29udmVydFVzaW5nVEVDKAogICAgIGlmIChmbHVzaCkgewogICAgICAgICB1bnNp
Z25lZCBsb25nIGJ5dGVzV3JpdHRlbiA9IDA7CiAgICAgICAgIFRFQ0ZsdXNoVGV4dChfY29udmVy
dGVyLCByZWludGVycHJldF9jYXN0PHVuc2lnbmVkIGNoYXIgKj4oYnVmZmVyKSwgc2l6ZW9mKGJ1
ZmZlciksICZieXRlc1dyaXR0ZW4pOwotICAgICAgICBhcHBlbmRPbWl0dGluZ051bGxzQW5kQk9N
cyhyZXN1bHQsIGJ1ZmZlciwgYnl0ZXNXcml0dGVuKTsKKyAgICAgICAgYXBwZW5kT21pdHRpbmdV
bndhbnRlZChyZXN1bHQsIGJ1ZmZlciwgYnl0ZXNXcml0dGVuKTsKICAgICB9CiAKICAgICAvLyBX
b3JrYXJvdW5kIGZvciBhIGJ1ZyBpbiB0aGUgVGV4dCBFbmNvZGluZyBDb252ZXJ0ZXIgKHNlZSBi
dWcgMzIyNTQ3MikuCg==
</data>
<flag name="review"
          id="22"
          type_id="1"
          status="+"
          setter="sullivan"
    />
          </attachment>
      

    </bug>

</bugzilla>