WebKit Bugzilla
Attachment 362497 Details for
Bug 180526
: Update MIME type parser
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Requests
|
Help
|
New Account
|
Log In
Remember
[x]
|
Forgot Password
Login:
[x]
[patch]
Patch
bug-180526-20190220164147.patch (text/plain), 20.37 KB, created by
Rob Buis
on 2019-02-20 07:41:47 PST
(
hide
)
Description:
Patch
Filename:
MIME Type:
Creator:
Rob Buis
Created:
2019-02-20 07:41:47 PST
Size:
20.37 KB
patch
obsolete
>Subversion Revision: 241761 >diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog >index a6da53fe8e04ad79dd06c117c410a040b2b97e76..5e106a8474dfc0b90c78e63e7475a09e5da9ad37 100644 >--- a/Source/WebCore/ChangeLog >+++ b/Source/WebCore/ChangeLog >@@ -1,3 +1,44 @@ >+2019-02-19 Rob Buis <rbuis@igalia.com> >+ >+ Update MIME type parser >+ https://bugs.webkit.org/show_bug.cgi?id=180526 >+ >+ Reviewed by Darin Adler. >+ >+ Further testing showed the MIME parser needs these fixes: >+ - stripWhitespace is wrong for removing HTTP whitespace, use >+ stripLeadingAndTrailingHTTPSpaces instead. >+ - HTTP Token code points checking for Rfc2045 and Mimesniff were >+ mixed up, use the dedicated isValidHTTPToken for Mimesniff mode. >+ - Quoted Strings were not unescaped/escaped, this seems ok for >+ serializing but is wrong when gettings individual parameter values. >+ Implement [1] and [2] Step 2.4 to properly unescape and escape. >+ >+ This change also tries to avoid hard to read uses of find. >+ >+ Test: ParsedContentType.Serialize >+ >+ [1] https://fetch.spec.whatwg.org/#collect-an-http-quoted-string >+ [2] https://mimesniff.spec.whatwg.org/#serializing-a-mime-type >+ >+ * platform/network/ParsedContentType.cpp: >+ (WebCore::skipSpaces): >+ (WebCore::parseToken): >+ (WebCore::isNotQuoteOrBackslash): >+ (WebCore::collectHTTPQuotedString): >+ (WebCore::containsNonTokenCharacters): >+ (WebCore::parseQuotedString): >+ (WebCore::ParsedContentType::parseContentType): >+ (WebCore::ParsedContentType::create): >+ (WebCore::ParsedContentType::setContentType): >+ (WebCore::containsNonQuoteStringTokenCharacters): >+ (WebCore::ParsedContentType::setContentTypeParameter): >+ (WebCore::ParsedContentType::serialize const): >+ (WebCore::substringForRange): Deleted. >+ (WebCore::isNonTokenCharacter): Deleted. >+ (WebCore::isNonQuotedStringTokenCharacter): Deleted. >+ * platform/network/ParsedContentType.h: >+ > 2019-02-19 Commit Queue <commit-queue@webkit.org> > > Unreviewed, rolling out r241722. >diff --git a/Source/WebCore/platform/network/ParsedContentType.cpp b/Source/WebCore/platform/network/ParsedContentType.cpp >index ef1c5cd0e78e83eb46750fff044f72760bae1ea2..b963ca143198e79ed831117eba1e31f82a58637e 100644 >--- a/Source/WebCore/platform/network/ParsedContentType.cpp >+++ b/Source/WebCore/platform/network/ParsedContentType.cpp >@@ -38,7 +38,7 @@ > > namespace WebCore { > >-static void skipSpaces(const String& input, unsigned& startIndex) >+static void skipSpaces(StringView input, unsigned& startIndex) > { > while (startIndex < input.length() && isHTTPSpace(input[startIndex])) > ++startIndex; >@@ -56,7 +56,7 @@ static bool isTokenCharacter(UChar c) > > using CharacterMeetsCondition = bool (*)(UChar); > >-static Optional<SubstringRange> parseToken(const String& input, unsigned& startIndex, CharacterMeetsCondition characterMeetsCondition, Mode mode, bool skipTrailingWhitespace = false) >+static Optional<StringView> parseToken(StringView input, unsigned& startIndex, CharacterMeetsCondition characterMeetsCondition, Mode mode, bool skipTrailingWhitespace = false) > { > unsigned inputLength = input.length(); > unsigned tokenStart = startIndex; >@@ -77,19 +77,55 @@ static Optional<SubstringRange> parseToken(const String& input, unsigned& startI > while (input[tokenEnd - 1] == ' ') > --tokenEnd; > } >- return SubstringRange(tokenStart, tokenEnd - tokenStart); >+ return input.substring(tokenStart, tokenEnd - tokenStart); > } > >-static bool containsNonTokenCharacters(const String& input, SubstringRange range) >+static bool isNotQuoteOrBackslash(UChar ch) > { >- for (unsigned index = 0; index < range.second; ++index) { >- if (!isTokenCharacter(input[range.first + index])) >+ return ch != '"' && ch != '\\'; >+} >+ >+static String collectHTTPQuotedString(StringView input, unsigned& startIndex) >+{ >+ ASSERT(input[startIndex] == '"'); >+ unsigned inputLength = input.length(); >+ unsigned& position = startIndex; >+ position++; >+ StringBuilder builder; >+ while (true) { >+ unsigned positionStart = position; >+ parseToken(input, position, isNotQuoteOrBackslash, Mode::MimeSniff); >+ builder.append(input.substring(positionStart, position - positionStart)); >+ if (position >= inputLength) >+ break; >+ UChar quoteOrBackslash = input[position++]; >+ if (quoteOrBackslash == '\\') { >+ if (position >= inputLength) { >+ builder.append(quoteOrBackslash); >+ break; >+ } >+ builder.append(input[position++]); >+ } else { >+ ASSERT(quoteOrBackslash == '"'); >+ break; >+ } >+ >+ } >+ return builder.toString(); >+} >+ >+static bool containsNonTokenCharacters(StringView input, Mode mode) >+{ >+ if (mode == Mode::MimeSniff) >+ return !isValidHTTPToken(input.toStringWithoutCopying()); >+ for (unsigned index = 0; index < input.length(); ++index) { >+ if (!isTokenCharacter(input[index])) > return true; > } > return false; > } > >-static Optional<SubstringRange> parseQuotedString(const String& input, unsigned& startIndex, Mode mode) >+static Optional<StringView> parseQuotedString(StringView input, unsigned& startIndex) > { > unsigned inputLength = input.length(); > unsigned quotedStringStart = startIndex + 1; >@@ -98,20 +134,14 @@ static Optional<SubstringRange> parseQuotedString(const String& input, unsigned& > if (quotedStringEnd >= inputLength) > return WTF::nullopt; > >- if (input[quotedStringEnd++] != '"' || quotedStringEnd >= inputLength) { >- if (mode == Mode::Rfc2045) >- return WTF::nullopt; >- return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart); >- } >+ if (input[quotedStringEnd++] != '"' || quotedStringEnd >= inputLength) >+ return WTF::nullopt; > > bool lastCharacterWasBackslash = false; > char currentCharacter; > while ((currentCharacter = input[quotedStringEnd++]) != '"' || lastCharacterWasBackslash) { >- if (quotedStringEnd >= inputLength) { >- if (mode == Mode::Rfc2045) >- return WTF::nullopt; >- break; >- } >+ if (quotedStringEnd >= inputLength) >+ return WTF::nullopt; > if (currentCharacter == '\\' && !lastCharacterWasBackslash) { > lastCharacterWasBackslash = true; > continue; >@@ -120,13 +150,8 @@ static Optional<SubstringRange> parseQuotedString(const String& input, unsigned& > lastCharacterWasBackslash = false; > } > if (input[quotedStringEnd - 1] == '"') >- return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart - 1); >- return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart); >-} >- >-static String substringForRange(const String& string, const SubstringRange& range) >-{ >- return string.substring(range.first, range.second); >+ quotedStringEnd++; >+ return input.substring(quotedStringStart, quotedStringEnd - quotedStringStart); > } > > // From http://tools.ietf.org/html/rfc2045#section-5.1: >@@ -209,18 +234,18 @@ bool ParsedContentType::parseContentType(Mode mode) > > unsigned contentTypeStart = index; > auto typeRange = parseToken(m_contentType, index, isNotForwardSlash, mode); >- if (!typeRange || containsNonTokenCharacters(m_contentType, *typeRange)) { >+ if (!typeRange || containsNonTokenCharacters(*typeRange, mode)) { > LOG_ERROR("Invalid Content-Type, invalid type value."); > return false; > } > >- if (m_contentType[index++] != '/') { >+ if (index >= contentTypeLength || m_contentType[index++] != '/') { > LOG_ERROR("Invalid Content-Type, missing '/'."); > return false; > } > > auto subTypeRange = parseToken(m_contentType, index, isNotSemicolon, mode, mode == Mode::MimeSniff); >- if (!subTypeRange || containsNonTokenCharacters(m_contentType, *subTypeRange)) { >+ if (!subTypeRange || containsNonTokenCharacters(*subTypeRange, mode)) { > LOG_ERROR("Invalid Content-Type, invalid subtype value."); > return false; > } >@@ -228,11 +253,11 @@ bool ParsedContentType::parseContentType(Mode mode) > // There should not be any quoted strings until we reach the parameters. > size_t semiColonIndex = m_contentType.find(';', contentTypeStart); > if (semiColonIndex == notFound) { >- setContentType(SubstringRange(contentTypeStart, contentTypeLength - contentTypeStart), mode); >+ setContentType(m_contentType.substring(contentTypeStart, contentTypeLength - contentTypeStart), mode); > return true; > } > >- setContentType(SubstringRange(contentTypeStart, semiColonIndex - contentTypeStart), mode); >+ setContentType(m_contentType.substring(contentTypeStart, semiColonIndex - contentTypeStart), mode); > index = semiColonIndex + 1; > while (true) { > skipSpaces(m_contentType, index); >@@ -244,7 +269,7 @@ bool ParsedContentType::parseContentType(Mode mode) > > // Should we tolerate spaces here? > if (mode == Mode::Rfc2045) { >- if (m_contentType[index++] != '=' || index >= contentTypeLength) { >+ if (index >= contentTypeLength || m_contentType[index++] != '=') { > LOG_ERROR("Invalid Content-Type malformed parameter."); > return false; > } >@@ -258,26 +283,31 @@ bool ParsedContentType::parseContentType(Mode mode) > if (m_contentType[index++] == ';') > continue; > } >- >- String parameterName = substringForRange(m_contentType, *keyRange); >+ String parameterName = keyRange->toStringWithoutCopying(); > > // Should we tolerate spaces here? >- Optional<SubstringRange> valueRange; >- if (m_contentType[index] == '"') { >- valueRange = parseQuotedString(m_contentType, index, mode); >- if (mode == Mode::MimeSniff) >+ String parameterValue; >+ Optional<StringView> valueRange; >+ if (index < contentTypeLength && m_contentType[index] == '"') { >+ if (mode == Mode::MimeSniff) { >+ parameterValue = collectHTTPQuotedString(m_contentType, index); > parseToken(m_contentType, index, isNotSemicolon, mode); >+ } else >+ valueRange = parseQuotedString(m_contentType, index); > } else > valueRange = parseToken(m_contentType, index, isNotSemicolon, mode, mode == Mode::MimeSniff); > >- if (!valueRange) { >- if (mode == Mode::MimeSniff) >- continue; >- LOG_ERROR("Invalid Content-Type, invalid parameter value."); >- return false; >+ >+ if (parameterValue.isNull()) { >+ if (!valueRange) { >+ if (mode == Mode::MimeSniff) >+ continue; >+ LOG_ERROR("Invalid Content-Type, invalid parameter value."); >+ return false; >+ } >+ parameterValue = valueRange->toStringWithoutCopying(); > } > >- String parameterValue = substringForRange(m_contentType, *valueRange); > // Should we tolerate spaces here? > if (mode == Mode::Rfc2045 && index < contentTypeLength && m_contentType[index++] != ';') { > LOG_ERROR("Invalid Content-Type, invalid character at the end of key/value parameter."); >@@ -295,7 +325,7 @@ bool ParsedContentType::parseContentType(Mode mode) > > Optional<ParsedContentType> ParsedContentType::create(const String& contentType, Mode mode) > { >- ParsedContentType parsedContentType(mode == Mode::Rfc2045 ? contentType : contentType.stripWhiteSpace()); >+ ParsedContentType parsedContentType(mode == Mode::Rfc2045 ? contentType : stripLeadingAndTrailingHTTPSpaces(contentType)); > if (!parsedContentType.parseContentType(mode)) > return WTF::nullopt; > return { WTFMove(parsedContentType) }; >@@ -326,28 +356,29 @@ size_t ParsedContentType::parameterCount() const > return m_parameterValues.size(); > } > >-void ParsedContentType::setContentType(const SubstringRange& contentRange, Mode mode) >+void ParsedContentType::setContentType(StringView contentRange, Mode mode) > { >- m_mimeType = substringForRange(m_contentType, contentRange).stripWhiteSpace(); >+ m_mimeType = contentRange.toString(); > if (mode == Mode::MimeSniff) >- m_mimeType = m_mimeType.convertToASCIILowercase(); >-} >- >-static bool isNonTokenCharacter(UChar ch) >-{ >- return !isTokenCharacter(ch); >+ m_mimeType = stripLeadingAndTrailingHTTPSpaces(m_mimeType).convertToASCIILowercase(); >+ else >+ m_mimeType = m_mimeType.stripWhiteSpace(); > } > >-static bool isNonQuotedStringTokenCharacter(UChar ch) >+static bool containsNonQuoteStringTokenCharacters(const String& input) > { >- return !isQuotedStringTokenCharacter(ch); >+ for (unsigned index = 0; index < input.length(); ++index) { >+ if (!isQuotedStringTokenCharacter(input[index])) >+ return true; >+ } >+ return false; > } > > void ParsedContentType::setContentTypeParameter(const String& keyName, const String& keyValue, Mode mode) > { > String name = keyName; > if (mode == Mode::MimeSniff) { >- if (m_parameterValues.contains(keyName) || keyName.find(isNonTokenCharacter) != notFound || keyValue.find(isNonQuotedStringTokenCharacter) != notFound) >+ if (m_parameterValues.contains(name) || !isValidHTTPToken(name) || containsNonQuoteStringTokenCharacters(keyValue)) > return; > name = name.convertToASCIILowercase(); > } >@@ -364,9 +395,14 @@ String ParsedContentType::serialize() const > builder.append(name); > builder.append('='); > String value = m_parameterValues.get(name); >- if (value.isEmpty() || value.find(isNonTokenCharacter) != notFound) { >+ if (value.isEmpty() || !isValidHTTPToken(value)) { > builder.append('"'); >- builder.append(value); >+ for (unsigned index = 0; index < value.length(); ++index) { >+ auto ch = value[index]; >+ if (ch == '\\' || ch =='"') >+ builder.append('\\'); >+ builder.append(ch); >+ } > builder.append('"'); > } else > builder.append(value); >diff --git a/Source/WebCore/platform/network/ParsedContentType.h b/Source/WebCore/platform/network/ParsedContentType.h >index 49f2e223cd836e558a421ed0f0452b358c2ddc49..1ca2c1ce527472b0d5be1a2cf3fad89f21e6aef3 100644 >--- a/Source/WebCore/platform/network/ParsedContentType.h >+++ b/Source/WebCore/platform/network/ParsedContentType.h >@@ -40,8 +40,6 @@ enum class Mode { > Rfc2045, > MimeSniff > }; >-// <index, length> >-typedef std::pair<unsigned, unsigned> SubstringRange; > WEBCORE_EXPORT bool isValidContentType(const String&, Mode = Mode::MimeSniff); > > // FIXME: add support for comments. >@@ -64,7 +62,7 @@ private: > ParsedContentType(const ParsedContentType&) = delete; > ParsedContentType& operator=(ParsedContentType const&) = delete; > bool parseContentType(Mode); >- void setContentType(const SubstringRange&, Mode); >+ void setContentType(StringView, Mode); > void setContentTypeParameter(const String&, const String&, Mode); > > typedef HashMap<String, String> KeyValuePairs; >diff --git a/Tools/ChangeLog b/Tools/ChangeLog >index 84fe3d6f546a00479cb1b4907ca1cf3d7c016267..df8fa933b2ecef20dd5cfff8d9779e6fa733bcaa 100644 >--- a/Tools/ChangeLog >+++ b/Tools/ChangeLog >@@ -1,3 +1,16 @@ >+2019-02-19 Rob Buis <rbuis@igalia.com> >+ >+ Update MIME type parser >+ https://bugs.webkit.org/show_bug.cgi?id=180526 >+ >+ Reviewed by Darin Adler. >+ >+ Add tests involving leading and trailing whitespace, non-token >+ characters and quoted strings. >+ >+ * TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp: >+ (TestWebKitAPI::TEST): >+ > 2019-02-19 Pablo Saavedra <psaavedra@igalia.com> > > pytest is not correctly auto-installed >diff --git a/Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp b/Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp >index e084462e35d5164082905cc73ffefdabe978720c..8457f7ceb9b20dfb45fdf3138ca45e0c736b7e98 100644 >--- a/Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp >+++ b/Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp >@@ -149,6 +149,8 @@ TEST(ParsedContentType, Serialize) > EXPECT_STREQ(serializeIfValid("text/"), "NOTVALID"); > EXPECT_STREQ(serializeIfValid("text/\0"), "NOTVALID"); > EXPECT_STREQ(serializeIfValid("text/ "), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("{/}"), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("x/x ;x=x"), "x/x;x=x"); > EXPECT_STREQ(serializeIfValid("text/plain"), "text/plain"); > EXPECT_STREQ(serializeIfValid("text/plain\0"), "text/plain"); > EXPECT_STREQ(serializeIfValid(" text/plain"), "text/plain"); >@@ -156,11 +158,15 @@ TEST(ParsedContentType, Serialize) > EXPECT_STREQ(serializeIfValid("\ntext/plain"), "text/plain"); > EXPECT_STREQ(serializeIfValid("\rtext/plain"), "text/plain"); > EXPECT_STREQ(serializeIfValid("\btext/plain"), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("\x0Btext/plain"), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("\u000Btext/plain"), "NOTVALID"); > EXPECT_STREQ(serializeIfValid("text/plain "), "text/plain"); > EXPECT_STREQ(serializeIfValid("text/plain\t"), "text/plain"); > EXPECT_STREQ(serializeIfValid("text/plain\n"), "text/plain"); > EXPECT_STREQ(serializeIfValid("text/plain\r"), "text/plain"); > EXPECT_STREQ(serializeIfValid("text/plain\b"), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("text/plain\x0B"), "NOTVALID"); >+ EXPECT_STREQ(serializeIfValid("text/plain\u000B"), "NOTVALID"); > EXPECT_STREQ(serializeIfValid(" text/plain "), "text/plain"); > EXPECT_STREQ(serializeIfValid("\ttext/plain\t"), "text/plain"); > EXPECT_STREQ(serializeIfValid("\ntext/plain\n"), "text/plain"); >@@ -236,11 +242,12 @@ TEST(ParsedContentType, Serialize) > EXPECT_STREQ(serializeIfValid("text/plain;test=\""), "text/plain;test=\"\""); > EXPECT_STREQ(serializeIfValid("text/plain;test=\"value\"foo"), "text/plain;test=value"); > EXPECT_STREQ(serializeIfValid("text/plain;test=\"value\"foo;foo=bar"), "text/plain;test=value;foo=bar"); >- EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\ue\""), "text/plain;test=\"val\\ue\""); >+ EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\ue\""), "text/plain;test=value"); > EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\\"ue\""), "text/plain;test=\"val\\\"ue\""); > EXPECT_STREQ(serializeIfValid("text/plain;test='value'"), "text/plain;test='value'"); > EXPECT_STREQ(serializeIfValid("text/plain;test='value"), "text/plain;test='value"); > EXPECT_STREQ(serializeIfValid("text/plain;test=value'"), "text/plain;test=value'"); >+ EXPECT_STREQ(serializeIfValid("text/plain;test={value}"), "text/plain;test=\"{value}\""); > } > > } // namespace TestWebKitAPI >diff --git a/LayoutTests/imported/w3c/ChangeLog b/LayoutTests/imported/w3c/ChangeLog >index 7e5e76f50c5317bd32055330cd2f6c416cc67bd2..ef8db254004286cfee51a49bb62332ea2f32a287 100644 >--- a/LayoutTests/imported/w3c/ChangeLog >+++ b/LayoutTests/imported/w3c/ChangeLog >@@ -1,3 +1,14 @@ >+2019-02-19 Rob Buis <rbuis@igalia.com> >+ >+ Update MIME type parser >+ https://bugs.webkit.org/show_bug.cgi?id=180526 >+ >+ Reviewed by Darin Adler. >+ >+ Update improved test expectations. >+ >+ * web-platform-tests/xhr/overridemimetype-blob-expected.txt: >+ > 2019-02-18 Oriol Brufau <obrufau@igalia.com> > > [css-grid] Handle indefinite percentages in fit-content() >diff --git a/LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt >index a7543a35d21441f5473617e2dfff82b4b59edc6e..3b5f67d9cca4bb03b360a98f367e5d5a483e0fca 100644 >--- a/LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt >+++ b/LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt >@@ -57,6 +57,6 @@ PASS 52) MIME types need to be parsed and serialized: </> > PASS 53) MIME types need to be parsed and serialized: (/) > PASS 54) MIME types need to be parsed and serialized: ÿ/ÿ > PASS 55) MIME types need to be parsed and serialized: text/html(;doesnot=matter >-FAIL 56) MIME types need to be parsed and serialized: {/} assert_equals: expected "application/octet-stream" but got "{/}" >+PASS 56) MIME types need to be parsed and serialized: {/} > PASS 57) MIME types need to be parsed and serialized: Ä/Ä >
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Diff
View Attachment As Raw
Actions:
View
|
Formatted Diff
|
Diff
Attachments on
bug 180526
:
348014
|
356923
|
356928
|
357400
|
357401
|
357403
|
357443
|
357469
|
357682
|
357684
|
357687
|
357690
|
357691
|
357702
|
358515
|
358898
|
358906
|
358907
|
358917
|
359860
|
360118
|
360123
|
360129
|
360134
|
360136
|
360144
|
360303
|
360343
|
360350
|
360355
|
362245
|
362316
|
362323
|
362342
|
362348
|
362360
|
362378
|
362384
|
362387
|
362390
|
362397
|
362416
|
362418
|
362430
|
362437
|
362483
|
362486
|
362497
|
362506
|
362526
|
390349