Update Rust crate regex to 1.8.0 #30
This PR contains the following updates:
| Package | Change |
|---|---|
| regex | `1.7.3` -> `1.8.0` |

Release Notes
rust-lang/regex
v1.8.0
==================
This is a sizeable release that will soon be followed by another sizeable
release. Together, they will close over 40 existing issues and PRs.

This first release, despite its size, essentially represents preparatory work
for the second release, which will be even bigger. Namely, this release:
* Upgrades its dependency on `aho-corasick` to the recently released 1.0 version.
* Upgrades its dependency on `regex-syntax` to the simultaneously released
  `0.7` version. The changes to `regex-syntax` principally revolve around a
  rewrite of its literal extraction code and a number of simplifications and
  optimizations to its high-level intermediate representation (HIR).

The second release, which will follow shortly after this one, will contain a
soup-to-nuts rewrite of every regex engine. This will be done by bringing
`regex-automata` into this repository, and then changing the `regex` crate to
be nothing but an API shim layer on top of `regex-automata`'s API.

These tandem releases are the culmination of about 3 years of on-and-off work
that began in earnest in March 2020.

Because of the scale of changes involved in these releases, I would love to
hear about your experience, especially if you notice undocumented changes in
behavior or performance changes (positive or negative).

Most changes in the first release are listed below. For more details, please
see the commit log, which reflects a linear and decently documented history
of all changes.

New features:
* Permit many more characters to be escaped, even if they have no significance.
  More specifically, any ASCII character except for `[0-9A-Za-z<>]` can now be
  escaped. Also, a new routine, `is_escapeable_character`, has been added to
  `regex-syntax` to query whether a character is escapable or not.
* Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise
  introduce any new expressive power.
* Capture group names are now Unicode-aware. They can now begin with either a
  `_` or any "alphabetic" codepoint. After the first codepoint, subsequent
  codepoints can be any sequence of alphanumeric codepoints, along with `_`,
  `.`, `[` and `]`. Note that replacement syntax has not changed.
* Add `Match::is_empty` and `Match::len` APIs.
* Add an `impl Default for RegexSet`, with the default being the empty set.
* A new method, `Regex::static_captures_len`, has been added which returns the
  number of capture groups in the pattern if and only if every possible match
  always contains the same number of matching groups.
* Named captures can now be written as `(?<name>re)` in addition to
  `(?P<name>re)`.
* `regex-syntax` now supports empty character classes.
* `regex-syntax` now has an optional `std` feature. (This will come to `regex`
  in the second release.)
* The `Hir` type in `regex-syntax` has had a number of simplifications made to
  it.
* `regex-syntax` has support for a new `R` flag for enabling CRLF mode. This
  will be supported in `regex` proper in the second release.
* `regex-syntax` now has proper support for "regex that never matches" via
  `Hir::fail()`.
* The `hir::literal` module of `regex-syntax` has been completely re-worked. It
  now has more documentation, examples and advice.
* The `allow_invalid_utf8` option in `regex-syntax` has been renamed to `utf8`,
  and the meaning of the boolean has been flipped.

Performance improvements:
* The upgrade to `aho-corasick 1.0` may improve performance in some cases. It's
  difficult to characterize exactly which patterns this might impact, but if
  there are a small number of longish (>= 4 bytes) prefix literals, then it
  might be faster than before.
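
Two of the syntax rules in the New features list above come down to simple per-character checks: which characters may be escaped, and which names a capture group may have. The following std-only Rust sketch restates those rules as they are described in these notes. The helpers `can_escape`, `is_valid_capture_name` and `named_group_name` are hypothetical names used only for illustration. They are not part of `regex` or `regex-syntax`, whose parser remains authoritative. The real escaping check is `is_escapeable_character` in `regex-syntax`.

```rust
// Hypothetical std-only restatement of two regex 1.8.0 syntax rules.
// These helpers are illustrative and are NOT part of `regex` or `regex-syntax`.

/// Escaping rule: any ASCII character except `[0-9A-Za-z<>]` may follow a
/// backslash, even if it has no special meaning.
fn can_escape(c: char) -> bool {
    c.is_ascii() && !c.is_ascii_alphanumeric() && c != '<' && c != '>'
}

/// Capture-name rule: the first codepoint is `_` or alphabetic; every later
/// codepoint is alphanumeric or one of `_`, `.`, `[`, `]`. Empty names fail.
fn is_valid_capture_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first == '_' || first.is_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c.is_alphanumeric() || matches!(c, '_' | '.' | '[' | ']'))
}

/// Returns the group name if `pattern` starts with a named-group opener,
/// accepting both `(?P<name>` and the `(?<name>` form added in 1.8.0.
fn named_group_name(pattern: &str) -> Option<&str> {
    let rest = pattern
        .strip_prefix("(?P<")
        .or_else(|| pattern.strip_prefix("(?<"))?;
    // '>' is ASCII, so this byte index is always a valid char boundary.
    let end = rest.find('>')?;
    let name = &rest[..end];
    if is_valid_capture_name(name) {
        Some(name)
    } else {
        None
    }
}

fn main() {
    // `%` has no special meaning, but is escapable under the new rule.
    assert!(can_escape('%'));
    // Letters, digits, `<`, `>` and non-ASCII characters are excluded.
    assert!(!can_escape('a'));
    assert!(!can_escape('7'));
    assert!(!can_escape('<'));
    assert!(!can_escape('é'));

    // Both named-group spellings yield the same name.
    assert_eq!(named_group_name(r"(?P<year>\d{4})"), Some("year"));
    assert_eq!(named_group_name(r"(?<year>\d{4})"), Some("year"));
    // Unicode-aware names are accepted; a leading digit is not.
    assert_eq!(named_group_name(r"(?<année>\d{4})"), Some("année"));
    assert_eq!(named_group_name(r"(?<1st>a)"), None);

    println!("all checks passed");
}
```

The sketch only inspects the group opener and the name. It ignores whatever follows the closing `>`, so it is a way to reason about the new naming rules, not a pattern validator.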

Bug fixes:

* Improve the `Debug` impl for `Match` so that it doesn't show the entire haystack.
* #731: Fix a number of issues with printing `Hir` values as regex patterns.
* Add an explicit example of `foo|bar` in the regex syntax docs.
* Clarify that `SetMatches::len` does not (regrettably) refer to the number of
  matches in the set.
* Clarify "verbose mode" in regex syntax documentation.
* #950: Fix `CaptureLocations::get` so that it never panics.
* Clarify documentation for `Regex::shortest_match`.
* Fix `\p{Sc}` so that it is equivalent to `\p{Currency_Symbol}`.
* Add more clarifying documentation to the `CompiledTooBig` error variant.
* Clarify that `regex::Regex` searches as if the haystack is a sequence of
  Unicode scalar values.
* Replace `__Nonexhaustive` variants with the `#[non_exhaustive]` attribute.
* Optimize case folding, since it can get quite slow in some pathological cases.
* Reject `(?-u:\W)` in `regex::Regex` APIs.
* Add a missing `void` keyword to indicate "no parameters" in the C API.
* Fix `\p{Lc}` so that it is equivalent to `\p{Cased_Letter}`.
* Clarify documentation for `\pX` syntax.

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.