Use negative lookbehind when tokenizing string literals

This can double the speed of tokenizing large RTF files that use \'hh escape sequences.
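
As a rough illustration of how the two patterns behave, the sketch below runs both through `StringScanner#skip_until` on the text that follows an opening quote. The helper `closing_quote_offset` and the sample strings are hypothetical and not part of Linguist. It only compares where each pattern stops and does not attempt to reproduce the speedup.

```ruby
require "strscan"

OLD_DOUBLE = /[^\\]"/     # pattern removed by this commit
NEW_DOUBLE = /(?<!\\)"/   # pattern added by this commit

# Hypothetical helper (not part of Linguist): given the text that follows an
# opening quote, return how many characters skip_until consumes up to and
# including the closing quote, or nil if no closing quote is found.
def closing_quote_offset(body, pattern)
  StringScanner.new(body).skip_until(pattern)
end

# Escaped quotes (backslash + quote) are skipped by both patterns, so both
# stop at the same unescaped closing quote.
body = 'say \"hi\" now" + rest'
puts closing_quote_offset(body, OLD_DOUBLE) == closing_quote_offset(body, NEW_DOUBLE)

# The patterns differ when the quote is the very first character: the old
# pattern needs a preceding non-backslash character to consume, while the
# lookbehind consumes nothing before the quote.
p closing_quote_offset('" tail', OLD_DOUBLE)
p closing_quote_offset('" tail', NEW_DOUBLE)
```

In the tokenizer itself, a quote directly after the opening quote appears to be handled by the `s.peek(1)` branch before `skip_until` is reached, so the second case should not change behavior there.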
2026-02-15 12:49:30 +00:00 · 2015-11-05 10:18:44 -05:00
parent 362d300cb0
commit fea8bb21a0
1 changed file with 2 additions and 2 deletions
--- a/lib/linguist/tokenizer.rb
+++ b/lib/linguist/tokenizer.rb
@@ -86,13 +86,13 @@ module Linguist
          if s.peek(1) == "\""
            s.getch
          else
-            s.skip_until(/[^\\]"/)
+            s.skip_until(/(?<!\\)"/)
          end
        elsif s.scan(/'/)
          if s.peek(1) == "'"
            s.getch
          else
-            s.skip_until(/[^\\]'/)
+            s.skip_until(/(?<!\\)'/)
          end
        # Skip number literals