# -*- mode: org -*-

* Fix UTF-8 support
  This turns out to be harder than I expected. The first step is to compile
  two versions of the regexp, one for matching UTF-8 and one for matching
  Latin1 (maybe on demand).

  RE2 won't accept \x{...} escapes above the range of the current character
  set. I was hoping to hand RE2 a pattern containing them and let it work
  out that only that part can't match (e.g. (?:foo|\x{1234}) should still
  match foo, even if the input string isn't UTF-8).

  (I'm only talking about \x{...}; this is the only case I have to
  care about, since \p{...} escapes *are* accepted by RE2 regardless.
  Because of Perl's behaviour we can't have raw UTF-8 in the string if
  the UTF-8 flag isn't on.)

  The approach for now will probably be to replace each \x{nnn} in the
  pattern string (where nnn > 0xFF) with something that can never match
  (maybe [^\x00-\xff]) but still allows the other branches to match.
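
  A minimal sketch of that rewrite, assuming it happens on the pattern
  string before the Latin1 compile. The helper name latin1_safe_pattern
  is hypothetical and not part of the module. It skips over other
  backslash escapes so that an escaped backslash followed by x{...} is
  left alone, but it does not handle \x{nnn} inside a character class
  (where substituting a bracketed class would change the meaning), so
  that case would still need separate treatment.

  #+begin_src perl
  use strict;
  use warnings;

  # Hypothetical helper: replace \x{nnn} escapes above 0xFF with a
  # placeholder class, leaving every other escape untouched.
  sub latin1_safe_pattern {
      my ($pattern) = @_;
      $pattern =~ s{
          \\x\{([0-9A-Fa-f]+)\}   # $1: hex digits of a \x{...} escape
        | (\\.)                   # $2: any other escaped character
      }{
          defined $1
              ? (hex($1) > 0xFF ? '[^\x00-\xff]' : '\x{' . $1 . '}')
              : $2
      }gsex;
      return $pattern;
  }

  print latin1_safe_pattern('(?:foo|\x{1234})'), "\n";
  # prints (?:foo|[^\x00-\xff])
  #+end_src

  Escapes at or below 0xFF (e.g. \x{e9}) pass through unchanged, so the
  same input pattern can feed both the UTF-8 and the Latin1 compile.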
** Think about supporting Perl 5.14's Unicode regexp flags
  At least at the top level, implementing these within RE2 itself would be
  silly.

  RE2 doesn't have all the behaviours Perl does (e.g. /a is implied
  for \d, etc.). It might just be a case of documenting what RE2 does,
  once UTF-8 is working to some extent. An alternative could be to
  make things explicit (e.g. to use RE2 you need to say "no feature
  'unicode_strings'" if you happen to have enabled that feature).
* Switch to dzil
* Support more options
** never_nl could be useful for cpangrep optimisations
* Support RE2::Set functionality
  i.e. a Regexp::RE2::Set class that can have RE2 regexps added to it,
  then either a match method or maybe an overloaded ~~?
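
  A rough sketch of what the Perl-side interface might look like, using
  a match method rather than an overload. Everything here is an
  assumption: the package name is hypothetical, and this pure-Perl
  stand-in simply tries each regexp in turn, whereas a real
  implementation would hand the whole set to RE2 so the text is scanned
  once. It only illustrates the shape: add returns the index of the new
  pattern and match returns the indices of the patterns that matched.

  #+begin_src perl
  package Regexp::RE2::Set::Sketch;   # hypothetical name
  use strict;
  use warnings;

  sub new {
      my ($class) = @_;
      return bless { patterns => [] }, $class;
  }

  # Add a regexp to the set and return its index.
  sub add {
      my ($self, $re) = @_;
      push @{ $self->{patterns} }, $re;
      return $#{ $self->{patterns} };
  }

  # Return the indices of every pattern that matches $text.
  sub match {
      my ($self, $text) = @_;
      my $patterns = $self->{patterns};
      return grep { $text =~ $patterns->[$_] } 0 .. $#$patterns;
  }

  package main;

  my $set = Regexp::RE2::Set::Sketch->new;
  $set->add(qr/foo/);
  $set->add(qr/^bar/);
  $set->add(qr/bar$/);
  my @hits = $set->match('foobar');   # (0, 2)
  #+end_src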
* Improve tests
** See if t/re/re_tests from Perl can be used.
** Improve performance comparisons
   See maybe https://github.com/axiak/pyre2/blob/master/tests/performance.py
* Support /x (probably needs RE2 changes to do properly)
* Both Perl and RE2 store the stringification of the regexp; can we avoid this?