DataStore::CAS::FS::InvalidUTF8 - Wrapper to represent non-utf8 data in a unicode context


version 0.011000


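  use JSON;
  use Test::More;
  use DataStore::CAS::FS::InvalidUTF8;
  # The filter is required so that decode() re-inflates InvalidUTF8 objects
  my $j= DataStore::CAS::FS::InvalidUTF8->add_json_filter(JSON->new->convert_blessed);
  my $x= DataStore::CAS::FS::InvalidUTF8->decode_utf8("\x{FF}");
  my $json= $j->encode($x);
  my $x2= "".$j->decode($json);
  is( $x, $x2 );
  ok( !utf8::is_utf8($x2) );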


Much like using 'i' (or 'j') as the square root of -1, InvalidUTF8 allows a value which should have been UTF-8, but isn't, to exist alongside ordinary unicode strings.

When InvalidUTF8 parts are concatenated and the combined bytes form valid UTF-8, the result is automatically decoded into a regular perl unicode string.

Comparing InvalidUTF8 with a regular perl string will first convert the string to a UTF-8 representation, and then do a byte-wise comparison.

InvalidUTF8 can also safely pass through JSON, if the filter is added to the JSON decoder, and "convert_blessed" is set on the encoder.



  $string_or_ref= $class->decode_utf8( $byte_str )

If $byte_str is valid UTF-8, this method returns the decoded perl unicode string. If not, it returns the original bytes wrapped in an instance of InvalidUTF8.
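The decode-or-wrap decision can be sketched with core perl alone. This is only an illustration, not the module's actual implementation: try_decode_utf8 is a hypothetical helper, and it assumes that perl's built-in utf8::decode is an acceptable validity check (it is more permissive than a strict UTF-8 validator).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. Returns (1, $decoded) when
# $bytes is valid UTF-8, or (0, $bytes) when it is not.
sub try_decode_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    # utf8::decode converts in place and returns false on malformed input.
    return utf8::decode($copy) ? (1, $copy) : (0, $bytes);
}

my ($ok_valid, $decoded) = try_decode_utf8("caf\xC3\xA9");  # UTF-8 for "cafe" with e-acute
my ($ok_bad,   $raw)     = try_decode_utf8("\xFF");         # 0xFF alone is never valid
```

In the real method the failing case would come back as an InvalidUTF8 object rather than as a flag plus the raw bytes.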


is_non_unicode

This method returns true, and can be used in tests like

  if ($_->can('is_non_unicode')) { ... }

as a way of detecting InvalidUTF8 objects by API rather than class hierarchy.
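Since $_ may also hold a plain string, an empty string, or undef, a slightly more defensive form of this check can be sketched as follows. My::Wrapped is a hypothetical stand-in class and looks_non_unicode is a hypothetical helper; neither is part of the module.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

# Hypothetical stand-in for an InvalidUTF8 object; only the
# is_non_unicode method matters for this check.
{
    package My::Wrapped;
    sub new            { my ($class, $bytes) = @_; return bless \$bytes, $class }
    sub is_non_unicode { 1 }
}

# Guard with blessed() so that plain strings, empty strings and undef
# are never used as the invocant of ->can.
sub looks_non_unicode {
    my ($value) = @_;
    return (blessed($value) && $value->can('is_non_unicode')) ? 1 : 0;
}

my @flags = map { looks_non_unicode($_) }
    (My::Wrapped->new("\xFF"), "plain", "", undef);
```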

to_string, '""' operator

Returns the original string.

str_compare, 'cmp' operator

Converts the peer to utf-8 bytes, then compares the bytes.
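The comparison rule can be illustrated with plain byte strings. This is a sketch of the idea only; cmp_bytes_to_string is a hypothetical helper, and the module's overloaded operator may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: compare raw bytes against a perl character string
# by encoding the string as UTF-8 first, then comparing the bytes.
sub cmp_bytes_to_string {
    my ($bytes, $peer) = @_;
    my $peer_bytes = $peer;
    utf8::encode($peer_bytes);   # characters -> UTF-8 octets, in place
    return $bytes cmp $peer_bytes;
}

# U+00E9 encodes as the two bytes 0xC3 0xA9.
my $same    = cmp_bytes_to_string("\xC3\xA9", "\x{E9}");  # identical bytes
my $shorter = cmp_bytes_to_string("\xC3",     "\x{E9}");  # a prefix sorts first
my $greater = cmp_bytes_to_string("\xFF",     "\x{E9}");  # 0xFF sorts after 0xC3
```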

str_concat, '.' operator

Converts the peer to utf-8 bytes, concatenates the bytes, and then re-evaluates whether the result needs to be wrapped in an instance of InvalidUTF8.
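The re-evaluation step can be sketched in core perl. join_and_recheck is a hypothetical helper that assumes every fragment is already raw bytes, so a perl character string has to be encoded to UTF-8 before it is passed in. It returns a flag in place of the InvalidUTF8 wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper: join byte fragments, then re-check whether the
# result is valid UTF-8. Returns (1, $decoded) or (0, $joined_bytes).
sub join_and_recheck {
    my @fragments = @_;
    my $joined  = join '', @fragments;
    my $decoded = $joined;
    return utf8::decode($decoded) ? (1, $decoded) : (0, $joined);
}

# Two halves of the UTF-8 encoding of U+00E9: invalid apart, valid together.
my ($ok_whole, $whole) = join_and_recheck("\xC3", "\xA9");

# A character-string peer is converted to UTF-8 bytes before joining.
my $peer = "\x{263A}";
utf8::encode($peer);             # now the three bytes 0xE2 0x98 0xBA
my ($ok_mixed, $mixed) = join_and_recheck("\xFF", $peer);
```

The first result is an ordinary one-character unicode string. The second still contains the stray 0xFF byte, so it would need to stay wrapped.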


  $json_instance= $class->add_json_filter($json_instance);

Applies a filter to the JSON object so that when it encounters

  { "*InvalidUTF8*": "$string" }

it will inflate the string using the FROM_JSON method.


TO_JSON

Called by the JSON module when convert_blessed is enabled. Returns

  { "*InvalidUTF8*" => $original_str }

which can be converted back to an InvalidUTF8 object when the JSON is decoded, if the filter has been applied to the decoder.


FROM_JSON

Pass this function to JSON's filter_json_single_key_object with a key of "*InvalidUTF8*" to restore the objects that were serialized. It takes care of calling utf8::downgrade to undo the JSON module's unicode conversion.
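The round trip through TO_JSON and the single-key filter can be sketched with the core JSON::PP module, which provides the same convert_blessed and filter_json_single_key_object methods. My::Bytes is a hypothetical stand-in class used so that the example runs without this distribution installed; the real class and its FROM_JSON function may behave differently in detail.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in class holding raw bytes; it is not the real
# InvalidUTF8 implementation.
{
    package My::Bytes;
    sub new     { my ($class, $bytes) = @_; return bless \$bytes, $class }
    sub bytes   { return ${ $_[0] } }
    # Called by the encoder because convert_blessed is enabled.
    sub TO_JSON { return { '*InvalidUTF8*' => ${ $_[0] } } }
}

my $json = JSON::PP->new->convert_blessed;

# Inflate every single-key object whose key is "*InvalidUTF8*".
$json->filter_json_single_key_object('*InvalidUTF8*' => sub {
    my $str = $_[0];
    utf8::downgrade($str);       # undo the decoder's unicode conversion
    return My::Bytes->new($str);
});

my $text     = $json->encode([ My::Bytes->new("\xFF"), "ok" ]);
my $back     = $json->decode($text);
my $restored = $back->[0]->bytes;
```

After decoding, the first array element is an object again and its bytes are identical to the input, while the ordinary string next to it is left untouched.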


Michael Conrad


This software is copyright (c) 2013 by Michael Conrad, and IntelliTree Solutions llc.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.