module Puppet::Util::CharacterEncoding
A module to centralize heuristics/practices for managing character encoding in Puppet
Public Class Methods
Given a string, attempts to convert a copy of the string to UTF-8. Conversion uses encode, so the copy's internal byte representation is modified to UTF-8.
This method is intended for situations where we generally trust that the string's bytes are a faithful representation of the current encoding associated with it, and can use it as a starting point for transcoding (conversion) to UTF-8.
@api public
@param [String] string a string to transcode
@return [String] copy of the original string, in UTF-8 if transcodable
    # File lib/puppet/util/character_encoding.rb
    def convert_to_utf_8(string)
      original_encoding = string.encoding
      string_copy = string.dup
      begin
        if original_encoding == Encoding::UTF_8
          if !string_copy.valid_encoding?
            Puppet.debug {
              _("%{value} is already labeled as UTF-8 but this encoding is invalid. It cannot be transcoded by Puppet.") % { value: string.dump }
            }
          end
          # String is already valid UTF-8 - noop
          return string_copy
        else
          # If the string comes to us as BINARY encoded, we don't know what it
          # started as. However, to encode! we need a starting place, and our
          # best guess is whatever the system currently is (default_external).
          # So set external_encoding to default_external before we try to
          # transcode to UTF-8.
          string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
          return string_copy.encode(Encoding::UTF_8)
        end
      rescue EncodingError => detail
        # Set the encoding on our copy back to its original if we modified it
        string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY

        # Catch both our own self-determined failure to transcode as well as any
        # error on ruby's part, ie Encoding::UndefinedConversionError on a
        # failure to encode!.
        Puppet.debug {
          _("%{error}: %{value} cannot be transcoded by Puppet.") % { error: detail.inspect, value: string.dump }
        }
        return string_copy
      end
    end
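The transcoding behavior described above can be illustrated in plain Ruby without loading Puppet. The following is a minimal sketch only: `transcode_sketch` and its `assumed_source` parameter are hypothetical names introduced for illustration, not part of Puppet's API, and the debug logging is omitted. It shows that `encode` rewrites the bytes, and that a failed conversion falls back to an unmodified copy.

```ruby
# Hypothetical helper (not Puppet's API) that mirrors the approach above.
# assumed_source stands in for Encoding.default_external so the example
# behaves the same on any system.
def transcode_sketch(string, assumed_source = Encoding.default_external)
  copy = string.dup
  return copy if copy.encoding == Encoding::UTF_8
  # BINARY strings carry no source encoding, so guess one before transcoding.
  copy.force_encoding(assumed_source) if copy.encoding == Encoding::BINARY
  copy.encode(Encoding::UTF_8)
rescue EncodingError
  # On failure, hand back an untouched copy with its original encoding.
  string.dup
end

# "café" as ISO-8859-1 bytes: the final byte 233 (0xE9) is "é".
latin1 = [99, 97, 102, 233].pack("C*").force_encoding(Encoding::ISO_8859_1)
converted = transcode_sketch(latin1)

puts converted.encoding        # UTF-8
puts latin1.bytes.inspect      # [99, 97, 102, 233]
puts converted.bytes.inspect   # [99, 97, 102, 195, 169]
```

Note that the byte sequence changes: the single ISO-8859-1 byte for "é" becomes the two-byte UTF-8 sequence. That is the key difference from relabeling with force_encoding, which leaves the bytes alone.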
Given a string, tests whether that string's bytes represent valid UTF-8 and, if so, returns a copy of the string with its external encoding set to UTF-8. Does not modify the byte representation of the string. If the string's bytes are not valid UTF-8, the external encoding is left unchanged.
This method is intended for situations where we do not believe that the encoding associated with a string is an accurate reflection of its actual bytes, i.e., effectively when we believe Ruby is incorrect in its assertion of the encoding of the string.
@api public
@param [String] string to set external encoding (re-label) to utf-8
@return [String] a copy of string with external encoding set to utf-8, or a copy of the original string if override would result in invalid encoding.
    # File lib/puppet/util/character_encoding.rb
    def override_encoding_to_utf_8(string)
      string_copy = string.dup
      original_encoding = string_copy.encoding
      return string_copy if original_encoding == Encoding::UTF_8
      if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
        return string_copy
      else
        Puppet.debug {
          _("%{value} is not valid UTF-8 and result of overriding encoding would be invalid.") % { value: string.dump }
        }
        # Set copy back to its original encoding before returning
        return string_copy.force_encoding(original_encoding)
      end
    end
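The relabeling behavior can likewise be sketched in plain Ruby. This is an illustration under stated assumptions rather than Puppet's implementation: `relabel_sketch` is a hypothetical name, and the debug logging is left out. It shows that force_encoding changes only the encoding label, never the bytes, and that the label is restored when the bytes are not valid UTF-8.

```ruby
# Hypothetical helper (not Puppet's API) that mirrors the approach above.
def relabel_sketch(string)
  copy = string.dup
  original = copy.encoding
  # force_encoding only swaps the label; valid_encoding? then checks the bytes.
  return copy if copy.force_encoding(Encoding::UTF_8).valid_encoding?
  # Bytes are not valid UTF-8, so restore the original label.
  copy.force_encoding(original)
end

# The UTF-8 bytes of "café", but labeled BINARY (pack("C*") yields BINARY).
raw = [99, 97, 102, 195, 169].pack("C*")
relabeled = relabel_sketch(raw)

puts relabeled.encoding             # UTF-8
puts relabeled.bytes == raw.bytes   # true

# The ISO-8859-1 bytes of "café" are not valid UTF-8, so the label is kept.
bad = [99, 97, 102, 233].pack("C*")
puts relabel_sketch(bad).encoding == Encoding::BINARY   # true
```

Relabeling is appropriate only when the bytes are believed to already be UTF-8 and the attached encoding is wrong. When the attached encoding is believed to be correct, transcoding with encode is the right tool.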