Perl – Writing to file outputting in UTF-16 instead of UTF-8

Recently I needed to write a Perl script that ran on Cygwin. With my default settings, any file written by Perl was being written in UTF-16. When opened, the output looked like a page of Japanese characters, which made it completely illegible and unusable.

After a lot of digging around on the internet, I managed to hack together the following code:

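open(my $SH, ">>:raw:encoding(UTF-16LE):crlf:utf8", "test1.txt")
    or die "Cannot open test1.txt: $!";
print $SH "\x{FEFF}";              # write the BOM first
print $SH "Some test writing \n";  # print to the file handle, not to STDOUT
close($SH);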

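The layers are easiest to read from right to left. First, :utf8 tells Perl we are going to pass "characters" to this file handle instead of bytes. Next, :crlf transforms \n into \r\n to give DOS line endings. Next, the encoding layer applies UTF-16LE, so that 0x0A becomes 0x0A 0x00. Because the byte order is named explicitly (LE rather than plain UTF-16), Perl does not write a byte order mark (BOM) at the beginning of the file itself. Finally, :raw removes the default :crlf layer so that the line-ending translation does not happen in the wrong place, after the text has already been encoded.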
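To see which layers actually end up on the handle, something like the following sketch may help. It assumes a throwaway file name (layers_demo.txt, made up for this example) and uses PerlIO::get_layers, which is built into perl. The exact list printed depends on the platform, so no expected output is shown.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this sketch.
my $path = "layers_demo.txt";

# Same layer string as in the post, but opened with ">" instead of ">>".
open(my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", $path)
    or die "Cannot open $path: $!";

# PerlIO::get_layers lists the layers on the handle, from the one
# closest to the file up to the one closest to print().
my @layers = PerlIO::get_layers($fh);
print join(" ", @layers), "\n";

close($fh) or die "Cannot close $path: $!";
unlink $path;   # remove the throwaway file
```

On a correctly set up handle the list should include an encoding(...) entry with a crlf entry after it, which matches the order described above.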

Now that the file is being opened with the correct encoding, we need to write the BOM at the beginning of the file to tell readers of this file what endianness it uses. We do this by printing \x{FEFF} to the file handle.
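A quick way to check the result is to read the file back as raw bytes and print them in hex. This is only a sketch: the file name utf16le_demo.txt and the short test string are made up for the example, and it opens with ">" rather than ">>" so that each run starts from an empty file and the BOM is written exactly once.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this sketch.
my $path = "utf16le_demo.txt";

# Write a BOM and one short line through the same layer stack.
open(my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $path)
    or die "Cannot open $path for writing: $!";
print $out "\x{FEFF}";   # BOM: U+FEFF is FF FE in little-endian UTF-16
print $out "Hi\n";       # "\n" is turned into "\r\n" before encoding
close($out) or die "Cannot close $path: $!";

# Read the file back without any decoding and dump the bytes in hex.
open(my $in, "<:raw", $path)
    or die "Cannot open $path for reading: $!";
my $bytes = do { local $/; <$in> };
close($in);

my $hex = join " ", map { sprintf "%02X", ord } split //, $bytes;
print "$hex\n";   # FF FE 48 00 69 00 0D 00 0A 00
```

The first two bytes are the BOM, and the last four are the carriage return and line feed, each encoded as two bytes.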

  1. PerlOracleInterested
    March 8, 2014 at 3:37 PM

    Perl 5.16 on the Windows platform.

    To write the BOM I use
    use File::BOM;
    and add
    ':via(File::BOM)'
    to the first open argument. Tested with UTF-8, UTF-16LE, UTF-16BE.

    Using just 'UTF-16', the BOM is created automatically, for BE.

    The rest seems to be handled only in the way you do it: using ':raw' and ':crlf'.

    Try
    use utf8;
    for the whole Perl script. I'm sure you don't need the ':utf8' explicitly for every (output) file.

  2. Anonymous
    June 16, 2014 at 1:40 PM

    excellent post, thanks a lot !
