How the byte-order mark (BOM) affects the format/informat of SAS®


The BOM is a Unicode character at the start of a text file that signals the byte order of the file. If you attempt to read an external file that contains a BOM, messages similar to the following might appear in the SAS log:

NOTE: No encoding was specified for the fileref xxxxxxx.  A byte order mark in the file indicates that the   
      data is encoded in "utf-16le".  This encoding will be used to process the file.
NOTE: The infile 'xxx.txt' is:
      Filename=xxx.txt,
      RECFM=V,LRECL=512,File Size (bytes)=16457934,
      Last Modified=10Oct2007:13:43:38,
      Create Time=05Mar2009:01:23:09

WARNING: A character that could not be transcoded was encountered.
NOTE: Invalid data for var1 in line 1 1-10.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

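To circumvent this problem, specify the SAS system option ENCODING=UTF8 when you start SAS, so that the session encoding can represent the characters in the file. The ENCODING= system option can be set only at SAS invocation or in the configuration file, not during a running session.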

See the ENCODING= Option section of the SAS® 9.3 National Language Support (NLS): Reference Guide.
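
To confirm which BOM a file carries before reading it into SAS, the leading bytes can be inspected outside of SAS. The following is a minimal Python sketch, not part of SAS and not a description of how SAS itself detects the BOM. The function names detect_bom and detect_file_bom are hypothetical, and the encoding labels are chosen here only to match the style of the log note above.

```python
import codecs

# Check the longer BOMs first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def detect_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_file_bom(path):
    """Read only the first four bytes of the file and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

For example, calling detect_file_bom("xxx.txt") on a file like the one in the log above would be expected to return "utf-16le" if the file begins with the bytes FF FE. A result of None means only that no BOM is present, not that the file is free of multibyte characters.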