SAS® DATA step processing in Hadoop fails if the LIBNAME HADOOP statement contains certain characters or a certain order of options


The SAS DATA step can execute in Hadoop if the SAS® Embedded Process is installed on your Hadoop cluster and the requirements for in-database processing are met as described in the SAS® 9.4 In-Database Products: User's Guide. However, the DATA step fails to execute in Hadoop if the LIBNAME HADOOP statement uses any of the syntax described below.

The exact error can differ based on the syntax of the LIBNAME HADOOP statement and the Hadoop distribution used. Include msglevel=i in your OPTIONS statement to provide more error output in the SAS log.

Sample Code

libname myhive hadoop server="hadoop.example.com" user="testuser" password="mypass";
options dsaccel=any msglevel=i;
data myhive.cars2;
set myhive.cars1;
run;


This error occurs when the USER value is quoted and the Hadoop distribution is MapR:


NOTE: Attempting to run DATA Step in Hadoop.
ERROR: Hadoop EP Server is not installed.
ERROR: java.io.IOException: Error getting user info for current user, "testuser"
NOTE: The given DATA Step code for the data set "MYHIVE.CARS2" could not be executed in Hadoop. It will be run in the standard way.


This error occurs when the USER value is quoted and the Hadoop distribution is Cloudera or Hortonworks:

NOTE: Attempting to run DATA Step in Hadoop.
ERROR: Map/Reduce job failed. Could not run Hadoop job.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the COMPILATION phase.
ERROR: org.apache.hadoop.security.AccessControlException: Permission denied: user="testuser", access=WRITE, inode= "/user/"testuser"/.staging":hdfs:hdfs:drwxr-xr-x


This error occurs when the lowercase USER option is included in the LIBNAME statement but the PASSWORD option is omitted, or when the PASSWORD option is specified before the lowercase USER option:

NOTE: Attempting to run DATA Step in Hadoop.
INFO: Invalid Hadoop user name specified on libname statement.
NOTE: The given DATA Step code for the data set "MYHIVE.CARS2" could not be executed in Hadoop. It will be run in the standard way.
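
For illustration, the two conditions described for this message correspond to LIBNAME statements like the following. This is a sketch that reuses the server, user, and password values from the sample code above. Both statements are shown only as examples of the syntax to avoid.

```sas
/* Problem form 1: USER is specified, but PASSWORD is omitted */
libname myhive hadoop server="hadoop.example.com" user=testuser;

/* Problem form 2: PASSWORD is specified before USER */
libname myhive hadoop server="hadoop.example.com" password=mypass user=testuser;
```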


Resolving the Problem
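Ensure that the LIBNAME HADOOP statement avoids the problems discussed above: specify the USER value without quotation marks, include the PASSWORD option, and specify the USER option before the PASSWORD option. Here are some examples of correct LIBNAME statements: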

libname myhive hadoop server="hadoop.example.com" user=testuser password=mypass;
libname myhive hadoop server="hadoop.example.com" config="/mnt/hadoop/merged.xml" user=testuser password=mypass;
libname myhdfs hadoop server="hadoop.example.com" user=testuser password=mypass
    hdfs_metadir="/user/testuser/meta" hdfs_datadir="/user/testuser/data" hdfs_tempdir="/user/testuser/temp";
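
As a sketch of how the first corrected statement fits into the original sample program, the failing code can be rewritten as shown below. The data set names and connection values are the same placeholders used in the sample code. The only change is the LIBNAME statement, where the USER value is unquoted and USER precedes PASSWORD.

```sas
/* USER value is unquoted, and USER is specified before PASSWORD */
libname myhive hadoop server="hadoop.example.com" user=testuser password=mypass;

/* DSACCEL=ANY requests in-Hadoop execution; MSGLEVEL=I adds detail to the SAS log */
options dsaccel=any msglevel=i;

data myhive.cars2;
   set myhive.cars1;
run;
```

With MSGLEVEL=I in effect, the SAS log should indicate whether the DATA step ran in Hadoop or was run in the standard way.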


Note: Include a value for the PASSWORD option even if the Hadoop environment is not secured. The value will not be used because security is not enabled in Hadoop. However, the DATA step requires the PASSWORD value in order to execute in Hadoop. If Hadoop is secured with Kerberos authentication, the USER and PASSWORD options can be omitted from the LIBNAME statement.
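
The two cases in this note can be sketched as follows. The password value anyvalue is a hypothetical placeholder, and the libref and server name are the same example values used earlier. Adjust them for your environment.

```sas
/* Unsecured Hadoop: PASSWORD is present so that the DATA step can
   execute in Hadoop. The value itself is not used for authentication.
   "anyvalue" is a hypothetical placeholder. */
libname myhive hadoop server="hadoop.example.com" user=testuser password=anyvalue;

/* Hadoop secured with Kerberos: USER and PASSWORD can be omitted */
libname myhive hadoop server="hadoop.example.com";
```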