Variables read using SET, MERGE, and UPDATE statements are automatically retained


Variables that are read by the SET, MERGE, and UPDATE statements are automatically retained between iterations of the DATA step. Variables that are created in the DATA step are not automatically retained; use the RETAIN statement if you want that behavior.

The sample code provided shows the difference in retention between variables that are read from a data set and variables that are created within the DATA step. When the Program Data Vector (PDV) is created at compile time, variables that are read from a data set are marked for retention. This holds true even for an iteration in which the variable is not read, for example when one data set in a merge has no more observations in the current BY group. In a BY-group merge such as the one in this sample, variables that are read from a data set are set to missing when the BY group changes. Variables that are created within the DATA step are set to missing at the start of each iteration of the DATA step.
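To see the BY-group behavior on its own, here is a minimal sketch with two BY groups. The data set names scores2, levels2, and results2 and the observations for Bob are hypothetical, chosen only for illustration; Bob has scores but no observation in levels2.

```sas
/* Hypothetical data: two BY groups; Bob has no observation in LEVELS2. */
data scores2;
  input name :$6. test score;
datalines;
Alice 1 84
Alice 2 93
Bob 1 70
Bob 2 88
;

data levels2;
  input name :$6. level :$10.;
datalines;
Alice new
;

/* level is retained within a BY group but set to missing when name changes. */
data results2;
  merge scores2 levels2;
  by name;
  if score > 92 then level = 'scholar';
run;

proc print data=results2;
run;
```

Following the behavior described above, level should be "scholar" for Alice's second observation but blank for both of Bob's observations: it is set to missing when the BY group changes to Bob, levels2 has no observation for Bob to supply a new value, and neither of Bob's scores meets the IF condition.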

For more information, please see the section titled Data Step Processing in the SAS Language Reference: Concepts documentation.


Full Code

This sample demonstrates the difference in behavior between the variable level, which is read from a data set, and the variable grade, which is created within the DATA step.

/*Create a data set with name and score for each test.*/
data scores;
  input name :$6. test score;
datalines;
Alice 1 84
Alice 2 93
Alice 3 100
Alice 4 65
;

/*Create a data set with the assigned level for name.*/
data levels;
  input name :$6. level :$10.;
datalines;
Alice new
;

/* Merge the scores and levels to create new level values based on test score.              */

/* Because the variable level is read from a data set, its value is retained throughout     */
/* the BY group. The levels data set has only one observation for Alice, so level is        */
/* read only in the first iteration. In the second and third iterations, the IF statement   */
/* sets level to "scholar". Because level is retained, its value is still "scholar" at      */
/* the start of the fourth iteration. The IF condition (score > 92) is not met for that     */
/* observation (score 65), so level is not changed and remains "scholar".                   */

/* Contrast this with the variable grade, which is created in the DATA step. It is set      */
/* to missing at the beginning of each iteration of the DATA step. At the start of the      */
/* fourth iteration, grade is therefore missing. Because the IF condition is not met,       */
/* no new value is assigned to grade, and it remains missing.                               */

data results;
  merge scores levels;
  by name;
  if score > 92 then do;
   level = 'scholar';
   grade = 'A';
  end;
run;

proc print data=results;
run;


Output

Obs    name     test    score    level      grade

1     Alice      1       84     new
2     Alice      2       93     scholar      A
3     Alice      3      100     scholar      A
4     Alice      4       65     scholar
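If grade should keep its value from one iteration to the next, as level does, it must be named in a RETAIN statement. The following is a minimal sketch that assumes the scores and levels data sets from the sample above already exist in the WORK library; the output data set name results_retained is hypothetical, and the LENGTH statement is an added assumption that defines grade explicitly as a one-character variable.

```sas
/* Assumes WORK.SCORES and WORK.LEVELS were created by the sample above. */
data results_retained;
  merge scores levels;
  by name;
  length grade $1;   /* define grade explicitly as a character variable  */
  retain grade;      /* keep grade's value across DATA step iterations   */
  if score > 92 then do;
    level = 'scholar';
    grade = 'A';
  end;
run;

proc print data=results_retained;
run;
```

With the RETAIN statement, grade should still be missing for the first observation (score 84), but in the fourth observation it should keep the value A instead of being reset to missing. A retained variable that is created in the DATA step is also not set to missing when the BY group changes, so data with several BY groups may need an explicit reset, for example based on the FIRST.name variable.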