Teradata performance with a large number of observations is significantly slower after upgrading to SAS® 9.4M9 (TS1M9)


After upgrading to SAS 9.4M9, some queries that use large OBS= values with Teradata run much slower than in earlier SAS versions. No error messages are reported when this issue occurs, and the code completes successfully. 

For example, this code runs significantly slower in SAS 9.4M9 than it did in SAS® 9.4M7 (TS1M7) or SAS® 9.4M8 (TS1M8):

data work.mytest;
   set teralib.mytable (OBS=50000);
run;

There are two methods available to query a subset of data using the OBS=N setting:

With the release of SAS 9.4M9, the underlying methodology was changed from using SAMPLE to TOP because TOP provided the best performance results with the most typical use cases involving OBS=. It has since been confirmed that Teradata performs better using TOP with relatively small data selections and SAMPLE with OBS values exceeding 100,000.

A possible workaround for this issue is to use explicit pass-through and to specify that SAMPLE be invoked when working with a large amount of data (500,000 in the following example):

proc sql;

  connect to teradata (...specify connection options here...);

     create table work.mytest as

       select * from connection to teradata (

          select * from mytable SAMPLE 500000

       );

  disconnect from teradata;

quit;