Tips for addressing unresponsive SAS® 9.1.3 Stored Process Servers, Part 1


Part 1: How to restore unresponsive SAS Stored Process Servers with SAS® 9.1.3
SAS Technical Support has received reports of previously working SAS Stored Process Servers becoming unresponsive over time for unknown reasons. By unresponsive, we mean that the SAS Stored Process Servers are up and running, but no requests from client applications are getting through to the server. These servers might also be referred to as "hung" or "orphaned" SAS processes.

You might have encountered this problem in one or more ways. You might be the end user working with one of the BI client applications, such as SAS® Enterprise Guide®, SAS® Web Report Studio, the SAS® Add-in for Microsoft Office, the SAS® Information Delivery Portal, SAS® Stored Processes web application, or another web-based application. You click on a button expecting a report to be returned, but instead you receive a generic error or Java dump. You might be the systems administrator, who gets a call from the end user and determines that there is no stored process server responding, or has at least narrowed the problem down to involve a SAS server rather than a client.

What to do?

  1. First, treat this as a "put out the fire" situation. Technical Support offers suggestions for confirming that the servers are down or unresponsive, and for cleaning up and recovering from the problem immediately. This document provides tips for evaluating and restoring your SAS Stored Process Servers.
  2. Then, a long-term strategy is needed to gather information and determine why the problem occurs. See SAS Note 30716, "Tips for addressing unresponsive SAS® 9.1.3 Stored Process Servers, Part 2" for a suggested approach.


Conducting Short-Term Troubleshooting to "put out the fire"


Use the following five-step approach to evaluate the status of and restore your SAS Stored Process Servers:

  1. Test the connection to the stored process server from the SAS® Management Console.
  2. Check the status of the stored process server ports by running the NETSTAT system command.
  3. Stop the object spawner.
  4. Check applicable server log files.
  5. Stop existing stored process servers and restart the object spawner.

 

Step 1: Test the Connection to the SAS Stored Process Server from the SAS Management Console

A basic test of stored process server functionality is available in the SAS Management Console. To conduct this test, follow the steps below:

  1. Open SAS Management Console. The hierarchy appears in the left pane.
  2. Click + to expand the Server Manager hierarchy.
  3. Click + to expand the SASMain server group hierarchy.
  4. Click + to expand the SASMain – Logical Stored Process Server object.
  5. Select the lowest level SASMain – Stored Process Server.
  6. In the right-hand window, select the connection: SASMain Stored Process Server Bridge.
  7. On the SAS Management Console menu, click Actions.
  8. Click Test Connection.


A box with the message Test Connection Successful! appears if the test was successful.

Customers usually receive the following error in SAS Management Console when their stored process servers are unresponsive:

sam.S251.ex.msg: A problem occurred while connecting to a load balancing spawner. Check the spawner log for more details.


Step 2: Check the Status of the Stored Process Server Ports


The command line tool NETSTAT (network statistics) displays incoming and outgoing network connections, routing tables, and various network interface statistics. This tool is available on UNIX and Windows operating systems. Use this tool to provide information about the status of the ports on which a particular stored process server runs, which by default are ports 8611, 8621, and 8631.

Follow these steps:

  1. Execute the following command in a Command Prompt window (Windows) or at a system prompt (UNIX):
    • Windows

prompt> netstat -ano | find "8611"

    • UNIX

prompt> netstat -an | grep "8611"

Note: Substitute "8621" or "8631" to check the other ports.

2. Evaluate the port status.

Note: A state of "CLOSE_WAIT" that persists for longer than 2-5 minutes might indicate that the server is hung.

See RFC 793 (pages 20 and 21) for details about the progression of states for a TCP/IP connection.

The appearance of NETSTAT output when port 8611 is unresponsive is shown in the example below.

tcp4    1312      0  myserver.na.sas.8611  otherserver.na.sas.56480 CLOSE_WAIT
tcp4    1254      0  myserver.na.sas.8611  otherserver.na.sas.56487 CLOSE_WAIT
tcp4    1300      0  myserver.na.sas.8611  otherserver.na.sas.56498 CLOSE_WAIT
tcp4       0         0  *.8611                 *.*                    LISTEN
tcp4    1280      0  myserver.na.sas.8611  otherserver.na.sas.56805 CLOSE_WAIT
tcp4    1011      0  myserver.na.sas.8611  otherserver.na.sas.56816 CLOSE_WAIT
tcp4    1267      0  myserver.na.sas.8611  otherserver.na.sas.56822 CLOSE_WAIT
tcp4    1234      0  myserver.na.sas.8611  otherserver.na.sas.56825 CLOSE_WAIT
tcp4    1260      0  myserver.na.sas.8611  otherserver.na.sas.56828 CLOSE_WAIT
tcp4       0         0  *.8621                 *.*                    LISTEN
tcp4       0         0  *.8631                 *.*                    LISTEN
tcp4    1299      0  myserver.na.sas.8611  otherserver.na.sas.56850 CLOSE_WAIT
tcp4    1280      0  myserver.na.sas.8611  otherserver.na.sas.56854 CLOSE_WAIT
tcp4    1311      0  myserver.na.sas.8611  otherserver.na.sas.56865 CLOSE_WAIT
tcp4    1269      0  myserver.na.sas.8611  otherserver.na.sas.32775 ESTABLISHED
tcp4    1273      0  myserver.na.sas.8611  otherserver.na.sas.32781 ESTABLISHED
tcp4    1281      0  myserver.na.sas.8611  otherserver.na.sas.57282 CLOSE_WAIT
tcp4    1272      0  myserver.na.sas.8611  otherserver.na.sas.32786 ESTABLISHED
tcp4    1322      0  myserver.na.sas.8611  otherserver.na.sas.57288 CLOSE_WAIT
tcp4    1284      0  myserver.na.sas.8611  otherserver.na.sas.32795 ESTABLISHED
tcp4    1302      0  myserver.na.sas.8611  otherserver.na.sas.57298 CLOSE_WAIT
tcp4    1296      0  myserver.na.sas.8611  otherserver.na.sas.32804 ESTABLISHED               
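
To summarize output like the example above, you can tally the connections on a port by state. The following shell function is a minimal sketch, not a SAS-supplied tool. The name count_port_state is hypothetical, and the function assumes UNIX-style NETSTAT output in which the local address is the fourth field and the connection state is the last field.

```shell
# Count connections on a local port that are in a given TCP state.
# Reads netstat-style lines on stdin.
# Assumption: local address is field 4 and the state is the last field
# (true for UNIX "netstat -an" TCP lines, not for Windows "netstat -ano").
count_port_state() {
    port="$1"
    state="$2"
    awk -v port="$port" -v state="$state" '
        # Match a local address ending in ".PORT" or ":PORT"
        $NF == state && $4 ~ ("[.:]" port "$") { n++ }
        END { print n + 0 }
    '
}
```

For example, `netstat -an | count_port_state 8611 CLOSE_WAIT` prints the number of connections on port 8611 that are in the CLOSE_WAIT state. Running the command again a few minutes later shows whether that count persists.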

Step 3: Evaluate SAS Processes After Terminating the Object Spawner


By default, the SAS Stored Process Server is configured to execute using the sassrv user account.

If execution of the NETSTAT command reveals unresponsive servers, you should terminate the object spawner and search for remaining SAS processes that are owned by the sassrv user account. Any SAS processes owned by the sassrv account that persist after the Object Spawner shuts down are likely to be hung SAS Stored Process Servers (although other possible explanations exist).

Follow these steps:

  1. Stop the object spawner.
    • Windows

Stop the object spawner service through the Windows Services Manager.

    • UNIX

Use the object spawner script:

!SASROOT/BIArch/Lev1/SASMain/ObjectSpawner/ObjectSpawner.sh

From a system prompt, submit the following:

prompt> ObjectSpawner.sh stop


2. Search for SAS processes that persist after you terminate the object spawner and that are owned by the sassrv user account (or the equivalent account at your site):
    • Windows

In Windows Task Manager, on the Processes tab, sort the process list by the User Name column and look for processes with an associated user name of sassrv.

    • UNIX

prompt> ps -ef | grep "sassrv"

The following variations filter the same listing further, by port number or by the SAS executable path:

prompt> ps -ef | grep "sassrv" | grep "8611"
prompt> ps -ef | grep "sassrv" | grep "8621"
prompt> ps -ef | grep "sassrv" | grep "8631"

prompt> ps -ef | grep "sassrv" | grep "sasexe/sas"
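
On UNIX, a plain grep for "sassrv" can also match unrelated lines, including the grep command itself. The following shell function is one possible alternative that matches on the owner column instead. It is a sketch under stated assumptions: the name pids_owned_by is hypothetical, and it assumes the usual ps -ef layout with the owner in the first column and the process ID in the second.

```shell
# Print the PIDs of processes owned by a given account.
# Reads "ps -ef" output on stdin.
# Assumption: owner is column 1 and PID is column 2.
# Optional second argument: a pattern that the whole line must also match.
pids_owned_by() {
    owner="$1"
    pattern="${2:-.}"
    awk -v owner="$owner" -v pattern="$pattern" '
        $1 == owner && $0 ~ pattern { print $2 }
    '
}
```

For example, `ps -ef | pids_owned_by sassrv` lists every process ID owned by sassrv, and `ps -ef | pids_owned_by sassrv "sasexe/sas"` restricts the list to lines that also contain sasexe/sas. Record these process IDs, because they are needed in Step 5.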


Step 4: Check Applicable Server Log Files

When stored process servers become unresponsive, error and warning messages might appear in log files for the object spawner, stored process servers, metadata servers, and (for Windows systems only) the Windows Event Viewer.

You should review all the log file types noted below.

Caution: In order to preserve these log files and document specific incidents, it is critical that you copy the Object Spawner logs to backup files before restarting the Object Spawner. The existing log files will be overwritten when Object Spawner starts.
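
On UNIX, the backup described in the caution above can be scripted so that it is not skipped under pressure. The following is a minimal sketch, not a SAS-supplied utility. The function name backup_logs and the objspawn_logs_ directory prefix are hypothetical, and the function assumes that the log files end in .log.

```shell
# Copy all *.log files from a log directory into a new time-stamped
# backup directory, and print the path of that backup directory.
# Usage: backup_logs <log directory> <backup root directory>
backup_logs() {
    log_dir="$1"
    backup_root="$2"
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$backup_root/objspawn_logs_$stamp"
    mkdir -p "$dest" || return 1
    # -p preserves modification times, which helps when documenting an incident
    cp -p "$log_dir"/*.log "$dest"/ || return 1
    printf '%s\n' "$dest"
}
```

Run the function against the object spawner logs directory for your site before you restart the object spawner. The original files are left in place.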

Note: Use the SAS BI Color Coding and Reporting Tools to identify errors or warnings in the object spawner, stored process server, or metadata server log files. More information about these tools is available in SAS Note 19889, "SAS® Business Intelligence Color Coding and Reporting Tools are available on the Download site."


Checking Object Spawner Logs

Go to the appropriate folder and view the object spawner log file.

Windows path: C:\SAS\<projectdir>\Lev1\SASMain\ObjectSpawner\logs

Windows file name: objspawn.log

UNIX path: <projectdir>/Lev1/SASMain/ObjectSpawner/logs

UNIX file names: objspawn_console.log and objspawn.log

Note: After a stored process server becomes unresponsive, you will find one or more of the following messages written to the object spawner log file named objspawn.log:

ERROR: The tcpSockWriteVector call failed. The system error is 'Broken pipe'.
ERROR: Bridge protocol engine socket access method failed to send vector to socket, error 32 (Broken pipe).
ERROR: The Balance algorithm timed out before a server could be found.
WARNING: The load balancing instance ldblCompRefDirectConnection call failed.
WARNING: Unable to redirect the client request.
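
If the color coding tools are not at hand, a quick way to locate messages like these on UNIX is to search the log for ERROR: and WARNING: tags. The following function is a minimal sketch. The name scan_log is hypothetical, and the function assumes only that messages carry one of those two tags somewhere on the line.

```shell
# Print every line of a log file that contains an ERROR: or WARNING: tag,
# prefixed with its line number.
# Usage: scan_log <log file>
scan_log() {
    grep -n -E '(ERROR|WARNING):' "$1"
}
```

For example, `scan_log objspawn.log` lists each flagged line with its line number, so that you can open the log at that point and read the surrounding entries.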

Checking Stored Process Server Logs

Go to the appropriate folder and view the stored process server log file.

Windows path: C:\SAS\<projectdir>\Lev1\SASMain\StoredProcessServer\logs

UNIX path: <projectdir>/Lev1/SASMain/StoredProcessServer/logs

It is possible that the log from an unresponsive stored process server will contain no errors. In this case, you should note the last step or program that executed successfully and the last entry written in the log.

Often the last entry in a hung stored process server log file notes that a request has started executing, similar to the example below:

20070223:11.01.32.78: 00000066: 9:sasdemo: STP: 1: Executing c:\mycode stp_report.sas
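
To find the last entry of a stored process server log on UNIX without opening the file, you can print its last non-blank line. The following function is a minimal sketch, and the name last_log_entry is hypothetical.

```shell
# Print the last non-blank line of a log file.
# Usage: last_log_entry <log file>
last_log_entry() {
    awk 'NF { last = $0 } END { if (last != "") print last }' "$1"
}
```

If the line that is printed is an "Executing" entry like the one above, note the program that it names. That program is the last request the server is known to have started.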


If servers are unresponsive and you restart the object spawner, entries similar to the following appear in the log file when the stored process server attempts to restart:

20061220:10.18.34.89: 00000061: :: SAS Stored Process Server Version 9.1 ( Build 50 )

20061220:10.18.34.89: 00000061: :: STP: applevel=2
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Session Timeout = 900
20061220:10.18.35.42: 00000061: :: STP: Server Property: Maximum Session Timeout = 3600
20061220:10.18.35.42: 00000061: :: STP: Default Output Encoding Retrieved from NLS
        Locale: wlatin1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Output Encoding = wlatin1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Session Cost = 1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Context Cost = 100
20061220:10.18.35.42: 00000061:ERROR: Bridge protocol engine socket access method failed to
        bind listen socket, error 10048 (The specified address is already in use.).
20061220:10.18.35.42: 00000005: :: STP: Stored Process Server Shutting Down.
20061220:10.18.35.45: 00000005: :: STP: Stored Process Server Shutdown Complete.

NOTE: The SAS System used:
real time 5.18 seconds
cpu time 1.46 seconds


Note: These errors occur because the unresponsive server is still bound to the designated port that the new server is trying to use. The error messages do not identify the port number that is already in use; they refer only to the "specified address". If the stored process server log contains the statements above, you need to stop the existing stored process servers and restart the object spawner.

For instructions, see the section Step 5: Stop Existing Stored Process Servers and Restart the Object Spawner.

Checking the Metadata Server Log

Go to the appropriate folder and view the metadata server log file.

Windows path: C:\SAS\<projectdir>\Lev1\SASMain\MetadataServer\logs

UNIX path: <projectdir>/Lev1/SASMain/MetadataServer/logs

Note: Normally, there are no errors logged by the metadata server when stored process servers become unresponsive. However, you should check this log as a precautionary measure.

Checking the Windows Event Viewer for Errors or Clues (Windows OS only)

The Windows Event Viewer is a Windows System Tool found in the Microsoft Management Console. Follow these steps to check the Windows Event Viewer for errors:

  1. From the Windows Control Panel open Administrative Tools ► Computer Management.
  2. Expand System Tools then expand the Event Viewer tool.
  3. From Event Viewer, click on each log file and in the right-hand pane look for any SAS event or other activity that occurred during the same date and time as the unresponsive condition of the stored process servers.

Step 5: Stop Existing Stored Process Servers and Restart the Object Spawner

  1. Retrieve the listing of process IDs for unresponsive stored process servers that you noted after terminating the object spawner.
  2. Stop the unresponsive processes.

prompt> kill <process ID>


3. Restart the object spawner.

!SASROOT/BIArch/Lev1/SASMain/ObjectSpawner/ObjectSpawner.sh


From a system prompt, submit

prompt> ObjectSpawner.sh start


4. Using SAS Management Console, test the connection to the stored process server to ensure that the servers were successfully restored.

5. Clean up leftover WORK library files. These files accumulate as a result of abnormal termination of the stored process server sessions.

If the above steps do not restore the SAS Stored Process Servers to normal functionality, please compile results from the tests and the log files referenced in these steps and contact SAS Technical Support for further assistance.

If you are experiencing unresponsive stored process servers using SAS 9.2, please refer to the following notes:

SAS Note 43160, "Tips for addressing unresponsive SAS® 9.2 Stored Process Servers, Part 1"

SAS Note 43163, "Tips for addressing unresponsive SAS® 9.2 Stored Process Servers, Part 2"