Part 1: How to restore unresponsive SAS Stored Process Servers with SAS® 9.1.3
SAS Technical Support has received reports of previously working SAS Stored Process Servers becoming unresponsive over time for unknown reasons. By unresponsive, we mean that the SAS Stored Process Servers are up and running, but no requests from client applications are getting through to the server. These servers might also be referred to as "hung" or "orphaned" SAS processes.
You might have encountered this problem in one or more ways. You might be the end user working with one of the BI client applications, such as SAS® Enterprise Guide®, SAS® Web Report Studio, the SAS® Add-in for Microsoft Office, the SAS® Information Delivery Portal, SAS® Stored Processes web application, or another web-based application. You click on a button expecting a report to be returned, but instead you receive a generic error or Java dump. You might be the systems administrator, who gets a call from the end user and determines that there is no stored process server responding, or has at least narrowed the problem down to involve a SAS server rather than a client.
What to do?
Use the following five-step approach to evaluate the status of and restore your SAS Stored Process Servers:
A basic test of stored process server functionality is available in the SAS Management Console. To conduct this test, follow the steps below:
A box with the message Test Connection Successful! appears if the test was successful.
Customers usually receive the following error in SAS Management Console when their stored process servers are unresponsive:
sam.S251.ex.msg: A problem occurred while connecting to a load balancing spawner. Check the spawner log for more details.
The command line tool NETSTAT (network statistics) displays incoming and outgoing network connections, routing tables, and various network interface statistics. This tool is available on UNIX and Windows operating systems. Use this tool to provide information about the status of the ports on which a particular stored process server runs, which by default are ports 8611, 8621, and 8631.
Follow these steps:
1. From a system prompt, submit the NETSTAT command for your operating system.
Windows:
prompt> netstat -ano | find "8611"
Note: Substitute "8621" or "8631" to check the other ports.
UNIX:
prompt> netstat -an | grep "8611"
Note: Substitute "8621" or "8631" to check the other ports.
2. Evaluate the port status.
Note: A state of "CLOSE_WAIT" that persists for longer than 2-5 minutes might indicate that the server is hung.
See RFC 793 (pages 20 and 21) for details about the progression of states for a TCP/IP connection.
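The note above about persistent CLOSE_WAIT states can be checked by comparing two NETSTAT snapshots taken a few minutes apart. The following Python code is a minimal sketch, not a SAS-supplied tool. It assumes UNIX-style NETSTAT output with six columns (protocol, two queue sizes, local address, foreign address, state). The function names are hypothetical.

```python
def close_wait_connections(netstat_text, port):
    """Return the set of (local, foreign) address pairs in CLOSE_WAIT on a local port."""
    found = set()
    suffixes = (".%d" % port, ":%d" % port)  # "host.8611" or "host:8611"
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or fields[5] != "CLOSE_WAIT":
            continue
        local, foreign = fields[3], fields[4]
        if local.endswith(suffixes):
            found.add((local, foreign))
    return found


def persistent_close_wait(earlier_text, later_text, port):
    """Return connections that are in CLOSE_WAIT in both snapshots."""
    earlier = close_wait_connections(earlier_text, port)
    later = close_wait_connections(later_text, port)
    return earlier & later
```

To use the sketch, save the NETSTAT output to a file, wait several minutes, save it again, and pass the contents of both files to persistent_close_wait. Any connections that it returns have stayed in CLOSE_WAIT across both snapshots.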
The example below shows NETSTAT output when the server on port 8611 is unresponsive.
tcp4 1312 0 myserver.na.sas.8611 otherserver.na.sas.56480 CLOSE_WAIT
tcp4 1254 0 myserver.na.sas.8611 otherserver.na.sas.56487 CLOSE_WAIT
tcp4 1300 0 myserver.na.sas.8611 otherserver.na.sas.56498 CLOSE_WAIT
tcp4 0 0 *.8611 *.* LISTEN
tcp4 1280 0 myserver.na.sas.8611 otherserver.na.sas.56805 CLOSE_WAIT
tcp4 1011 0 myserver.na.sas.8611 otherserver.na.sas.56816 CLOSE_WAIT
tcp4 1267 0 myserver.na.sas.8611 otherserver.na.sas.56822 CLOSE_WAIT
tcp4 1234 0 myserver.na.sas.8611 otherserver.na.sas.56825 CLOSE_WAIT
tcp4 1260 0 myserver.na.sas.8611 otherserver.na.sas.56828 CLOSE_WAIT
tcp4 0 0 *.8621 *.* LISTEN
tcp4 0 0 *.8631 *.* LISTEN
tcp4 1299 0 myserver.na.sas.8611 otherserver.na.sas.56850 CLOSE_WAIT
tcp4 1280 0 myserver.na.sas.8611 otherserver.na.sas.56854 CLOSE_WAIT
tcp4 1311 0 myserver.na.sas.8611 otherserver.na.sas.56865 CLOSE_WAIT
tcp4 1269 0 myserver.na.sas.8611 otherserver.na.sas.32775 ESTABLISHED
tcp4 1273 0 myserver.na.sas.8611 otherserver.na.sas.32781 ESTABLISHED
tcp4 1281 0 myserver.na.sas.8611 otherserver.na.sas.57282 CLOSE_WAIT
tcp4 1272 0 myserver.na.sas.8611 otherserver.na.sas.32786 ESTABLISHED
tcp4 1322 0 myserver.na.sas.8611 otherserver.na.sas.57288 CLOSE_WAIT
tcp4 1284 0 myserver.na.sas.8611 otherserver.na.sas.32795 ESTABLISHED
tcp4 1302 0 myserver.na.sas.8611 otherserver.na.sas.57298 CLOSE_WAIT
tcp4 1296 0 myserver.na.sas.8611 otherserver.na.sas.32804 ESTABLISHED
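When the NETSTAT output is long, a count of connections by state for each port can make the pattern easier to see. The following Python code is a minimal sketch under the assumption that the output uses the same six-column layout as the example above. Windows NETSTAT output uses a different column order and would need a different parser. The function name count_states is hypothetical.

```python
from collections import Counter


def count_states(netstat_text, port):
    """Count TCP states for connections whose local address ends in the given port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue
        local = fields[3]
        # The port follows a "." on some UNIX systems and a ":" on others
        if local.endswith("." + str(port)) or local.endswith(":" + str(port)):
            counts[fields[5]] += 1
    return counts
```

A large CLOSE_WAIT count relative to ESTABLISHED on one port, as in the example above, is consistent with an unresponsive server on that port.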
By default, the SAS Stored Process Server is configured to execute using the sassrv user account.
If the NETSTAT command reveals unresponsive servers, you should terminate the object spawner and search for remaining SAS processes that are owned by the sassrv user account. Any SAS processes owned by the sassrv account that persist after the object spawner shuts down are likely to be hung SAS Stored Process Servers (although other possible explanations exist).
Follow these steps:
1. Terminate the object spawner.
Windows: Stop the object spawner service through the Windows Services Manager.
UNIX: Use the ObjectSpawner.sh script:
SASROOT/BIArch/Lev1/SASMain/ObjectSpawner/ObjectSpawner.sh
From a system prompt, submit the following:
prompt> ObjectSpawner.sh stop
2. Search for remaining SAS processes that still persist after you terminate the object spawner and are owned by the sassrv user account (or equivalent at your site):
Windows: From the Processes tab of Windows Task Manager, sort the process list by the User Name column and look for processes with an associated user name of sassrv.
UNIX:
prompt> ps -ef | grep "sassrv"
prompt> ps -ef | grep "sassrv" | grep "8611"
prompt> ps -ef | grep "sassrv" | grep "8621"
prompt> ps -ef | grep "sassrv" | grep "8631"
prompt> ps -ef | grep "sassrv" | grep "sasexe/sas"
prompt> ps -ef | grep "sassrv"
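If you save the output of the ps command to a file, the search for sassrv-owned processes can also be scripted. The following Python code is a minimal sketch. It assumes the common eight-column ps -ef layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) with no spaces inside the STIME column. The function name processes_owned_by is hypothetical.

```python
def processes_owned_by(ps_text, user="sassrv"):
    """Return (pid, command) pairs for processes whose UID column matches user."""
    results = []
    for line in ps_text.splitlines():
        # Split into at most 8 fields so the command keeps its own spaces
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] != user:
            continue  # skips the header row and other users' processes
        results.append((int(fields[1]), fields[7].strip()))
    return results
```

Unlike grep "sassrv", this sketch matches only the owner column, so it does not pick up processes that merely mention sassrv in their command line. Substitute the equivalent account name at your site.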
When stored process servers become unresponsive, error and warning messages might appear in log files for the object spawner, stored process servers, metadata servers, and (for Windows systems only) the Windows Event Viewer.
You should review all the log file types noted below.
Caution: To preserve these log files and document specific incidents, it is critical that you copy the object spawner logs to backup files before you restart the object spawner. The existing log files are overwritten when the object spawner starts.
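The backup described in the caution above can be done by hand or with a small script. The following Python code is a minimal sketch that copies every .log file from a log folder into a new timestamped folder. The function name back_up_logs and the objspawn_logs_ folder prefix are hypothetical, and you must supply the log and backup locations for your site.

```python
import shutil
import time
from pathlib import Path


def back_up_logs(log_dir, backup_root):
    """Copy every *.log file in log_dir into a new timestamped folder under backup_root."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    # Hypothetical folder name; choose any naming scheme that suits your site
    target = Path(backup_root) / ("objspawn_logs_" + stamp)
    target.mkdir(parents=True, exist_ok=False)
    copied = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        destination = target / log_file.name
        shutil.copy2(log_file, destination)  # copy2 also preserves file timestamps
        copied.append(destination)
    return copied
```

Run the copy before you restart the object spawner, and confirm that the returned list contains the files that you expect.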
Note: Use the SAS BI Color Coding and Reporting Tools to identify errors or warnings in the object spawner, stored process server, or metadata server log files. More information about these tools is available in SAS Note 19889, "SAS® Business Intelligence Color Coding and Reporting Tools are available on the Download site."
Go to the appropriate folder and view the object spawner log file.
Windows path: C:\SAS\<projectdir>\Lev1\SASMain\ObjectSpawner\logs
Windows file name: objspawn.log
UNIX path: <projectdir>/Lev1/SASMain/ObjectSpawner/logs
UNIX file name: objspawn_console.log
UNIX file name: objspawn.log
Note: After a stored process server becomes unresponsive, you will find one or more of the following messages written to the object spawner log file named objspawn.log:
ERROR: The tcpSockWriteVector call failed. The system error is 'Broken pipe'.
ERROR: Bridge protocol engine socket access method failed to send vector to socket, error 32 (Broken pipe).
ERROR: The Balance algorithm timed out before a server could be found.
WARNING: The load balancing instance ldblCompRefDirectConnection call failed.
WARNING: Unable to redirect the client request.
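A search for the messages listed above can be automated for large log files. The following Python code is a minimal sketch that reports the line number and text of each line that contains one of those messages. The names KNOWN_MESSAGES and find_known_messages are hypothetical, and the list covers only the messages quoted in this note.

```python
# Substrings taken from the object spawner messages quoted in this note
KNOWN_MESSAGES = (
    "The tcpSockWriteVector call failed",
    "failed to send vector to socket",
    "The Balance algorithm timed out before a server could be found",
    "ldblCompRefDirectConnection call failed",
    "Unable to redirect the client request",
)


def find_known_messages(log_lines):
    """Return (line_number, text) for each line that contains a known message."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(message in line for message in KNOWN_MESSAGES):
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, open your copy of objspawn.log, pass the file object to find_known_messages, and review the returned line numbers in the log.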
Go to the appropriate folder and view the stored process server log file.
Windows path: C:\SAS\<projectdir>\Lev1\SASMain\StoredProcessServer\logs
UNIX path: <projectdir>/Lev1/SASMain/StoredProcessServer/logs
It is possible that the log from an unresponsive stored process server will contain no errors. In this case, you should note the last step or program that executed successfully and the last entry written in the log.
Often the last entry in a hung stored process server log file notes that a request has started executing, similar to the example below:
20070223:11.01.32.78: 00000066: 9:sasdemo: STP: 1: Executing c:\mycode stp_report.sas
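To check quickly whether a stored process server log ends with an entry like the one above, you can inspect its last non-blank line. The following Python code is a minimal sketch. The function names are hypothetical, and the check looks only for the "STP:" and "Executing" text that appears in the example entry.

```python
def last_entry(log_text):
    """Return the last non-blank line of a log, or None if the log is empty."""
    for line in reversed(log_text.splitlines()):
        if line.strip():
            return line.strip()
    return None


def looks_like_started_request(entry):
    """Return True if the entry resembles the 'STP: ... Executing' line in this note."""
    return entry is not None and "STP:" in entry and "Executing" in entry
```

A result of True suggests only that the last logged activity was the start of a request. It does not prove that the server is hung.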
If servers are unresponsive and you restart the object spawner, entries similar to the example below appear in the log file when the stored process server attempts to restart:
20061220:10.18.34.89: 00000061: :: SAS Stored Process Server Version 9.1 ( Build 50 )
20061220:10.18.34.89: 00000061: :: STP: applevel=2
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Session Timeout = 900
20061220:10.18.35.42: 00000061: :: STP: Server Property: Maximum Session Timeout = 3600
20061220:10.18.35.42: 00000061: :: STP: Default Output Encoding Retrieved from NLS Locale: wlatin1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Output Encoding = wlatin1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Session Cost = 1
20061220:10.18.35.42: 00000061: :: STP: Server Property: Default Context Cost = 100
20061220:10.18.35.42: 00000061:ERROR: Bridge protocol engine socket access method failed to bind listen socket, error 10048 (The specified address is already in use.).
20061220:10.18.35.42: 00000005: :: STP: Stored Process Server Shutting Down.
20061220:10.18.35.45: 00000005: :: STP: Stored Process Server Shutdown Complete.
NOTE: The SAS System used:
real time 5.18 seconds
cpu time 1.46 seconds
Note: These errors occur because an existing server is still bound to the designated port on which the new server is trying to start. The error messages do not identify the port number that is already in use. They refer only to the "specified address". If the stored process server log contains the statements above, you need to stop the existing stored process servers and restart the object spawner.
For instructions, see the section Step 5: Stop Existing Stored Process Servers and Restart the Object Spawner.
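The bind failure described above can be detected with a simple text search of the stored process server log. The following Python code is a minimal sketch. It collapses all whitespace before it searches so that the message is found even if it is wrapped across lines. The function name has_bind_failure is hypothetical.

```python
def has_bind_failure(log_text):
    """Return True if the log reports a failure to bind the listen socket."""
    # Collapse line breaks and repeated spaces so that wrapped messages still match
    normalized = " ".join(log_text.split())
    return "failed to bind listen socket" in normalized
```

A result of True indicates that the log contains the bind error. The log does not name the port, so use the NETSTAT command to determine which port is still in use.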
Go to the appropriate folder and view the metadata server log file.
Windows path: C:\SAS\<projectdir>\Lev1\SASMain\MetadataServer\logs
UNIX path: <projectdir>/Lev1/SASMain/MetadataServer/logs
Note: Normally, there are no errors logged by the metadata server when stored process servers become unresponsive. However, you should check this log as a precautionary measure.
Checking the Windows Event Viewer for Errors or Clues (Windows OS only)
The Windows Event Viewer is a Windows System Tool found in the Microsoft Management Console. Follow these tasks to check the Windows Event Viewer for errors:
prompt> kill <process ID #>
3. Restart the object spawner.
!SASROOT/BIArch/Lev1/SASMain/ObjectSpawner/ObjectSpawner.sh
From a system prompt, submit the following:
prompt> ObjectSpawner.sh start
4. Using SAS Management Console, test the connection to the stored process server to ensure that the servers were successfully restored.
5. Clean up leftover WORK library files. These files accumulate as a result of abnormal termination of the stored process server sessions.
If the above steps do not restore the SAS Stored Process Servers to normal functionality, please compile results from the tests and the log files referenced in these steps and contact SAS Technical Support for further assistance.
If you are experiencing unresponsive stored process servers using SAS 9.2, please refer to the following notes:
SAS Note 43160, "Tips for addressing unresponsive SAS® 9.2 Stored Process Servers, Part 1"
SAS Note 43163, "Tips for addressing unresponsive SAS® 9.2 Stored Process Servers, Part 2"