Too Many Open Files


Problem Description


The following two stack traces indicate the same issue and report the same message, “Too many open files”.

Exception 1

java.net.SocketException: Too many open files

at java.net.PlainSocketImpl.accept(Compiled Code)
at java.net.ServerSocket.implAccept(Compiled Code)
at java.net.ServerSocket.accept(Compiled Code)
at weblogic.t3.srvr.ListenThread.run(Compiled Code)


Exception 2

java.io.IOException: Too many open files

at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:54)
at java.lang.Runtime.execInternal(Native Method)
at java.lang.Runtime.exec(Runtime.java:551)
at java.lang.Runtime.exec(Runtime.java:477)
at java.lang.Runtime.exec(Runtime.java:443)


The first exception is thrown when the error occurs while the server accepts a new TCP connection (ServerSocket.accept()). The second is thrown when the error occurs during an I/O operation, in this case the creation of a child process through Runtime.exec().
Both are consequences of the same underlying problem, which can block the server and which the following investigative techniques address.

Problem Troubleshooting
Please note that not all of the following items need to be done; some issues can be solved by following only a few of them.
(DREs: If you have followed all of the items below and a resolution has not been reached, then the case should be escalated to a senior engineer for further investigation.)

Quick Links:
Why does the problem occur?
Monitoring File Descriptors
Checking Open Files
How and when does a file descriptor get released?
Known WebLogic Server issues
File Descriptors and Settings
Tips for FL DREs for their initial response e-mail

Why does the problem occur?


These exceptions indicate an Operating System (OS) resource problem, which stems from the OS and the JVM process running out of file descriptors (see What is a File Descriptor?).

This problem usually occurs after many concurrent users have connected to the Server. Java opens many files to read in the classes required to run your application, and high-volume applications can use a lot of file descriptors, which can leave no new descriptors available. In addition, each new socket requires a descriptor. Clients and servers communicate over TCP sockets, and each browser HTTP request uses a TCP socket when it establishes a connection to the Server. Any issue that prevents sockets from being closed correctly will eventually lead to a lack of file descriptors, because the descriptor associated with a socket is not released until the socket is closed.

It is important to first monitor file descriptors and understand what these diagnostics tell you about the status of open files and other potential issues. After stepping through this troubleshooting section for your operating system, you may then need to increase the number of file descriptors (see File Descriptors and Settings).

Monitoring File Descriptors

Checklist

(DREs: You need to determine the OS on which WLS is running, the maximum number of file descriptors allowed for a process, and the stack trace of the exception thrown by the Server.)

Resolution steps

Identify whether the total number of file descriptors is too low or whether some file descriptors are not being released correctly. To diagnose this, check the total number of file descriptors in use at different times to determine whether this number goes down or keeps increasing.
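As a rough illustration of this check, the following Java sketch classifies a series of descriptor counts taken at different times (for example, successive results of lsof -p <pid> | wc -l). The class name FdTrend and the rule that any single drop counts as "goes down" are assumptions made for this sketch; with noisy real data you would look at the overall trend rather than at one sample.

```java
// Heuristic classification of file descriptor counts sampled over time (sketch).
class FdTrend {
    enum Trend { GOES_DOWN, KEEPS_INCREASING, STABLE }

    /**
     * Classifies descriptor counts sampled at different times, oldest first.
     * Any drop means descriptors are being released (the limit may simply be too low);
     * a count that never drops but ends higher suggests descriptors are not being released.
     */
    static Trend classify(long[] samples) {
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                return Trend.GOES_DOWN;
            }
        }
        if (samples.length > 1 && samples[samples.length - 1] > samples[0]) {
            return Trend.KEEPS_INCREASING;
        }
        return Trend.STABLE;
    }

    public static void main(String[] args) {
        // Counts are passed on the command line, e.g. from repeated `lsof -p <pid> | wc -l`.
        long[] samples = new long[args.length];
        for (int i = 0; i < args.length; i++) {
            samples[i] = Long.parseLong(args[i]);
        }
        System.out.println(classify(samples));
    }
}
```

Running it with the arguments 84 120 95 130 prints GOES_DOWN, which points to the "increase the maximum" branch below; 84 90 90 150 prints KEEPS_INCREASING, which points to the "look for descriptors that are not released" branch.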

If the number goes down, increase the maximum number of file descriptors to prevent the problem from recurring (see How are the number of file descriptors defined on different platforms?).
You can combine this change with a reduction of the time that a connection stays in the TIME_WAIT state before being closed (see How and when does a file descriptor get released?). On busy servers, the default value of 240 seconds can delay other connection attempts and therefore limit the maximum number of connections.

If it keeps going up, identify whether some descriptors are being held too long (files not being closed correctly; see How and when does a file descriptor get released?) and whether too many files are being opened (for example, when a driver library loads a file for each new JDBC connection).

Loading classes from jar files can also reduce the number of file descriptors used: a jar file needs only one descriptor, whereas loading each class separately would use one descriptor per class.
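The following Java sketch illustrates the point. It writes a small jar with three entries (stand-ins for class files) and then reads all of them through a single java.util.jar.JarFile, which keeps one open file for the whole archive rather than one per entry. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class JarDescriptorDemo {
    /** Writes dir/demo.jar containing one small entry per name (stand-ins for .class files). */
    static Path writeJar(Path dir, String... entryNames) throws IOException {
        Path jar = dir.resolve("demo.jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            for (String name : entryNames) {
                out.putNextEntry(new JarEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return jar;
    }

    /** Reads every entry through a single JarFile: one open archive serves all entries. */
    static int readAllEntries(Path jar) throws IOException {
        int count = 0;
        try (JarFile archive = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = archive.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = archive.getInputStream(entries.nextElement())) {
                    byte[] buf = new byte[4096];
                    while (in.read(buf) != -1) {
                        // consume the entry, as a class loader would
                    }
                }
                count++;
            }
        } // closing the JarFile releases the archive's descriptor
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jar-demo");
        Path jar = writeJar(dir, "demo/A.class", "demo/B.class", "demo/C.class");
        System.out.println("entries read: " + readAllEntries(jar));
    }
}
```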

You can use the following to monitor and diagnose how all the descriptors are used by one process, depending on your OS:

Checking open files

Unix platforms

Among other things, the lsof (LiSt Open Files) Unix administrative tool (available on Solaris, Tru64, HP-UX, Linux, and AIX) displays information about open files and network file descriptors, including their type, size, and i-node.

For a specific process the syntax is as follows:

lsof -p <pid of process>

Example 1

The following command was executed right after starting WLS 8.1SP1 on Solaris 2.7. It shows that 84 file descriptors were allocated by the Java process (pid 390) under which the Server is running. This number is far below the default hard limit of file descriptors.

$ lsof -p 390 | wc -l
84

This command can be executed again after the exception occurs to check whether this Java process has reached the maximum number of open files. If it has, this confirms that the process has run out of file descriptors.

Then run lsof -p <pid>, redirecting the output to a file, and check each of the open files. If a file that was supposed to be closed is still present in the list, investigate why it was not closed as expected.

Snippet of lsof output:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 29733 usera cwd VDIR 176,22 4096 4300274 /home/usera/810/user_projects/mydomain
java 29733 usera txt VREG 176,22 36396 6642305 /home/usera/810/jdk141_02/bin/java
java 29733 usera txt VREG 176,22 1251192 10818087 /home/usera/810/user_projects/mydomain/myserver/.wlnotdelete/extract/myserver_uddi_uddi/jarfiles/_wl_cls_gen.jar
java 29733 usera txt VREG 176,22 511935 10074851 /home/usera/810/user_projects/mydomain/myserver/.wlnotdelete/extract/myserver_uddi_uddi/jarfiles/WEB-INF/lib/jsse39153.jar
java 29733 usera txt VREG 176,22 2305960 6000676 /home/usera/810/user_projects/mydomain/myserver/.internal/uddi.war
java 29733 usera txt VREG 176,22 1227013 1385413 /home/usera/810/weblogic81/common/eval/pointbase/lib/pbserver44.jar
java 29733 usera txt VREG 176,22 653661 69379 /home/usera/810/weblogic81/server/lib/ant/optional.jar
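When the output has been saved to a file, a short program can summarize it. The following Java sketch assumes the default column layout shown above and counts descriptors per TYPE (fifth column) as well as the sockets in CLOSE_WAIT. The class name LsofSummary is hypothetical. Because some lsof columns can be empty, lsof -F (field output) is more reliable for anything beyond a quick look.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LsofSummary {
    /** Counts rows per TYPE (fifth column) in saved `lsof -p <pid>` output. */
    static Map<String, Integer> countByType(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // Skip the header row and blank or truncated lines.
            if (fields.length < 5 || fields[0].equals("COMMAND")) {
                continue;
            }
            counts.merge(fields[4], 1, Integer::sum);
        }
        return counts;
    }

    /** Counts sockets whose peer has closed but whose local side has not called close(). */
    static long countCloseWait(List<String> lines) {
        return lines.stream().filter(line -> line.contains("(CLOSE_WAIT)")).count();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a file produced by, e.g., `lsof -p <pid> > lsof.txt`
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Descriptors by type: " + countByType(lines));
        System.out.println("Sockets in CLOSE_WAIT: " + countCloseWait(lines));
    }
}
```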



lsof -h displays the complete syntax and all available options. The latest version of this program can be found at ftp://vic.cc.purdue.edu/pub/tools/unix/lsof.

Each socket connection uses a file descriptor, and lsof also shows the socket type (TCP or UDP) and the addresses and ports of the connection (in the NAME column).

Example 2

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
in.telnet 29705 root 2u inet 0x30002808fd8 0t76 TCP aaaaabbbb:telnet->abcdef.bea.com:3886 (ESTABLISHED)


On HP-UX, you can also use the performance monitoring tool Glance (available at http://www.hp.com/) to analyze the total number of open files while WebLogic Server is running.

If lsof is not available, you can also view all the file descriptors of a process in the /proc/<pid>/fd directory, which contains one entry per open file descriptor.
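On Linux, each entry in /proc/<pid>/fd is a symbolic link named after the descriptor number and pointing to the open file, so the directory can also be read programmatically. The following Java sketch (class and method names are hypothetical) maps each descriptor number to its target; with no argument, it inspects its own JVM through /proc/self/fd, and the listing then includes the descriptor used to read the directory itself. Other Unix systems may lay out /proc differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Stream;

class ProcFdLister {
    /** Maps each descriptor number in fdDir (e.g. /proc/<pid>/fd) to the file it points to. */
    static SortedMap<Integer, String> listDescriptors(Path fdDir) throws IOException {
        SortedMap<Integer, String> result = new TreeMap<>();
        try (Stream<Path> entries = Files.list(fdDir)) { // the directory stream is closed, too
            for (Path entry : (Iterable<Path>) entries::iterator) {
                String target;
                try {
                    target = Files.readSymbolicLink(entry).toString();
                } catch (IOException e) {
                    // Descriptor closed in the meantime, or not permitted to read it.
                    target = "<unreadable: " + e.getMessage() + ">";
                }
                result.put(Integer.parseInt(entry.getFileName().toString()), target);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        SortedMap<Integer, String> fds = listDescriptors(Paths.get("/proc", pid, "fd"));
        fds.forEach((fd, target) -> System.out.println(fd + " -> " + target));
        System.out.println(fds.size() + " open descriptors");
    }
}
```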

Example 3

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 545 weblogic 24u IPv4 0x30002a4cea8 0t0 TCP abcd:7001->xyz.com:12345 (CLOSE_WAIT)


This example shows a socket in the CLOSE_WAIT state. As long as the socket remains in this state, its associated file descriptor remains allocated, so if many sockets are in this state, the process can run out of file descriptors. A TCP socket enters this state when its peer (the other side of the connection) has sent a FIN. No timeout can be set to force these sockets to close; they simply wait for the local application to call the close() function.
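On the application side, avoiding CLOSE_WAIT sockets means closing a socket as soon as the peer has closed its side. The following self-contained Java sketch (names are hypothetical; it uses try-with-resources, so it needs Java 7 or later) accepts one loopback connection, reads until read() returns -1, which signals that the peer's FIN has arrived and the local socket is in CLOSE_WAIT, and then closes the socket so its descriptor is released.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class CloseWaitDemo {
    /** Serves one connection and closes it once the peer has closed; returns the accepted socket. */
    static Socket serveOnce(ServerSocket listener) throws IOException {
        Socket conn = listener.accept();
        try (Socket s = conn; InputStream in = s.getInputStream()) {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // process request data here
            }
            // read() returned -1: the peer sent a FIN. Leaving this block calls close();
            // without it, the socket would stay in CLOSE_WAIT and keep its descriptor.
        }
        return conn;
    }

    /** Runs a client and a server on the loopback interface; true if the server side was closed. */
    static boolean runDemo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            Thread client = new Thread(() -> {
                // Leaving this block closes the client socket, which sends a FIN to the server.
                try (Socket c = new Socket(loopback, listener.getLocalPort())) {
                    c.getOutputStream().write("hello".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            client.start();
            Socket served = serveOnce(listener);
            client.join();
            return served.isClosed();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server side closed: " + runDemo());
    }
}
```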

Windows platforms

Handle

On Windows NT or Windows 2000, the command-line tool Handle reports information about handles that refer to open files, as shown in the following example. Its output can be restricted to a specific process.
This tool is available at http://www.sysinternals.com/ntw2k/freeware/handle.shtml.

C:\tmp>ps -ef | grep java
usera 1656 1428 0 10:11:41 CONIN$ 0:46 c:\Releases\WLS8.2\JDK141~1\bin\java -client -Xms32m -Xmx200m -XX:MaxPermSize=128m -Xverify:none -Dweblogic.Name=myserver -Dweblogic.ProductionModeEnabled= -Djava.security.policy="c:\Releases\WLS8.2\WEBLOG~1\server\lib\weblogic.policy" weblogic.Server

C:\tmp>handle -p java

Handle v2.10
Copyright (C) 1997-2003 Mark Russinovich
Sysinternals - www.sysinternals.com

------------------------------------------------------------------------------
java.exe pid: 1656 ABCDEF\usera
18: File C:\Releases\WLS8.2\user_projects\domains\mydomain
170: File C:\Releases\WLS8.2\jdk141_05\jre\lib\rt.jar
178: File C:\Releases\WLS8.2\jdk141_05\jre\lib\sunrsasign.jar
180: File C:\Releases\WLS8.2\jdk141_05\jre\lib\jsse.jar
188: File C:\Releases\WLS8.2\jdk141_05\jre\lib\jce.jar
190: File C:\Releases\WLS8.2\jdk141_05\jre\lib\charsets.jar
328: File C:\Releases\WLS8.2\jdk141_05\jre\lib\ext\dnsns.jar
330: File C:\Releases\WLS8.2\jdk141_05\jre\lib\ext\ldapsec.jar
338: File C:\Releases\WLS8.2\jdk141_05\jre\lib\ext\localedata.jar
340: File C:\Releases\WLS8.2\jdk141_05\jre\lib\ext\sunjce_provider.jar
348: File C:\Releases\WLS8.2\jdk141_05\lib\tools.jar
350: File C:\Releases\WLS8.2\weblogic81\server\lib\weblogic.jar
358: File C:\Releases\WLS8.2\weblogic81\server\lib\jconn2.jar
360: File C:\Releases\WLS8.2\weblogic81\server\lib\ojdbc14.jar
368: File C:\Releases\WLS8.2\weblogic81\server\lib\xmlx.jar
370: File C:\Releases\WLS8.2\weblogic81\server\lib\webservices.jar
378: File C:\Releases\WLS8.2\weblogic81\server\lib\wlcipher.jar
3e0: File C:\Releases\WLS8.2\weblogic81\server\lib\ant\ant.jar
3e8: File C:\Releases\WLS8.2\weblogic81\server\lib\EccpressoJcae.jar
3f0: File C:\Releases\WLS8.2\weblogic81\server\lib\EccpressoCore.jar
3f8: File C:\Releases\WLS8.2\weblogic81\server\lib\EccpressoAsn1.jar
400: File C:\Releases\WLS8.2\weblogic81\server\lib\jConnect.jar
408: File C:\Releases\WLS8.2\weblogic81\server\lib\ant\optional.jar
410: File C:\Releases\WLS8.2\weblogic81\server\lib\ant\jakarta-oro-2.0.7.jar
…..
C:\tmp>handle -p java | wc -l
65




This shows that 65 file handles were used on Windows when WLS 8.1SP2 was running.

Process Explorer

Process Explorer, another tool for Windows, is a more sophisticated utility for monitoring file handles. It has a graphical interface, displays more information about each running process, and can search for a particular handle. This tool is available at http://www.sysinternals.com/ntw2k/freeware/procexp.shtml. The following screenshot shows example output:

[Screenshot: Process Explorer example output]



This shows that 884 handles were used by the Java process under which WLS was running, of which just a few (65) refer to open files.

With any of these tools, you can determine whether a file that is supposed to be closed is still open. Next, check how the file was closed and how its file descriptor was released, as discussed below.
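If the JVM provides com.sun.management.UnixOperatingSystemMXBean (available on HotSpot-based JVMs on Unix; java.lang.management was introduced in Java 5, so it is not in the JDK 1.4 shipped with WLS 8.1), code running inside the server process can also report how many descriptors the process has open and what its maximum is. The sketch below assumes such a JVM; the class name FdUsage and the 90% threshold are illustrative choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FdUsage {
    /** Returns {open, max} for this JVM, or null when the platform bean does not expose them. */
    static long[] currentUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    /** True when open descriptors reach the given fraction of the maximum. */
    static boolean nearLimit(long open, long max, double threshold) {
        return max > 0 && (double) open / max >= threshold;
    }

    public static void main(String[] args) {
        long[] usage = currentUsage();
        if (usage == null) {
            System.out.println("Descriptor counts are not available on this JVM/OS");
            return;
        }
        System.out.printf("open=%d max=%d nearLimit=%b%n",
                usage[0], usage[1], nearLimit(usage[0], usage[1], 0.9));
    }
}
```

This reports only the JVM in which it runs, so it complements lsof rather than replacing it.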

How and when does a file descriptor get released?

File descriptors are retired when the file is closed or the process terminates. If the close() system call doesn’t return a failure code, then the associated file descriptor becomes available for a future open() call that allocates a file descriptor. When all file descriptors associated with an open file description have been closed, the open file description is freed.

You should not rely on garbage collection and object finalization to free a non-Java resource such as a file descriptor. Instead, call close() explicitly and handle any error it reports.
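The following Java sketch shows two ways to read a file while making sure its descriptor is released. firstLineLegacy uses a finally block, a pattern that also works on older JDKs such as the 1.4 JDK used with WLS 8.1. firstLine uses try-with-resources (Java 7 and later). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class CloseExplicitly {
    /** finally-based pattern: the descriptor is released even if readLine() throws. */
    static String firstLineLegacy(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine();
        } finally {
            try {
                reader.close(); // releases the file descriptor
            } catch (IOException e) {
                // Handle (here: report) a failed close instead of ignoring it silently.
                System.err.println("close() failed for " + path + ": " + e);
            }
        }
    }

    /** try-with-resources: close() runs automatically; its exception is propagated
     *  (or attached as suppressed if the body also failed). */
    static String firstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLineLegacy(args[0]));
        System.out.println(firstLine(args[0]));
    }
}
```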

Closed sockets pass through the TIME_WAIT state to make sure that all the data sent during the connection was transmitted; a final acknowledgment (ACK) finalizes the data transfer. This state delays the release of the resources allocated to the connection. The duration of this TIME_WAIT period is defined by the kernel parameter tcp_time_wait_interval on Unix systems. On Windows NT/2000/XP, it is defined by the TcpTimedWaitDelay registry value under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

The following URL provides a detailed description of Unix kernel parameters: http://www.unixadm.net/networking/tune.html

You can also limit the number of open sockets allowed in a server at any given time. Set this limit in the Administration Console, in the Tuning section of the server's configuration: the property is called Maximum Open Sockets, and the corresponding attribute is MaxOpenSockCount.
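The idea behind such a cap can be illustrated with a counting semaphore. A permit is taken before a connection is accepted and returned after the socket is closed, so the number of sockets, and therefore descriptors, held at once cannot exceed the configured maximum. The following Java class is a hypothetical illustration of that idea, not WebLogic Server's implementation of MaxOpenSockCount.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical cap on concurrently open sockets, in the spirit of Maximum Open Sockets. */
class OpenSocketLimiter {
    private final Semaphore permits;

    OpenSocketLimiter(int maxOpenSockets) {
        this.permits = new Semaphore(maxOpenSockets);
    }

    /** Call before accepting a connection; false means the cap has been reached. */
    boolean tryOpen() {
        return permits.tryAcquire();
    }

    /** Call once a socket obtained after tryOpen() has been closed. */
    void closed() {
        permits.release();
    }

    public static void main(String[] args) {
        OpenSocketLimiter limiter = new OpenSocketLimiter(2);
        System.out.println(limiter.tryOpen()); // true
        System.out.println(limiter.tryOpen()); // true
        System.out.println(limiter.tryOpen()); // false: two sockets are already open
        limiter.closed();
        System.out.println(limiter.tryOpen()); // true: a slot was freed
    }
}
```

In an accept loop, tryOpen() would be called before accept(), and closed() in a finally block after each socket is closed, so a failed request cannot leak a permit.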

Known WebLogic Server Issues


Increasing the number of file descriptors will usually fix this kind of problem, but you also need to make sure that WebLogic Server, as an application, does not use too many files and that open files are closed properly so that their file descriptors are released.

All issues reported to BEA Support have involved a lack of file descriptors or an overflow of the descriptor table. This problem always occurs when the OS notifies the Java process that no new file descriptor can be allocated. In this case, you need to increase the number of file descriptors.

File Descriptors and Settings

What is a File Descriptor?

A file descriptor is a handle represented by an unsigned integer used by a process to identify an open file. It is associated with a file object that includes information such as the mode in which the file was opened, its position type, its initial type, and so on. This information is called the context of the file.

How does a File Descriptor get created?

The most common ways for a process to obtain file descriptors are through the native subroutines open or creat, or through inheritance from a parent process. The latter gives the child process equal access to the files used by the parent process. File descriptors are generally unique to each process. When a child process is created with the fork subroutine, the child gets a copy of all the file descriptors its parent had open at the time of the fork. The same copy procedure occurs when a descriptor is duplicated with the fcntl, dup, and dup2 subroutines.
The second exception represents a scenario in which the JVM process ran out of file descriptors at the moment it needed new ones to duplicate the parent process's file descriptors during the execution of the forkAndExec() native method.
For each process the OS kernel maintains, in the u_block structure, a file descriptor table where all file descriptors are indexed.
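Runtime.exec() can also be a source of descriptor leaks on the Java side: the JVM communicates with each child process through pipes, and, depending on the JDK version, the JVM's ends of those pipes can stay open until the corresponding streams are closed. The following Java sketch (names are hypothetical; it assumes an echo command on the PATH) starts a child, reads its output, and closes the streams explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ExecAndClose {
    /** Runs a command, returns its first line of output, and closes the child's streams. */
    static String runAndReadFirstLine(String... command) throws IOException, InterruptedException {
        // redirectErrorStream(true) merges stderr into stdout, so only one output pipe is read.
        Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        try {
            child.getOutputStream().close(); // no input for the child: release its stdin pipe now
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
                String first = out.readLine();
                while (out.readLine() != null) {
                    // drain the rest so the child never blocks on a full pipe
                }
                child.waitFor();
                return first;
            } // closes the stdout pipe
        } finally {
            child.getErrorStream().close(); // nothing to read after the merge, but close it anyway
            child.destroy();                // harmless if the child has already exited
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndReadFirstLine("echo", "hello"));
    }
}
```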

How are the number of file descriptors defined on different platforms?

The file descriptor limit, like the maximum size that can be allocated to a process, is defined by a resource limit. Set these values in accordance with the OS-specific file descriptor values suggested in the WebLogic Server documentation:

For WLS 8.1: Tuning Hardware, Operating System, and Network Performance
For WLS 7.0: Tuning Hardware, Operating System, and Network Performance
For WLS 6.1: Tuning Hardware, Operating System, and Network Performance
Both Unix and Linux have file descriptors. The main differences lie in the hard limit value, the default value, and the procedure for configuring file descriptors.

The maximum number of file descriptors is also called the hard limit. The soft limit defines how many files a process can currently open; it can be increased, but not beyond the hard limit.
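On Linux, the soft and hard limits of a running process can be read from the "Max open files" line of /proc/<pid>/limits. The following Java sketch (class name hypothetical) parses that line; with no argument, it reads the limits of its own JVM through /proc/self/limits.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class OpenFileLimits {
    /** Returns {soft, hard} from the "Max open files" line; "unlimited" becomes Long.MAX_VALUE. */
    static long[] parse(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith("Max open files")) {
                // Fields: Max, open, files, <soft>, <hard>, files
                String[] f = line.trim().split("\\s+");
                return new long[] { toLong(f[3]), toLong(f[4]) };
            }
        }
        throw new IllegalArgumentException("no 'Max open files' line found");
    }

    private static long toLong(String value) {
        return value.equals("unlimited") ? Long.MAX_VALUE : Long.parseLong(value);
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        long[] limits = parse(Files.readAllLines(Paths.get("/proc", pid, "limits")));
        System.out.println("soft=" + limits[0] + " hard=" + limits[1]);
    }
}
```

Raising the soft limit, for example with ulimit -n in the script that starts the server, takes effect only up to the hard limit reported here.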

Solaris

The /usr/bin/ulimit utility defines the number of file descriptors allowed for a single process. Its maximum value is defined by the rlim_fd_max kernel parameter, which is set to 1024 by default. Only the root user can modify these kernel values. The default soft limit is 64, or 256 starting with Solaris 8.

Linux

The administrator can set file descriptor limits in the /etc/security/limits.conf configuration file, as shown in the following example. The first field is the user or group that the limit applies to; * applies it to all users.

* soft nofile 1024
* hard nofile 4096

A system-wide file descriptor limit can also be set by adding the following three lines to the /etc/rc.d/rc.local startup script:

# Increase system-wide file descriptor limit.
echo 4096 > /proc/sys/fs/file-max
echo 16384 > /proc/sys/fs/inode-max

Windows

On Windows, file descriptors are called file handles. On Windows 2000 Server, the limit of open file handles is set to 16,384. This number can be monitored in the performance summary of Task Manager.

HP-UX

The nfile kernel parameter defines the maximum number of open files. Its value is usually determined by the formula ((NPROC*2)+1000), where NPROC is usually ((MAXUSERS*5)+64). For a MAXUSERS of 400, NPROC is 2064, so nfile works out to 5128. You can usually set it higher. maxfiles is the soft file limit per process, and maxfiles_lim is the hard file limit per process.
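The arithmetic can be checked with a few lines of Java. The class name is hypothetical, and the formulas are the usual defaults quoted above, not values read from a live system.

```java
class HpuxNfile {
    /** NPROC as usually derived from MAXUSERS: (MAXUSERS * 5) + 64. */
    static int nproc(int maxUsers) {
        return maxUsers * 5 + 64;
    }

    /** nfile as usually derived from NPROC: (NPROC * 2) + 1000. */
    static int nfile(int maxUsers) {
        return nproc(maxUsers) * 2 + 1000;
    }

    public static void main(String[] args) {
        int maxUsers = 400;
        System.out.println("NPROC = " + nproc(maxUsers)); // 2064
        System.out.println("nfile = " + nfile(maxUsers)); // 5128
    }
}
```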

AIX

The file descriptor limit is set in the /etc/security/limits file, and its default value is 2000. This limit can be changed with the ulimit command or the setrlimit subroutine. The maximum size is defined by the constant OPEN_MAX.

Important remark: Starting with WebLogic Server 8.1 SP3, the resetFd() function, called in the commEnv.sh script, automatically resets the file descriptor limit setting from unlimited to 1025. This was done to avoid the issue reported in Change Request #130536. However, an issue was later found in this script: the maximum number of file descriptors is not set correctly. The workaround is to replace the line

if [ "${maxfiles}" -lt 1024 ];

with

if [ "${maxfiles}" -gt 1024 ];


Tips for FL DREs for their initial response e-mail

· Before asking for the customer's file descriptor hard limit, check whether the customer has provided a server log file, because the server logs the hard limit value at startup.
· Make sure you know on which OS the customer is encountering this issue.
· Suggest the customer check how many files were open at the time the issue occurred (see Resolution steps in the Problem Troubleshooting section).