Too Many Open Files
Problem Description
The following two stack traces indicate the same issue and report
the same message, “Too many open files”.
Exception 1
java.net.SocketException: Too many open files
    at java.net.PlainSocketImpl.accept(Compiled Code)
Exception 2
java.io.IOException: Too many open files
    at java.lang.UNIXProcess.forkAndExec(Native Method)
The first exception is thrown when the error affects the underlying TCP
protocol, while the second is thrown when the error affects an I/O
operation.
Both are a consequence of a similar problem blocking the server, which
the following investigative techniques address.
Problem Troubleshooting
Not all of the following steps are needed in every case; some issues
can be solved by following only a few of them.
(DREs: If you have followed all of the items
below and a resolution has not been reached, then the case should be
escalated to a senior engineer for further investigation.)
Quick Links:
Why does the problem occur?
Monitoring File Descriptors
Checking Open Files
How and when does a file descriptor get released?
Known WebLogic Server issues
File Descriptors and Settings
Tips for FL DREs for their initial response e-mail
Why does the problem occur?
These exceptions indicate an operating system (OS) resource problem:
the OS and the JVM process have run out of file descriptors
(see What is a File Descriptor?).
This problem usually occurs once many concurrent users are connected to
the Server. Java opens many files in order to read in the classes
required to run your application, so high-volume applications can use a
lot of file descriptors and leave none available for new requests. In
addition, each new socket requires a descriptor. Clients and Servers
communicate via TCP sockets, and each browser HTTP request consumes a
TCP socket when a connection to a Server is established. Sockets that
are not closed correctly will eventually exhaust the file descriptors,
because the descriptor associated with a socket is not released until
the socket is closed.
It is important to first monitor file descriptors and develop an
understanding of how these diagnostics can inform you about the status
of open files and other potential issues. After stepping through this
troubleshooting section for your operating system, it may then be
necessary to increase the number of file descriptors (see File
Descriptors and Settings).
(DREs: You need to determine the OS on which WLS is running, the maximum number of file descriptors allowed for a process, and the stack trace of the exception thrown by the Server.)
Determine whether the total number of file descriptors is too low or
whether some file descriptors are not being released correctly. To do
so, check the total number of open file descriptors at different times
and see whether it decreases or keeps increasing.
If the number goes down, you should increase the maximum number of file
descriptors to prevent the problem from recurring (see How are the
number of file descriptors defined on different platforms).
This change can be combined with a reduction of the length of time that
a connection stays in the TIME_WAIT state before being closed (see How
and when does a file descriptor get released?). On busy servers the
default value of 240s can delay other connection attempts and therefore
limits the maximum number of connections.
If the number keeps going up, you should identify whether some
descriptors are being held too long (files not being closed correctly;
see How and when does a file descriptor get released?) and whether too
many files are being opened (e.g., where a driver library keeps loading
a file for each new JDBC connection).
Loading classes from jar files can also reduce the number of file
descriptors used. Only one descriptor is used for a jar, whereas one
descriptor is used for each class file that is loaded separately.
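The check described above (sampling the descriptor count at intervals and watching the trend) can be sketched in shell. This is a minimal sketch that assumes a Linux-style /proc file system; the function names count_fds and sample_fds are illustrative and are not part of WebLogic or any OS tooling.

```shell
# Hypothetical helpers; they assume a Linux-style /proc file system.

# count_fds PID: print the number of descriptors currently open in PID.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# sample_fds PID [SAMPLES] [INTERVAL_SECONDS]: print "time count" pairs.
# A count that keeps rising suggests descriptors are not being released;
# a count that rises and falls again suggests the limit is too low.
sample_fds() {
    pid=$1
    samples=${2:-5}
    interval=${3:-60}
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(count_fds "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, sample_fds 390 10 30 would take ten samples of process 390 at 30-second intervals.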
You can use the following to monitor and diagnose how all the
descriptors are used by one process, depending on your OS:
Among other things, the lsof (LiSt Open Files) Unix administrative tool
(available on Solaris, Tru64, HP-UX, Linux, and AIX) displays
information about open files and network file descriptors, including
their type, size, and i-node.
For a specific process the syntax is as follows:
lsof -p <pid of process>
The following command was executed right after starting WLS 8.1SP1 on
Solaris 2.7. It shows that 84 file descriptors were allocated by the
Java process (pid 390) under which the Server is running. This number
is far below the default hard limit of file descriptors.
$ lsof -p 390 | wc -l
84
This command can be executed after the exception occurs to check
whether this java process has reached the maximum number of open files,
which confirms that the process has run out of file descriptors.
You can then run lsof -p <pid> and redirect the output to a file to
check each of the open files. If a file that was supposed to be closed
is still present in the list, investigate why it was not closed as
expected.
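When comparing the lsof count with the limit, note that lsof -p output also contains a header line and entries such as cwd, txt, and mem that are not numbered descriptors. The following is a minimal sketch of a stricter count; the helper names count_lsof_fds and fd_headroom are hypothetical, and the filter assumes the standard lsof column layout with the descriptor in the fourth (FD) column.

```shell
# Hypothetical helper: reads `lsof -p <pid>` output on stdin and counts
# only the rows whose FD column starts with a digit (numbered descriptors),
# skipping the header and entries such as cwd, txt, and mem.
count_lsof_fds() {
    awk 'NR > 1 && $4 ~ /^[0-9]/ { n++ } END { print n + 0 }'
}

# fd_headroom USED LIMIT: print how many descriptors are still available.
fd_headroom() {
    echo $(( $2 - $1 ))
}
```

A possible use is lsof -p 390 | count_lsof_fds, followed by fd_headroom with that count and the limit in force for the process.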
Snippet of lsof output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof -h displays all the possible syntax and options. The latest
version of this program can be found at ftp://vic.cc.purdue.edu/pub/tools/unix/lsof.
A file descriptor will be used for each socket connection and lsof can
also show the type of socket (TCP or UDP) and the listen address and
port (in the name column).
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
On HP-UX you can also use the performance monitoring tool Glance
(available at http://www.hp.com/) to analyze the total number of files
open when running the WebLogic Server.
If you do not have lsof available, you are also able to view all the
file descriptors for a process in /proc/<pid>/fd. Each
file descriptor lives in this directory.
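Each entry in /proc/<pid>/fd is a symbolic link to whatever the descriptor refers to, so the directory can also show what kind of objects are consuming descriptors. The following is a minimal sketch assuming a Linux-style /proc and the readlink utility; fd_types is a hypothetical helper name.

```shell
# Hypothetical helper; assumes a Linux-style /proc and the readlink utility.
# fd_types PID: summarize the descriptors of PID as sockets, pipes, or files.
fd_types() {
    for fd in /proc/"$1"/fd/*; do
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*) echo socket ;;
            pipe:*)   echo pipe ;;
            *)        echo file ;;
        esac
    done | sort | uniq -c
}
```

A large and growing socket count points to connections that are not being closed, while a growing file count points to files left open.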
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
This example shows a socket in the CLOSE_WAIT state. As long as the
socket remains in this state, its associated file descriptor continues
to exist. If many sockets are in this state, the process can run out of
file descriptors. A TCP socket enters this state when its peer (the
other side of the connection) has sent a FIN. No timeout can be set to
force the closure of these sockets, so they simply wait for the local
application to call the close() function.
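To judge whether CLOSE_WAIT sockets are the cause, it helps to count them in the lsof output. The following is a minimal sketch; count_socket_state is a hypothetical helper, and it assumes lsof prints the TCP state in parentheses at the end of the NAME column.

```shell
# Hypothetical helper: reads lsof output on stdin and prints how many
# lines show the given TCP state, e.g. "(CLOSE_WAIT)", in the NAME column.
count_socket_state() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -F "($1)" || true
}
```

For example, lsof -p 390 | count_socket_state CLOSE_WAIT prints the number of sockets waiting for the application to call close().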
On Windows NT or Windows 2000, the command-line tool handle reports
information about handles that refer to open files, as shown in the
following example. It can be used for a specific process.
This tool is available at http://www.sysinternals.com/ntw2k/freeware/handle.shtml.
C:\tmp>ps -ef | grep java
C:\tmp>handle -p java
This shows that 65 file handles were used on Windows when WLS 8.1SP2
was running.
Another tool for Windows, Process Explorer, is a more sophisticated utility for monitoring file handles. It has a graphical interface and displays more information about each running process. You can use this program to search for a particular handle. This tool is available at http://www.sysinternals.com/ntw2k/freeware/procexp.shtml. Following is a screen shot with example output:
This shows that 884 handles were used by the java process under which
WLS was running, of which just a few (65) refer to open files.
By using any of these tools you can determine if a file which is
supposed to be closed is still open. Next, you should check how the
file was closed and how its file descriptor was released, as discussed
below.
How and when does a file descriptor get released?
File descriptors are released when the file is closed or the process
terminates. If the close() system call doesn’t return a failure code,
then the associated file descriptor becomes available for a future
open() call that allocates a file descriptor. When all file descriptors
associated with an open file description have been closed, the open
file description is freed.
You should not rely on garbage collection and object finalization to
free a non-Java resource such as a file descriptor. This is why close()
should be called explicitly and any error it reports should be handled.
Closed sockets transition to the TIME_WAIT state to make sure that all
the data was transmitted during the connection; a final acknowledgment
(ACK) completes the data transfer. This state delays the release of the
file descriptor allocated to the socket. The duration of the TIME_WAIT
period is defined by the kernel parameter named tcp_time_wait_interval
on Unix systems.
On Windows NT/2k/XP this period is defined by the registry value
TcpTimedWaitDelay under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
The following URL provides a detailed description of Unix kernel parameters: http://www.unixadm.net/networking/tune.html
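Before changing the TIME_WAIT interval, it can be useful to see how many connections are actually in that state. The following is a minimal sketch that summarizes netstat -an output by connection state; tcp_state_summary is a hypothetical helper, and it assumes the state is the last field on each connection line and is written in upper case, which may not hold on every platform.

```shell
# Hypothetical helper: reads `netstat -an` output on stdin and prints
# one "STATE count" line per connection state found in the last field.
tcp_state_summary() {
    awk '$NF ~ /^[A-Z][A-Z0-9_]*$/ { c[$NF]++ }
         END { for (s in c) print s, c[s] }' | sort
}
```

For example, netstat -an | tcp_state_summary lists how many connections are in TIME_WAIT alongside the other states.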
You can also limit the number of open sockets allowed in the server at any given time. This can be done via the Administration console in the tuning section of a server’s configuration. The property is called Maximum Open Sockets and the attribute is MaxOpenSockCount.
Increasing the number of file descriptors will usually fix this kind of
problem, but you will also need to make sure that the WebLogic Server
as an application doesn’t use too many files and that open files are
getting closed properly so that file descriptors are released.
All issues reported to BEA Support have involved a lack of file
descriptors or an overflow of the descriptor table. This problem always
occurs when the OS notifies the java process that no new file
descriptor can be allocated. In this case, you need to increase the
number of file descriptors.
What is a File Descriptor?
A file descriptor is a handle, represented by an unsigned integer, that a process uses to identify an open file. It is associated with a file object that includes information such as the mode in which the file was opened, its position type, its initial type, and so on. This information is called the context of the file.
The most common ways for processes to obtain file descriptors are
through the native subroutines open or creat, or through inheritance
from a parent process. The latter gives the child process equal access
to the files used by the parent process. File descriptors are generally
unique to each process.
When a child process is created with the fork subroutine, the child
gets a copy of all of its parent process’s file descriptors that are
open at the time of the fork. The same copy procedure occurs when a
file descriptor is duplicated or copied by the fcntl, dup, and dup2
subroutines.
The second exception represents a scenario where the JVM process ran
out of file descriptors when it needed new ones to duplicate the parent
process’s file descriptors during the execution of a forkAndExec()
subroutine.
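The inheritance of descriptors by a child process can be observed from a shell. In this minimal sketch (demo_inherit is a hypothetical name), the parent opens descriptor 3 on a temporary file and a child sh process, which never opened the file itself, reads from its inherited copy of that descriptor.

```shell
# Hypothetical demonstration of descriptor inheritance across fork/exec.
demo_inherit() {
    tmp=$(mktemp) || return 1
    echo "hello from parent" > "$tmp"
    exec 3< "$tmp"      # the parent shell opens descriptor 3 for reading
    sh -c 'cat <&3'     # the child reads through its inherited descriptor 3
    exec 3<&-           # the parent closes its own copy of descriptor 3
    rm -f "$tmp"
}
```

Calling demo_inherit prints hello from parent, which the child could only read because descriptor 3 was copied to it when it was created.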
For each process the OS kernel maintains, in the u_block
structure, a file descriptor table where all file descriptors are
indexed.
The limit of file descriptors, as well as the
maximum size that can be allocated to a process, are defined by a
resource limit. These values should be set in accordance with
OS-specific file descriptor values suggested in the WebLogic Server
documentation:
For WLS 8.1: Tuning Hardware, Operating System, and Network Performance
For WLS 7.0: Tuning Hardware, Operating System, and Network Performance
For WLS 6.1: Tuning Hardware, Operating System, and Network Performance
Both Unix and Linux use file descriptors. The main differences are in
the hard limit value, the default value, and the procedure for
configuring file descriptors.
The maximum number of file descriptors is also called the hard limit.
The soft limit defines how many files a process can open. The soft
limit can be increased but cannot exceed the hard limit.
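The relationship between the two limits can be checked from the shell that starts the server. The following is a minimal sketch using the shell's ulimit built-in with the -S (soft), -H (hard), and -n (open files) options; show_fd_limits and raise_soft_limit are hypothetical helper names.

```shell
# Hypothetical helpers built on the shell's ulimit built-in.

# show_fd_limits: print the soft and hard open-file limits of this shell.
show_fd_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

# raise_soft_limit N: set the soft limit to N, refusing values above
# the hard limit, which an unprivileged process cannot exceed.
raise_soft_limit() {
    wanted=$1
    hard=$(ulimit -Hn)
    if [ "$hard" != "unlimited" ] && [ "$wanted" -gt "$hard" ]; then
        echo "requested $wanted exceeds hard limit $hard" >&2
        return 1
    fi
    ulimit -Sn "$wanted"
}
```

The new soft limit applies to the current shell and to processes started from it afterwards, so it has to be set before the server is launched.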
On Solaris, the /usr/bin/ulimit utility sets the number of file descriptors allowed for a single process. Its maximum value is defined by rlim_fd_max, which is set to 1024 by default. Only the root user can modify these kernel values. The default value for the soft limit is 64, or 256 from Solaris 8 onward.
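On Linux, the administrator can set file descriptor limits in the
/etc/security/limits.conf configuration file, as shown in the following
example (in the file, each entry is preceded by the user or group it
applies to, or * for all users).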
soft nofile 1024
hard nofile 4096
A system-wide file descriptor limit can also be set by adding the
following three lines to the /etc/rc.d/rc.local startup
script:
# Increase system-wide file descriptor limit.
echo 4096 > /proc/sys/fs/file-max
echo 16384 > /proc/sys/fs/inode-max
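Before raising the system-wide limit, the current usage can be compared with it. On Linux, /proc/sys/fs/file-nr reports three numbers: allocated file handles, allocated but unused handles, and the maximum (the same value as file-max). The following is a minimal sketch; system_fd_usage is a hypothetical helper name.

```shell
# Hypothetical helper; assumes Linux, where /proc/sys/fs/file-nr holds
# "allocated unused max" on a single line.
system_fd_usage() {
    read -r allocated unused max < /proc/sys/fs/file-nr || return 1
    echo "allocated=$allocated unused=$unused max=$max"
}
```

If allocated is close to max, the system-wide limit rather than the per-process limit is likely to be the constraint.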
File descriptors are called file handles on Windows. On Windows 2000 Server, the open file handles limit is set to 16,384. This number can be monitored in the Task Manager performance summary.
On HP-UX, the nfile kernel parameter defines the maximum number of open files. This value is usually determined by the formula ((NPROC*2)+1000), where NPROC is usually ((MAXUSERS*5)+64). For a MAXUSERS of 400, this works out to 5128. You can usually set it higher. maxfiles is the soft file limit per process and maxfiles_lim is the hard file limit per process.
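The formula above can be evaluated for other MAXUSERS values with shell arithmetic. This is a minimal sketch; nproc_estimate and nfile_estimate are hypothetical helper names that only reproduce the formula quoted in the text.

```shell
# Hypothetical helpers reproducing the formulas quoted above.

# nproc_estimate MAXUSERS: NPROC = (MAXUSERS * 5) + 64
nproc_estimate() {
    echo $(( $1 * 5 + 64 ))
}

# nfile_estimate MAXUSERS: nfile = (NPROC * 2) + 1000
nfile_estimate() {
    echo $(( $(nproc_estimate "$1") * 2 + 1000 ))
}
```

For a MAXUSERS of 400, nproc_estimate gives 2064 and nfile_estimate gives 5128, matching the value in the text.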
On AIX, the file descriptor limit is set in the /etc/security/limits
file and its default value is 2000. This limit can be changed by the
ulimit command or the setrlimit subroutine. The maximum size is defined
by the constant OPEN_MAX.
Important remark: From WebLogic Server 8.1 SP3, the resetFd() method, called in the commEnv.sh script, automatically resets the file descriptor limit setting from unlimited to 1025. This was done to avoid the issue reported in the Change Request #130536. However an issue was found in this script with the maximum number of file descriptors not correctly set. The workaround is to replace the line if [ "${maxfiles}" -lt 1024 ]; by if [ "${maxfiles}" -gt 1024 ];
Tips for FL DREs for their initial response e-mail
· Before asking for the customer's file descriptor hard limit value, check whether the customer provided a server log file, because the hard limit value is logged by the server at startup.
· Make sure you know on which OS the customer is encountering this issue.
· Suggest that the customer check how many files were open at the time the issue occurred (see the resolution steps in the Problem Troubleshooting section).