Blog | Dec 1, 2014

A Limited Entropy Pool Can Have an Impact at the OS Level When Using Some Oracle Products

At TriCore, we work with many clients who use Oracle products in varying capacities. We recently ran into a situation where a couple of customers experienced issues that were ultimately explained by a limited entropy pool at the OS level. The problem was very difficult to identify because a depleted entropy pool does not raise any errors. It manifested as a slow but steady increase in run time for concurrent requests that used the XML Publisher bursting engine. It can also affect the performance of adpatch and any other application code that calls sun.security.provider.SeedGenerator. We uncovered the underlying problem after we noted a 200% improvement in patch run times at our clients' sites once we implemented the solution presented in this document. Another customer hit the same problem while testing the 11.5.10.2 to R12.1.3 upgrade. They identified and resolved the issue internally when an employee determined that, in their case, the entropy pool issues were more likely to occur on virtual machines.

Entropy In Linux:
Based on my research, the problem appears to have started with the 2.6 kernel releases and may continue through the latest release. In the 2.6 kernel the default entropy pool size is 4096 bits. Oracle states that the available entropy should stay above 400 bits. Normally the OS replenishes entropy from sources of random bits such as the mouse, the keyboard, and I/O at the physical layer. In a VM, none of these sources are directly available to the guest OS.

 

Performance degrades when /dev/random runs out of random bits: the application is then stuck waiting for more bits to accumulate in the entropy pool. There are two main devices for obtaining random bits.

1) /dev/random is categorized as a high-quality entropy device and is typically the default.
2) /dev/urandom uses the entropy pool (/dev/random) as long as it is available, but falls back on pseudorandom number algorithms when the pool is depleted.
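
One way to see the difference between the two devices on a given host is to request a small number of bytes from each under a time limit. The following is a minimal sketch, not taken from any of the MOS notes: the function name read_random_bytes and the 5-second default are assumptions made for illustration, and it relies on the timeout and dd utilities from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: try to read N bytes from a device within a time limit
# and print how many bytes were actually delivered.
# $1 = device path, $2 = byte count, $3 = time limit in seconds (default 5)
read_random_bytes() {
    local device="$1" count="$2" limit="${3:-5}"
    # dd with bs=1 writes each byte as soon as it is read, so a read that is
    # cut short by timeout still reports the bytes received so far.
    timeout "$limit" dd if="$device" bs=1 count="$count" 2>/dev/null | wc -c | tr -d ' '
}
```

For example, run read_random_bytes /dev/random 64 and then read_random_bytes /dev/urandom 64 and compare the two counts. A short count from /dev/random next to a full count from /dev/urandom is consistent with the blocking described above. Newer kernels may not block on /dev/random at all, so this check is mainly useful on older hosts.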

Why could a system be running out of entropy?

Your guest OS performs cryptographic operations for ssh challenges, https connections, and the like, so the /dev/random pool gets consumed quite fast. The OS normally feeds that pool from I/O operations coming from disk, network, mouse, or keyboard, but those events may not occur often enough to keep up with the demand for random bits. This is a common pattern in virtualized environments or on headless boxes, where there is no direct access to the random bits created by these physical devices. A solution that provides random bits to the guest OS is therefore required.

This issue is known to impact the following Oracle products.

     • E-Business Suite
     • WebLogic
     • Oracle Enterprise Manager
     • SOA
     • JVM of the Database if code uses that JVM

UNIX-based operating systems provide entropy by gathering random bits and making them available through the /dev/random device.

Testing for the problem

This approach is from MOS Note: 1399980.1


$ cat /proc/sys/kernel/random/entropy_avail
6
$ cat /proc/sys/kernel/random/poolsize
4096

Or use watch if you want to display it every second:

watch -n 1 cat /proc/sys/kernel/random/entropy_avail
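
If the value needs to be logged or checked from another script rather than watched by eye, the reading can be compared against a threshold. The following is a minimal sketch and is not part of the MOS note: the function name entropy_verdict is hypothetical, and the default of 400 bits is simply the figure quoted earlier in this post.

```shell
#!/bin/bash
# Hypothetical helper: classify an entropy reading against a threshold.
# $1 = file holding the available-entropy count (default: the kernel's proc entry)
# $2 = minimum acceptable number of bits (default: 400)
entropy_verdict() {
    local source_file="${1:-/proc/sys/kernel/random/entropy_avail}"
    local threshold="${2:-400}"
    local avail

    # Read the current value; report UNKNOWN if the file cannot be read.
    avail=$(cat "$source_file" 2>/dev/null) || { echo "UNKNOWN"; return 2; }

    # Reject anything that is not a plain non-negative integer.
    case "$avail" in
        ''|*[!0-9]*) echo "UNKNOWN"; return 2 ;;
    esac

    if [ "$avail" -lt "$threshold" ]; then
        echo "LOW $avail"
        return 1
    fi
    echo "OK $avail"
}
```

Called with no arguments, entropy_verdict reads the live value. With the reading of 6 shown above it would print LOW 6 and return a non-zero status, which makes it easy to use in a cron job or a monitoring check.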

Solution:

From MOS Note: 1615981.1
yum install rng-tools
echo 'EXTRAOPTIONS="-i -o /dev/random -r /dev/urandom -t 10 -W 2048"' > /etc/sysconfig/rngd
chkconfig rngd on
service rngd restart

Or, after installing rng-tools, run this as the root user:

rngd -r /dev/urandom -o /dev/random -t 1
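
Note that the echo command in the first form overwrites any existing /etc/sysconfig/rngd. A more cautious variant keeps a copy of the previous file first. This is a minimal sketch under stated assumptions: the function name write_rngd_config and the .bak suffix are hypothetical, while the options string is the one given above.

```shell
#!/bin/bash
# Hypothetical helper: write the rngd options to a sysconfig file,
# keeping a copy of any existing file first.
# $1 = target file (default: /etc/sysconfig/rngd, as in the commands above)
write_rngd_config() {
    local target="${1:-/etc/sysconfig/rngd}"
    local options='EXTRAOPTIONS="-i -o /dev/random -r /dev/urandom -t 10 -W 2048"'

    # Preserve the previous contents so the change can be rolled back.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    echo "$options" > "$target"
}
```

Run write_rngd_config as root in place of the echo line, then continue with chkconfig and service rngd restart as shown.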

The script below checks that rngd is running and restarts it when it is found down.

In root's crontab:
## Monitor rngd process
*/30 * * * * /usr/bin/check_rngd.sh

Script body:
--------------------check_rngd.sh--------------------------------
#!/bin/bash
# Check that rngd is running; if it is not, restart it, log the event, and send an email alert.

LOG="/tmp/check_rngd.log"
DT=$(date +"%D %T")
HOST=$(hostname)

# pgrep -x matches the exact process name, so neither grep nor this script is counted.
rngd_pids=$(pgrep -x rngd)

if [[ -z "$rngd_pids" ]]; then
        echo "$DT rngd process is not running, restarting it" >> "$LOG"
        /sbin/rngd -r /dev/urandom -o /dev/random -t 1
        echo "Random number generator daemon was restarted on $HOST, please check logs under /var/log/messages" | mail -s "Random number generator daemon was restarted on $HOST" emailaddress1@somedomain.com emailaddress2@somedomain.com emailaddress3@somedomain.com
else
        echo "$DT rngd process is running" >> "$LOG"
        echo "============" >> "$LOG"
fi
--------------------check_rngd.sh--------------------------------

 

I hope you found this blog helpful.

References:

MOS Notes:
1) Java Concurrent Managers Are Running Slower In Newer Servers Due To Lack Of Entropy In The System (Doc ID 1615981.1)
2) EM 11g: the Enterprise Manager Grid Control OMS Fails To Start when the Host Entropy Value is too Low (RHEL5) (Doc ID 1399980.1)
3) How to diagnose a Linux entropy issue on WebLogic Server instances? (Doc ID 1574979.1)
4) Long Delay During Startup of SOA Managed Server (Doc ID 1336411.1)
5) BI Publisher Concurrent Processing Running Slower Regarding The Number Of Concurrent Submissions (Doc ID 1607684.1)