Fixing Kernel Panic in Solaris 11.4

The problem

The issues below occurred on Solaris 11.4 running as a VirtualBox 6 guest on an x86 CPU. They appeared after some packages were added to the kernel following an Oracle Database installation, and have also been seen after installing the GUI component. The symptoms are intermittent system reboots or kernel panics.

Tracing down the problem

The first indication of the problem can be found in the following file:

/var/adm/messages
DESC: The system has rebooted after a kernel panic. The following are potential bugs.

Jun  5 02:11:30 solar stack[0] - 27580105

Jun  5 02:11:30 solar AUTO-RESPONSE: The failed system image was dumped to the dump device.  If savecore is enabled (see dumpadm(8)) a copy of the dump will be written to the savecore directory /var/crash/data/214dcbc0-1786-4162-97d5-e3d1b8b74960.

Jun  5 02:11:30 solar IMPACT: There may be some performance impact while the crash dump is copied to the savecore directory.  Disk space usage by crash dumps can be substantial.

Jun  5 02:11:30 solar REC-ACTION: If savecore is not enabled then please take steps to preserve the crash image. Use 'fmdump -Vp -u 214dcbc0-1786-4162-97d5-e3d1b8b74960' to view more panic detail. Please refer to the associated reference document at http://support.oracle.com/msg/SUNOS-8000-KL for the latest service procedures and policies regarding this diagnosis.
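The REC-ACTION line above already carries the exact fmdump command to run. As a minimal sketch (crash_uuids is a hypothetical helper name, not part of Solaris), the crash UUIDs can be pulled out of the log so they do not have to be copied by hand:

```shell
#!/bin/sh
# crash_uuids: hypothetical helper, not a Solaris command.
# Reads syslog-style text (the given files, or stdin if none) and prints
# each distinct UUID that appears after "fmdump -Vp -u" in a REC-ACTION line.
crash_uuids() {
    sed -n 's/.*fmdump -Vp -u \([0-9a-f-]*\).*/\1/p' "$@" | sort -u
}

# Typical use on the affected machine (run as root):
#   crash_uuids /var/adm/messages
#   fmdump -Vp -u <uuid printed above>
```

Each UUID printed this way matches the directory name under the savecore directory mentioned in the AUTO-RESPONSE line.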

The observed behavior is intermittent system reboots. The following steps can be used to track down the error:

-bash-4.4# strings vmcore.4 | grep xc_serv

xc_serv() timeout on master CPU3.  1 slave CPU(s) not responding to the xcall.

xc_serv_timeout

xc_serv

xc_serv_delay

xc_serv

xc_serv_delay

xc_serv() timeout on master CPU%d.  %d slave CPU(s) not responding to the xcall.

xc_serv() timeout on master CPU3.  1 slave CPU(s) not responding to the xcall.

xc_serv_timeout

xc_serv

xc_serv_delay

xc_serv

xc_serv_delay

xc_serv() timeout on master CPU%d.  %d slave CPU(s) not responding to the xcall.

xc_serv_timeout

xc_serv

xc_serv_delay

genunix: [ID 655072 FACILITY_AND_PRIORITY] ffffe33000131160 unix:xc_serv_delay+84 ()

genunix: [ID 655072 FACILITY_AND_PRIORITY] ffffe33000131200 unix:xc_serv+1f8 ()

genunix: [ID 655072 kern.notice] ffffe33000131200 unix:xc_serv+1f8 ()

genunix: [ID 655072 kern.notice] ffffe33000131160 unix:xc_serv_delay+84 ()

xc_serv() timeout on master CPU3.  1 slave CPU(s) not responding to the xcall.

This is caused by a cross-call (xcall) timeout. CPUs use the cross-call mechanism to synchronize with one another, and here the communication between the host CPUs and the guest vCPUs did not complete in time.
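The strings output above is repetitive, and it mixes real panic messages (CPU3) with the kernel's own format template (CPU%d). A small filter, sketched here under the assumption that the message text stays as shown (panic_messages is a hypothetical name), keeps only the real messages and prints each one once:

```shell
#!/bin/sh
# panic_messages: hypothetical helper, not a Solaris command.
# Reads text on stdin and prints each distinct xc_serv timeout message that
# names a numbered CPU, skipping the "CPU%d" format template.
panic_messages() {
    grep 'xc_serv() timeout on master CPU[0-9]' | sort -u
}

# Typical use from the crash dump directory:
#   strings vmcore.4 | panic_messages
```

If this prints nothing, the panic is probably not the xcall timeout described here.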

Another symptom

Another way to tell whether you are affected is sudden reboots at the boot screen. To check, press Enter twice during the boot loading screen to switch it to text mode. If you see something like the output below, then your problem is easy to fix.

The solution

The solution is to increase the value of the xc_delay_max parameter in /etc/system and then reboot.
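Entries in /etc/system take the form "set name=value". The sketch below shows one cautious way to add such a line. The helper name add_tunable is hypothetical, and this post does not record the value that was used, so the value is left as a placeholder to be chosen for your own system. It also assumes xc_delay_max needs no module prefix.

```shell
#!/bin/sh
# add_tunable: hypothetical helper, not a Solaris command.
# Appends "set NAME=VALUE" to an /etc/system-style file, after saving a
# backup copy as FILE.bak. Refuses to act if NAME is already set there.
add_tunable() {
    file=$1
    name=$2
    value=$3
    if grep "^set[[:space:]][[:space:]]*$name[[:space:]]*=" "$file" >/dev/null 2>&1; then
        echo "$name is already set in $file; edit that line by hand" >&2
        return 2
    fi
    cp "$file" "$file.bak" || return 1
    printf 'set %s=%s\n' "$name" "$value" >> "$file"
}

# Typical use (run as root, then reboot); <value> is a placeholder because
# the source does not give the number:
#   add_tunable /etc/system xc_delay_max <value>
```

Keeping the backup means a bad setting can be undone by restoring /etc/system.bak.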

Thanks to my colleague @bymimo for the help with this.
