
Solaris 11.3 Bug# 21207532 System or DB hang in zil_commit() at shutdown


The issue is caused by a ZFS bug present in Solaris 11.3 versions lower than 11.3.2.4.0. The symptom is that errors related to I/O wait appear in the trace log file of the Oracle Database and MMON actions are suspended, as you can see below.


Trace log example:

Thu Sep 01 08:51:11 2016
WARNING: aiowait timed out 2 times
Thu Sep 01 08:57:38 2016
minact-scn: got error during useg scan e:12751 usn:4
minact-scn: useg scan erroring out with error e:12751
Suspending MMON action 'Block Cleanout Optim, Undo Segment Scan' for 82800 seconds
Thu Sep 01 09:01:11 2016
WARNING: aiowait timed out 3 times
Thu Sep 01 09:07:20 2016
Suspending MMON action 'undo usage' for 82800 seconds
Thu Sep 01 09:11:11 2016
WARNING: aiowait timed out 4 times
Thu Sep 01 09:12:03 2016
Shutting down instance (immediate)
Stopping background process SMCO
Thu Sep 01 09:12:34 2016
Background process SMCO not dead after 30 seconds
Killing background process SMCO
Shutting down instance: further logons disabled
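
If you want to spot this pattern quickly in a large alert log, the check can be scripted. The following is only a minimal sketch in Python, assuming the log lines look exactly like the example above; the function name scan_alert_log and the sample variable are mine, not part of any Oracle tool.

```python
import re

# Patterns taken from the trace log example above.
AIOWAIT = re.compile(r"WARNING: aiowait timed out (\d+) times")
MMON_SUSPEND = re.compile(r"Suspending MMON action '([^']*)'")


def scan_alert_log(lines):
    """Return (highest aiowait timeout count seen, list of suspended MMON actions)."""
    max_aiowait = 0
    suspended = []
    for line in lines:
        match = AIOWAIT.search(line)
        if match:
            max_aiowait = max(max_aiowait, int(match.group(1)))
            continue
        match = MMON_SUSPEND.search(line)
        if match:
            suspended.append(match.group(1))
    return max_aiowait, suspended


# Sample lines copied from the trace log in this post.
sample = """\
WARNING: aiowait timed out 2 times
Suspending MMON action 'Block Cleanout Optim, Undo Segment Scan' for 82800 seconds
WARNING: aiowait timed out 3 times
Suspending MMON action 'undo usage' for 82800 seconds
WARNING: aiowait timed out 4 times
"""

if __name__ == "__main__":
    count, actions = scan_alert_log(sample.splitlines())
    print("highest aiowait timeout count:", count)
    print("suspended MMON actions:", actions)
```

A growing aiowait count together with suspended MMON actions is the combination seen in this case; on its own it does not prove that you are hitting this bug.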


You cannot shut down normally, so you will need to perform a shutdown abort. When you try to start the Oracle Database again, you will receive the following error on the screen:


ORA-01102: cannot mount database in EXCLUSIVE mode 

SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 4745003008 bytes
Fixed Size 2167200 bytes
Variable Size 2399145568 bytes
Database Buffers 2332033024 bytes
Redo Buffers 11657216 bytes
ORA-01102: cannot mount database in EXCLUSIVE mode  


When you check the server, you cannot detect any issue related to I/O wait:

r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 72.0 0.0 191.0 0.0 0.0 0.0 0.4 0 1 c1d3
0.0 67.0 0.0 309.5 0.0 0.0 0.0 0.2 0 0 c1d4
0.0 130.0 0.0 1452.5 0.0 0.0 0.0 0.2 0 1 c1d5
0.0 120.0 0.0 1413.0 0.0 0.0 0.0 0.3 0 1 c1d6
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 20.0 0.0 0.0 0.0 0.4 0 0 c1d4
0.0 1.0 0.0 20.0 0.0 0.0 0.0 0.3 0 0 c1d5
0.0 1.0 0.0 20.0 0.0 0.0 0.0 0.4 0 0 c1d6
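
To confirm that the disks really are idle, rather than judging the columns by eye, the output above can be parsed. This is a rough Python sketch, assuming the column layout shown above; the thresholds max_busy_pct and max_wait are arbitrary values I picked for illustration, not Oracle recommendations.

```python
# Column names as printed in the output above.
HEADER = "r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device".split()


def parse_iostat(text):
    """Turn the data rows of the output into a list of dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip banner lines, repeated headers and anything with the wrong width.
        if len(fields) != len(HEADER) or fields == HEADER:
            continue
        try:
            values = [float(field) for field in fields[:-1]]
        except ValueError:
            continue
        row = dict(zip(HEADER[:-1], values))
        row["device"] = fields[-1]
        rows.append(row)
    return rows


def busy_devices(rows, max_busy_pct=50.0, max_wait=1.0):
    """Return devices whose %b or wait queue exceeds the (hypothetical) thresholds."""
    return sorted({r["device"] for r in rows
                   if r["%b"] > max_busy_pct or r["wait"] > max_wait})


# Sample rows copied from the output in this post.
sample = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 72.0 0.0 191.0 0.0 0.0 0.0 0.4 0 1 c1d3
0.0 67.0 0.0 309.5 0.0 0.0 0.0 0.2 0 0 c1d4
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 20.0 0.0 0.0 0.0 0.4 0 0 c1d4
"""

if __name__ == "__main__":
    print("busy devices:", busy_devices(parse_iostat(sample)))
```

With the figures from this case the list comes back empty, which matches what we saw: the database reports aiowait timeouts while the devices look healthy.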

The recommendation is to take a core dump while the issue is in progress. In my experience, the Oracle database cannot be started again until the server is rebooted. To avoid the issue beforehand, apply the recommended SRU, 11.3.2.4.0 or greater, where the issue is fixed.

Execute the following command to take the core dump. If you need to reboot because you cannot apply the SRU at this moment, do a "reboot -d", which forces a crash dump before rebooting.

#savecore -L      (Take the core information without reboot)

We previously posted how to update Solaris 11 at the link below; follow those steps to perform the upgrade.

http://unixaddiction.blogspot.com/2016/05/updating-solaris-11x-to-113-and-sru-to.html


You can open a case in MOS to have the generated core dump analyzed. They should see something like this:

pc: genunix:cv_wait+0x3c: call unix:swtch

void genunix:cv_wait+0x3c((kcondvar_t *)0xc4014b00def6, (kmutex_t *)0xc4014c071b80)
void zfs:zil_commit+0xd4((zilog_t *)0xc4014c071b80, (uint64_t), (uint64_t)0xf7b, (uint64_t)0xf7b)
int zfs:zfs_write+0x910((vnode_t *)0xc401666a8b40, (uio_t *)0x2a10d51b248, (int)0x40, (cred_t *)0xc40146364fd0, (caller_context_t *)0)
int genunix:fop_write+0x84((vnode_t *)0xc401666a8b40, (uio_t *)0x2a10d51b248, (int)0x40, (cred_t *)0xc40146364fd0, (caller_context_t *)0)
ssize_t genunix:pwrite+0x208((int), (void *), (size_t)0x100000?, (off_t))
unix:_syscall_no_proc_exit+0x58()
-- switch to user thread's user stack --

address translation failed for thread.t_prev: 8 bytes @ 0x2a10a995cf8

  1 matching thread found
  with function "zil_commit" in its stack


All values are 0 in the corresponding zl_itxg struct.

CAT(vmcore.1/11V)> sdump 0xc4014c071b80 zilog_t zl_itxg
  itxg_t [4] zl_itxg = [ {
  kmutex_t itxg_lock = {
  void *[1] _opaque = [ NULL ]
  }


SOLUTION OR WORKAROUND  (Doc ID 2116637.1)

Bug 21207532 is fixed in Solaris 11.3 SRU 2.4 (or later) 
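
To check whether a server is already at the fixed level, you can compare its version string with 11.3.2.4.0. The sketch below is Python and assumes you have already obtained the dotted version string yourself (for example from the output of "pkg info entire"); the function names are mine. It only compares numbers and says nothing about whether releases before 11.3 are affected.

```python
# Level where the fix is delivered, according to this post: Solaris 11.3 SRU 2.4.
FIXED_IN = (11, 3, 2, 4, 0)


def parse_version(text):
    """Convert a dotted string such as '11.3.2.4.0' into a tuple of five integers."""
    parts = tuple(int(part) for part in text.strip().split("."))
    # Pad short strings such as '11.3.2.4' with zeros so tuples compare as expected.
    return parts + (0,) * (len(FIXED_IN) - len(parts))


def at_or_above_fix(version):
    """True if the given version is 11.3.2.4.0 or later."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    for version in ("11.3.1.5.0", "11.3.2.4.0", "11.3.21.5.0"):
        print(version, at_or_above_fix(version))
```

Comparing integer tuples instead of raw strings matters here: as plain text "11.3.21.5.0" would sort before "11.3.3.0.0", although SRU 21 is the later one.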

