NMS has disappeared

Added by Shane the sysadmin about 1 year ago

So NMS has been crashing repeatedly. It would often take 2 minutes to get anywhere using the web interface, and the command line wasn't any better.

First thing to try with anything is updating to the latest version, so you have all the latest bug fixes, and going from there. So I tried upgrading the entire appliance, but that failed. Then I tried updating just NMS.

It doesn't exist any longer.

# svcs -a | grep nm
disabled       15:20:23 svc:/network/device-discovery/printers:snmp
disabled       15:20:23 svc:/network/snmpd:default
online         15:21:16 svc:/application/nmdtrace:default
offline        15:20:22 svc:/application/nmv:default
offline        15:20:22 svc:/application/nmcd:default
# svcs svc:/application/nms:default
svcs: Pattern 'svc:/application/nms:default' doesn't match any instances
STATE          STIME    FMRI
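
A quick way to tell "registered but not running" apart from "not registered at all" is to ask svcs for just the state column. This is only a sketch: the function name smf_state is made up for illustration, and it assumes svcs either exits non-zero or prints nothing when the FMRI matches no instance, as the output above suggests.

```shell
#!/bin/sh
# Report the SMF state of an instance, or "absent" if it is not registered.
# smf_state is a hypothetical helper name, not part of NexentaStor or SMF.
smf_state() {
    fmri=$1
    # -H drops the header line, -o state prints only the state column.
    if state=$(svcs -H -o state "$fmri" 2>/dev/null) && [ -n "$state" ]; then
        echo "$state"
    else
        echo "absent"
    fi
}

# Example call on the appliance:
#   smf_state svc:/application/nms:default
```

On the machine in this thread it would presumably print "absent" for nms and "offline" for nmcd and nmv.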

I'd love to tell you which version of NexentaStor Community Edition the machine is running but that requires the NMS to be working. Closest I can figure is 3.0.something ....

Hardware:

Sun X4100
Twin quad-core AMDs
32GB RAM
OS is on mirrored 2x 132GB 2.5" drives
There's a SAS HBA card in it that connects the thing to a Sun J4200 JBOD with 12x 500GB drives

Any ideas other than nuking it and starting again (that would be bad)??


Replies

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Nobody has any ideas at all?? :(

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

sorry this moved past me :)

so we should be able to see what version you are running by running this in nmc

show appliance version

i would definitely run upgrade though. i have seen this problem in the past but haven't for a while..

setup appliance upgrade

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

No problem Linda. The machine is running fine and hasn't affected anyone or any other machines, so as long as that continues, fixing the NMS isn't time critical.

Finding the version and running upgrade could be a problem though. Not sure what sysevent has to do with anything but the other two require NMS running.

shane@womble.office[10:57:53]:~$ ssh root@monster1.office
Password: 
Last login: Thu Apr  5 10:49:29 2012 from womble.office

                          * * *
                      SYSTEM NOTICE

     Failed to initialize NMC:
     The name com.nexenta.nms was not provided by any .service files

     Suggested possible recovery actions:
        - Reboot into a known working system checkpoint
        - Run 'svcadm clear nms'; then try to re-login
     Suggested troubleshooting actions:
        - Run 'svcs -vx' and collect output for further analysis
        - Run 'dmesg' and look for error messages
        - View "/var/log/nms.log" for error messages
        - View "/var/svc/log/application-nms:default.log" for error messages

Entering UNIX shell. Type 'exit' to go back to NMC login...
root@monster1:~# svcs -xv
svc:/system/sysevent:default (system event notification)
 State: maintenance since Mon Apr 02 09:28:04 2012
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: man -M /usr/share/man -s 1M syseventd
   See: /var/svc/log/system-sysevent:default.log
Impact: 1 dependent service is not running:
        svc:/system/fmd:default

svc:/application/nmcd:default (Nexenta Management Console Daemon)
 State: offline since Mon Apr 02 09:27:50 2012
Reason: Dependency svc:/application/nms:default is absent.
   See: http://sun.com/msg/SMF-8000-E2
   See: man -M /usr/share/man -s 1 nmcd
Impact: This service is not running.

svc:/application/nmv:default (Nexenta Management Views - all in one management GUI)
 State: offline since Mon Apr 02 09:27:50 2012
Reason: Dependency svc:/application/nms:default is absent.
   See: http://sun.com/msg/SMF-8000-E2
   See: man -M /usr/share/man -s 1 nmv
Impact: This service is not running.
root@monster1:~# 
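
The svcs -xv output above is easier to scan when each service is reduced to one line. The following is a minimal sketch that reads that output on stdin; the function name summarize_svcs_x is made up for illustration, and it assumes the "State:" and "Reason:" layout shown above.

```shell
#!/bin/sh
# Condense `svcs -xv` output (read from stdin) to one line per service:
#   FMRI | state | reason
summarize_svcs_x() {
    awk '
        /^svc:\//   { fmri = $1 }      # unindented FMRI line starts a block
        /^ *State:/ { state = $2 }     # e.g. "maintenance" or "offline"
        /^ *Reason:/ {
            sub(/^ *Reason: */, "")    # keep only the free-text reason
            print fmri " | " state " | " $0
        }
    '
}

# Typical use on the appliance:
#   svcs -xv | summarize_svcs_x
```

Fed the output above, it should give three lines: sysevent in maintenance for restarting too quickly, and nmcd and nmv offline because their nms dependency is absent.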

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

So none of this will stop services (nfs, iscsi, cifs) from being served out, but syseventd is a core service and nms is dependent on it.

I have seen this problem a number of times and know the patch for it is in the repo. If you can get downtime in the future, run the upgrade then.

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Yeah that's not a problem, I just need to notify people that it and some virtual machines are going down on a weekend.

Question is, how am I going to upgrade it without the NMS and its dependents running?? Will it be an apt-get upgrade or something??

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Weekend soon (we get the weekend before everyone else in the world) so I can have a go at fixing this. What do you suggest I do to sort it??

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

I would reboot and when nms comes up you should be able to run an upgrade.

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

The service doesn't exist anymore so rebooting won't bring it up (I know this because I tried it already)

root@monster1:~# svcs -a | grep nm
disabled       Apr_02   svc:/network/device-discovery/printers:snmp
disabled       Apr_02   svc:/network/snmpd:default
online         Apr_02   svc:/application/nmdtrace:default
offline        Apr_02   svc:/application/nmv:default
offline        Apr_02   svc:/application/nmcd:default

Before I can do an upgrade the nms service has to be reinstalled.

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

can you try a svcadm enable nms?

do you have a support contract? i would really like to have support take a look at that.. Can you get to the nms.log? This also looks like production and they can help walk you through the procedures.

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

so let me have you try a few things

what does this say?

ps -ef | grep nm

nms's log file should be

/var/svc/log/application-nms:default.log

can i see that?

what does this say

svcs -x

can i also see

svcs -l nbs

and the contents of

/var/svc/log/system-sysevent:default.log

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

one more: what does this show?

svcadm clear nms

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Ok. I've had a look and all of the files for the nms appear to be present and correct, it's just the service has been uninstalled. So I reinstalled the service

root@monster1:~# svccfg import /var/svc/manifest/application/nms.xml

Kicked it out of maintenance state but it went straight back into it.

root@monster1:~# svcadm clear svc:/application/nms:default
root@monster1:~# svcs -xv
svc:/application/nms:default (Nexenta Management Services and API daemon)
 State: maintenance since Thu Apr 12 09:45:14 2012
Reason: Start method failed repeatedly, last exited with status 146.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1 nms
   See: /var/svc/log/application-nms:default.log
Impact: 2 dependent services are not running:
        svc:/application/nmcd:default
        svc:/application/nmv:default

svc:/system/sysevent:default (system event notification)
 State: maintenance since Thu Apr 12 09:44:26 2012
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: man -M /usr/share/man -s 1M syseventd
   See: /var/svc/log/system-sysevent:default.log
Impact: This service is not running.

Output from ps -ef

root@monster1:~# ps -ef | grep nm
    root   593     1   0   Apr 02 ?           3:54 /usr/bin/perl /lib/svc/method/nmdtrace -d
    root   471     1   0   Apr 02 ?           0:00 dbus-daemon --fork --config-file=/var/lib/nza/nmdtrace.conf --print-address
    root  3529  3521   0   Apr 05 pts/3       0:02 /usr/bin/perl /usr/bin/nmc
    root  9052  9044   0 08:43:43 pts/4       0:02 /usr/bin/perl /usr/bin/nmc
    root 12962  3536   0 13:58:36 pts/3       0:00 grep nm
root@monster1:~# svcs -l nbs
fmri         svc:/application/nbs:default
name         Nexenta Boot Services
enabled      true
state        online
next_state   none
state_time   Mon Apr 02 09:28:35 2012
logfile      /var/svc/log/application-nbs:default.log
restarter    svc:/system/svc/restarter:default
contract_id  73 
dependency   require_all/restart svc:/system/dbus:default (online)

The contents of /var/svc/log/application-nms:default.log

[ Mar 31 15:09:24 Method "start" exited with status 0. ]
[ Mar 31 15:09:24 Stopping because all processes in service exited. ]
[ Mar 31 15:09:24 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Mar 31 15:09:25 Method "stop" exited with status 0. ]
[ Mar 31 15:09:25 Restarting too quickly, changing state to maintenance. ]
[ Mar 31 15:10:04 Leaving maintenance because clear requested. ]
[ Mar 31 15:10:04 Enabled. ]
[ Mar 31 15:10:04 Disabled. ]
[ Mar 31 15:13:47 Enabled. ]
[ Mar 31 15:13:47 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Mar 31 15:20:22 Enabled. ]
[ Mar 31 15:21:10 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Mar 31 15:33:49 Method "start" failed due to signal KILL. ]
[ Mar 31 15:33:49 Leaving maintenance because disable requested. ]
[ Mar 31 15:33:49 Disabled. ]
[ Apr 12 08:52:15 Enabled. ]
[ Apr 12 08:52:15 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:07:33 Method "start" exited with status 0. ]
[ Apr 12 09:07:33 Stopping because all processes in service exited. ]
[ Apr 12 09:07:33 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:07:35 Method "stop" exited with status 0. ]
[ Apr 12 09:07:35 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:17:55 Method "start" exited with status 0. ]
[ Apr 12 09:17:55 Stopping because service restarting. ]
[ Apr 12 09:17:55 Executing stop method ("/lib/svc/method/nms stop"). ]
Stopping NMS daemon (10083) ...
[ Apr 12 09:18:01 Method "stop" exited with status 0. ]
[ Apr 12 09:18:01 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:25:20 Method "start" exited with status 0. ]
[ Apr 12 09:33:58 Stopping because all processes in service exited. ]
[ Apr 12 09:33:58 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:33:59 Method "stop" exited with status 0. ]
[ Apr 12 09:33:59 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:35:40 Method "start" exited with status 0. ]
[ Apr 12 09:35:40 Stopping because all processes in service exited. ]
[ Apr 12 09:35:40 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:35:42 Method "stop" exited with status 0. ]
[ Apr 12 09:35:42 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:36:43 Method "start" exited with status 0. ]
[ Apr 12 09:36:43 Stopping because all processes in service exited. ]
[ Apr 12 09:36:43 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:36:44 Method "stop" exited with status 0. ]
[ Apr 12 09:36:44 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
     1. Logical Node: /dev/rdsk/c0t0d0p0
        Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
        Connected Device: HL-DT-ST DVDRAM GE20LU10  FE06
        Device Type: DVD Reader/Writer
        Bus: USB
        Size: 
        Label: 
        Access permissions: 
[ Apr 12 09:37:13 Method "start" exited with status 0. ]
[ Apr 12 09:37:13 Stopping because all processes in service exited. ]
[ Apr 12 09:37:13 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:37:15 Method "stop" exited with status 0. ]
[ Apr 12 09:37:15 Restarting too quickly, changing state to maintenance. ]
[ Apr 12 09:37:15 Leaving maintenance because disable requested. ]
[ Apr 12 09:37:15 Disabled. ]
[ Apr 12 09:44:14 Enabled. ]
[ Apr 12 09:44:14 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
        org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
 at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
        Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
        Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
        NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:15 Method "start" exited with status 146. ]
[ Apr 12 09:44:15 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
        org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
 at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
        Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
        Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
        NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:17 Method "start" exited with status 146. ]
[ Apr 12 09:44:17 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
        org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
 at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
        Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
        Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
        NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:19 Method "start" exited with status 146. ]
[ Apr 12 09:45:12 Leaving maintenance because clear requested. ]
[ Apr 12 09:45:12 Enabled. ]
[ Apr 12 09:45:12 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
        org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
 at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
        Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
        Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
        NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:45:14 Method "start" exited with status 146. ]
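
With a log this long it helps to tally how the start method has been exiting. A minimal sketch, assuming the SMF log line format shown above; count_start_exits is a made-up name for illustration.

```shell
#!/bin/sh
# Tally exit statuses of the "start" method from an SMF service log on stdin.
# Output: one "<status> <count>" line per status, lowest status first.
count_start_exits() {
    awk '
        /Method "start" exited with status/ {
            status = $(NF - 1)          # "146." in "... status 146. ]"
            sub(/\.$/, "", status)      # drop the trailing dot
            count[status]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort -n
}

# Typical use on the appliance:
#   count_start_exits < /var/svc/log/application-nms:default.log
```

In the log above the status 146 exits only appear on Apr 12 from 09:44 onward, each right after the D-Bus "Connection refused" exception, while the earlier starts exit with status 0 and are then stopped again.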

The contents of /var/svc/log/system-sysevent:default.log (chopped out part cos it's huge but I can put that bit back in. It's just repeating over and over again though)

[ Jan  3 15:17:41 Enabled. ]
[ Jan  3 15:17:51 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  3 15:17:51 Method "start" exited with status 0. ]
[ Jan  4 14:59:00 Stopping because process dumped core. ]
[ Jan  4 14:59:00 Executing stop method ("/lib/svc/method/svc-syseventd stop 20"). ]
[ Jan  4 14:59:00 Method "stop" exited with status 0. ]
[ Jan  4 14:59:00 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  4 14:59:00 Method "start" exited with status 0. ]
[ Jan  4 14:59:02 Stopping because process dumped core. ]
[ Jan  4 14:59:02 Executing stop method ("/lib/svc/method/svc-syseventd stop 163"). ]
[ Jan  4 14:59:02 Method "stop" exited with status 0. ]
[ Jan  4 14:59:02 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  4 14:59:02 Method "start" exited with status 0. ]
[ Jan  4 14:59:03 Stopping because process dumped core. ]
[ Jan  4 14:59:03 Executing stop method ("/lib/svc/method/svc-syseventd stop 165"). ]
[ Jan  4 14:59:03 Method "stop" exited with status 0. ]
[ Jan  4 14:59:03 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  4 14:59:04 Method "start" exited with status 0. ]
[ Jan  4 14:59:05 Stopping because process dumped core. ]
[ Jan  4 14:59:05 Executing stop method ("/lib/svc/method/svc-syseventd stop 167"). ]
[ Jan  4 14:59:05 Method "stop" exited with status 0. ]
[ Jan  4 14:59:05 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  4 14:59:05 Method "start" exited with status 0. ]
[ Jan  4 14:59:07 Stopping because process dumped core. ]
[ Jan  4 14:59:07 Executing stop method ("/lib/svc/method/svc-syseventd stop 169"). ]
[ Jan  4 14:59:07 Method "stop" exited with status 0. ]
[ Jan  4 14:59:07 Restarting too quickly, changing state to maintenance. ]
[ Jan  6 23:48:52 Enabled. ]
[ Jan  6 23:49:02 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  6 23:49:02 Method "start" exited with status 0. ]
[ Jan  6 23:49:25 Rereading configuration. ]
[ Jan  6 23:49:25 No 'refresh' method defined.  Treating as :true. ]
[ Jan  7 14:44:12 Enabled. ]
[ Jan  7 14:44:20 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan  7 14:44:20 Method "start" exited with status 0. ]
[ Jan 18 21:48:44 Enabled. ]
[ Jan 18 21:48:52 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 21:48:53 Method "start" exited with status 0. ]
[ Jan 18 21:59:32 Enabled. ]
[ Jan 18 21:59:39 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 21:59:39 Method "start" exited with status 0. ]
[ Jan 18 22:31:19 Enabled. ]
[ Jan 18 22:31:27 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 22:31:27 Method "start" exited with status 0. ]
[ Jan 18 22:38:03 Disabled. ]
[ Jan 18 22:53:47 Enabled. ]
[ Jan 18 22:53:56 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 22:53:56 Method "start" exited with status 0. ]
[ Jan 22 15:14:55 Enabled. ]
[ Jan 22 15:15:03 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 22 15:15:04 Method "start" exited with status 0. ]
[ Mar 31 13:13:33 Enabled. ]
[ Mar 31 13:13:41 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:41 Method "start" exited with status 0. ]
[ Mar 31 13:13:43 Stopping because process dumped core. ]
[ Mar 31 13:13:43 Executing stop method ("/lib/svc/method/svc-syseventd stop 21"). ]
[ Mar 31 13:13:43 Method "stop" exited with status 0. ]
[ Mar 31 13:13:43 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:43 Method "start" exited with status 0. ]
[ Mar 31 13:13:45 Stopping because process dumped core. ]
[ Mar 31 13:13:45 Executing stop method ("/lib/svc/method/svc-syseventd stop 44"). ]
[ Mar 31 13:13:45 Method "stop" exited with status 0. ]
[ Mar 31 13:13:45 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:45 Method "start" exited with status 0. ]
[ Mar 31 13:13:46 Stopping because process dumped core. ]
[ Mar 31 13:13:46 Executing stop method ("/lib/svc/method/svc-syseventd stop 50"). ]
[ Mar 31 13:14:47 Method or service exit timed out.  Killing contract 51. ]
[ Mar 31 13:14:47 Method "stop" failed due to signal KILL. ]
[ Mar 31 13:14:47 Executing stop method ("/lib/svc/method/svc-syseventd stop 50"). ]
[ Mar 31 13:15:08 Method "stop" exited with status 0. ]
[ Mar 31 13:15:08 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:15:09 Method "start" exited with status 0. ]
[ Mar 31 13:15:10 Stopping because process dumped core. ]
.....
[ Apr 12 09:00:37 Stopping because process dumped core. ]
[ Apr 12 09:00:37 Executing stop method ("/lib/svc/method/svc-syseventd stop 1004"). ]
[ Apr 12 09:00:37 Method "stop" exited with status 0. ]
[ Apr 12 09:00:37 Restarting too quickly, changing state to maintenance. ]
[ Apr 12 09:44:14 Leaving maintenance because clear requested. ]
[ Apr 12 09:44:14 Enabled. ]
[ Apr 12 09:44:14 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:15 Method "start" exited with status 0. ]
[ Apr 12 09:44:19 Stopping because process dumped core. ]
[ Apr 12 09:44:19 Executing stop method ("/lib/svc/method/svc-syseventd stop 1042"). ]
[ Apr 12 09:44:19 Method "stop" exited with status 0. ]
[ Apr 12 09:44:19 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:19 Method "start" exited with status 0. ]
[ Apr 12 09:44:21 Stopping because process dumped core. ]
[ Apr 12 09:44:21 Executing stop method ("/lib/svc/method/svc-syseventd stop 1048"). ]
[ Apr 12 09:44:21 Method "stop" exited with status 0. ]
[ Apr 12 09:44:21 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:22 Method "start" exited with status 0. ]
[ Apr 12 09:44:24 Stopping because process dumped core. ]
[ Apr 12 09:44:24 Executing stop method ("/lib/svc/method/svc-syseventd stop 1050"). ]
[ Apr 12 09:44:24 Method "stop" exited with status 0. ]
[ Apr 12 09:44:24 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:25 Method "start" exited with status 0. ]
[ Apr 12 09:44:26 Stopping because process dumped core. ]
[ Apr 12 09:44:26 Executing stop method ("/lib/svc/method/svc-syseventd stop 1052"). ]
[ Apr 12 09:44:26 Method "stop" exited with status 0. ]
[ Apr 12 09:44:26 Restarting too quickly, changing state to maintenance. ]
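
Since the full sysevent log is huge, a per-day count of core dumps gives the same picture in a few lines. This is only a sketch based on the log format above; count_core_dumps is a made-up name.

```shell
#!/bin/sh
# Count "process dumped core" events per day in an SMF service log on stdin.
# Days are printed in the order they first appear in the log.
count_core_dumps() {
    awk '
        /Stopping because process dumped core/ {
            day = $2 " " $3                       # e.g. "Jan 4"
            if (!(day in n)) order[++days] = day  # remember first-seen order
            n[day]++
        }
        END { for (i = 1; i <= days; i++) print order[i] ": " n[order[i]] }
    '
}

# Typical use on the appliance:
#   count_core_dumps < /var/svc/log/system-sysevent:default.log
```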

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

So it looks like this whole trail starts with dbus failing to connect.

Can you disable the services and then start them one at a time? dbus is hitting a failure and taking a long time to start, so nms can't seem to start.

svcadm disable nmv
svcadm disable nmc
svcadm disable nms
svcadm disable dbus

check ps to see if anything is lingering...

then startup dbus

svcadm enable dbus

wait to make sure it comes up..

then enable the rest of them in order.

once they can come up, then run upgrade

check for failures

fmadm faulty
fmdump -V
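
The stop-everything, start-in-order sequence above could be scripted roughly as follows. This is only a sketch with some assumptions: it uses nmcd, the service name svcs reports earlier in the thread, where the post says nmc; it relies on the -s option of svcadm enable and disable to wait for each state change; and the run and restart_nms_stack names are made up. With DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_nms_stack() {
    # Stop the dependents first and D-Bus last. -s makes svcadm wait
    # until each instance has reached the disabled state.
    for svc in nmv nmcd nms dbus; do
        run svcadm disable -s "$svc" || return 1
    done
    # Bring things back in the opposite order: D-Bus, then nms, then
    # the services that depend on nms. -s waits for online or degraded.
    for svc in dbus nms nmcd nmv; do
        run svcadm enable -s "$svc" || return 1
    done
}

# Preview only:  DRY_RUN=1 restart_nms_stack
```

Checking ps for lingering processes between the two loops, as suggested above, is still worth doing by hand.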

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Ah, some progress!!

Done as you've said and nms etc are up but the upgrade has failed

nmc@monster1:/$ setup appliance upgrade
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ?  Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed?  Yes
Checking repository sources. Please wait...
Found new upgrades!
Verifying upgrades...
Verification failed. Could not download all needed packages.
Show detailed execution log?  Yes
Reading package lists...

nmc@monster1:/$ 

Taking a look at fmdump ....

root@monster1:/# fmadm faulty
root@monster1:/# fmdump -V
TIME                           UUID                                 SUNW-MSG-ID
Jan 02 2012 15:47:00.224380000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 ZFS-8000-D3

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
        code = ZFS-8000-D3
        diag-time = 1325472420 198022
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Sun-Fire-X4100-M2
                        chassis-id = 0904BD25DB
                        server-id = myhost
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end resource)

        (end fault-list[0])

        fault-status = 0x3
        severity = Major
        __ttl = 0x1
        __tod = 0x4f011aa4 0xd5fc460

TIME                           UUID                                 SUNW-MSG-ID
Feb 02 2012 16:15:30.240858000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 FMD-8000-4M Repaired

nvlist version: 0
        version = 0x0
        class = list.repaired
        uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
        code = FMD-8000-4M
        diag-time = 1325472420 198022
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Sun-Fire-X4100-M2
                        chassis-id = 0904BD25DB
                        server-id = myhost
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end resource)

        (end fault-list[0])

        fault-status = 0x6
        severity = Minor
        __ttl = 0x1
        __tod = 0x4f29ffd2 0xe5b3390

TIME                           UUID                                 SUNW-MSG-ID
Feb 02 2012 16:15:30.256981000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 FMD-8000-6U Resolved

nvlist version: 0
        version = 0x0
        class = list.resolved
        uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
        code = FMD-8000-6U
        diag-time = 1325472420 198022
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Sun-Fire-X4100-M2
                        chassis-id = 0904BD25DB
                        server-id = myhost
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x9639a5d41783204d
                        vdev = 0xefa28e7e7bb5a090
                (end resource)

        (end fault-list[0])

        fault-status = 0x6
        severity = Minor
        __ttl = 0x1
        __tod = 0x4f29ffd2 0xf513808

root@monster1:/#
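
The first word of each __tod field in the fmdump output is a hexadecimal count of seconds since the epoch, which can be checked against the decimal diag-time in the same record. A minimal sketch; tod_to_epoch is a made-up name.

```shell
#!/bin/sh
# Convert a hex seconds value (first word of __tod) to decimal epoch seconds.
tod_to_epoch() {
    printf '%d\n' "$1"     # printf accepts C-style 0x constants
}

# First event above: __tod = 0x4f011aa4 0xd5fc460
tod_to_epoch 0x4f011aa4    # prints 1325472420, the same as diag-time
```

So the first event was diagnosed and logged in the same second, and the later "Repaired" and "Resolved" records simply carry the original diag-time along.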

sysevent is still broken (is that why the upgrade is failing??)

root@monster1:/# svcs -xv
svc:/system/sysevent:default (system event notification)
 State: maintenance since Fri Apr 13 21:36:04 2012
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: man -M /usr/share/man -s 1M syseventd
   See: /var/svc/log/system-sysevent:default.log
Impact: This service is not running.
root@monster1:/#
root@monster1:/# svcadm clear svc:/system/sysevent:default
root@monster1:/# svcs -xv
svc:/system/fmd:default (Solaris Fault Manager)
 State: offline since Fri Apr 13 21:46:44 2012
Reason: Start method is running.
   See: http://sun.com/msg/SMF-8000-C4
   See: man -M /usr/share/man -s 1M fmd
   See: /var/svc/log/system-fmd:default.log
Impact: This service is not running.
root@monster1:/#
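
To keep an eye on which services are still unhappy after a `svcadm clear`, the `svcs -xv` output above is regular enough to parse. A rough Python sketch, assuming the layout shown here (an FMRI line followed by `State:`, `Reason:` and `Impact:` lines); `parse_svcs_xv` is a made-up helper name, not part of any Nexenta or Solaris tool:

```python
import re

# Sample copied from the output earlier in this post.
SAMPLE = """\
svc:/system/sysevent:default (system event notification)
 State: maintenance since Fri Apr 13 21:36:04 2012
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
Impact: This service is not running.
"""


def parse_svcs_xv(text):
    """Return one dict per service block found in `svcs -xv` output."""
    services = []
    current = None
    for line in text.splitlines():
        header = re.match(r"(svc:/\S+)\s+\((.+)\)\s*$", line)
        if header:
            # A new service block starts with its FMRI and description.
            current = {"fmri": header.group(1), "description": header.group(2)}
            services.append(current)
            continue
        field = re.match(r"\s*(State|Reason|Impact):\s*(.*)", line)
        if field and current is not None:
            current[field.group(1).lower()] = field.group(2)
    return services


if __name__ == "__main__":
    for svc in parse_svcs_xv(SAMPLE):
        print(svc["fmri"], "|", svc.get("state"), "|", svc.get("reason"))
```

Feeding it the text captured before and after the clear makes it easy to see that sysevent dropped off the list and fmd showed up instead.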

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

Can you show me your zpool status -v?

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Here you go

root@monster1:/# zpool status -v
  pool: mpool
 state: ONLINE
 scan: resilvered 359G in 53h47m with 0 errors on Thu Feb  9 01:20:56 2012
config:

        NAME                   STATE     READ WRITE CKSUM
        mpool                  ONLINE       0     0     0
          raidz2-0             ONLINE       0     0     0
            c1t0d0p0           ONLINE       0     0     0
            c1t1d0p0           ONLINE       0     0     0
            c1t2d0p0           ONLINE       0     0     0
            c1t3d0p0           ONLINE       0     0     0
            c1t4d0p0           ONLINE       0     0     0
            c1t11d0p0          ONLINE       0     0     0
            c1t6d0p0           ONLINE       0     0     0
            c1t7d0p0           ONLINE       0     0     0
            c1t8d0p0           ONLINE       0     0     0
            c1t9d0p0           ONLINE       0     0     0
        logs
          mirror-1             ONLINE       0     0     0
            /dev/ramdisk/zil1  ONLINE       0     0     0
            /dev/ramdisk/zil2  ONLINE       0     0     0
        spares
          c1t5d0               AVAIL
          c1t10d0              AVAIL

errors: No known data errors

  pool: syspool
 state: ONLINE
 scan: resilvered 30.3G in 0h10m with 0 errors on Wed Apr  4 14:50:19 2012
config:

        NAME          STATE     READ WRITE CKSUM
        syspool       ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0

errors: No known data errors

The mpool resilvered after I swapped out a dud drive for a new one. The syspool was originally on a single drive, but I eventually got around to mirroring it, and that resilver was from adding the mirror.
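
With this many disks it's easy to miss one line in `zpool status`. A small Python sketch that lists any device whose state isn't ONLINE or AVAIL, assuming the column layout shown above; `unhealthy_devices` is a hypothetical helper, and the degraded `tank` sample is invented purely to exercise it:

```python
HEALTHY_STATES = {"ONLINE", "AVAIL"}

# Invented sample (not from this machine) with one faulted disk.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  FAULTED      0     0     0
        spares
          c1t5d0      AVAIL

errors: No known data errors
"""


def unhealthy_devices(status_text):
    """Return (name, state) pairs for config rows that are not healthy."""
    bad = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Skip the header row and one-word section labels (logs, spares).
        if fields[0] == "NAME" or len(fields) < 2:
            continue
        if fields[1] not in HEALTHY_STATES:
            bad.append((fields[0], fields[1]))
    return bad


if __name__ == "__main__":
    print(unhealthy_devices(SAMPLE))
```

Run against the output pasted above it should come back empty, since everything is ONLINE or AVAIL.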

I've been testing ramdisks for the ZIL, and it's incredibly fast and fabulous until the machine reboots; then the faecal matter gets into the air conditioning :(

Also, after trying all sorts of other ways to get decent iSCSI performance out of the thing, I ended up having to disable the ZFS write throttle. That was the last resort, but nothing else came close to fixing the performance issues (writes averaged about 5000ms with the throttle enabled vs about 3ms with it disabled).

echo zfs_no_write_throttle/W 1 | mdb -kw

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

The dud drive is probably what started the problems.

Have you been able to get NMS up and the upgrade running?

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

NMS is running fine now, but the upgrade fails without telling me why:

nmc@monster1:/$ setup appliance upgrade                                                                                                                                             
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ?  Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed?  Yes
Checking repository sources. Please wait...
Found new upgrades!
Verifying upgrades...
Verification failed. Could not download all needed packages. To obtain detailed error information, please re-run this command with -v (verbose) option, or see usage (-h)
for details.
Show detailed execution log?  Yes
(Reading database ... 44335 files and directories currently installed.)

nmc@monster1:/$                                                                                                                                                                     

RE: NMS has disappeared - Added by Linda Kateley about 1 year ago

gotta run it as root

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

I am, unless nmc isn't root:

shane@womble.office[09:57:22]:~$ ssh root@monster1.office
Password: 
Last login: Tue Apr 17 09:48:43 2012 from womble.office
nmc@monster1:/$ setup appliance upgrade                                                                                                                                             
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ?  Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed?  Yes
Checking repository sources. Please wait...
Found new upgrades!
Verifying upgrades...
Verification failed. Could not download all needed packages. To obtain detailed error information, please re-run this command with -v (verbose) option, or see usage (-h)
for details.
Show detailed execution log?  Yes
(Reading database ... 44335 files and directories currently installed.)

nmc@monster1:/$                                                                                       

RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago

Things degraded to the point that nobody could do any work, so after much swearing I downloaded NexentaStor CE 3.1.2 and did a fresh install.

Things were fine and dandy until ....

shane@womble.office[20:03:31]:~$ ssh root@monster1.office
Password: 
Last login: Thu Apr 19 00:57:47 2012 from womble.office
nmc@monster1:/$ setup appliance upgrade                                                                                                                                             
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ?  Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed?  Yes
Checking repository sources. Please wait...
Found new upgrades!
Verifying upgrades...
Trying to gain exclusive access to the appliance.
This operation may take up to 30 seconds to complete. Please wait...
Exclusive access granted.
Initiating appliance upgrade procedure. Please wait...
Success. This upgrade will download approximately 8.54MB
Downloading upgrades. This may take a few minutes. Please wait...
Upgrade is in progress. Please DO NOT interrupt...
Creating Rollback Checkpoint...

Rollback Checkpoint has been created: rootfs-nmu-002

Use NMC 'show appliance checkpoint' command to list all available
system checkpoints

Reading package lists...
Building dependency tree...
Reading state information...
dpkg is already the newest version.
apt is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
The following packages will be upgraded:
  base-files nmc nms nms-dev nmv nmv-theme-nexenta
6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/8750kB of archives.
After this operation, 20.3MB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 43814 files and directories currently installed.)
Preparing to replace base-files 4.0.2nexenta9 (using .../base-files_4.0.2nexenta11_solaris-i386.deb) ...
Unpacking replacement base-files ...
Processing triggers for man-db ...
Setting up base-files (4.0.2nexenta11) ...

(Reading database ... 43814 files and directories currently installed.)
Preparing to replace nms 3.1.1-7231-r9546 (using .../nms_3.1.2-8147-r9697_solaris-i386.deb) ...
Stopping NMS service... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 done
Unpacking replacement nms ...
Preparing to replace nms-dev 3.1.1-7231-r9546 (using .../nms-dev_3.1.2-8147-r9697_all.deb) ...
Unpacking replacement nms-dev ...
Preparing to replace nmc 3.1.1-7231-r9549 (using .../nmc_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmc ...
Preparing to replace nmv 3.1.1-6829-r9491 (using .../nmv_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmv ...
Preparing to replace nmv-theme-nexenta 3.1.1-6829-r9491 (using .../nmv-theme-nexenta_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmv-theme-nexenta ...
Setting up nms (3.1.2-8147-r9697) ...
Starting NMS service... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 179 180 done

Setting up nms-dev (3.1.2-8147-r9697) ...
Setting up nmc (3.1.2-8147-r9697) ...

Setting up nmv-theme-nexenta (3.1.2-8147-r9697) ...
Setting up nmv (3.1.2-8147-r9697) ...

org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "127.0.0.1:2001" Connection refused

nmc@monster1:/$                                                                                                                                                                     

But hey, at least sysevent is fixed eh!!
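
That `Connection refused` on 127.0.0.1:2001 suggests nothing was listening on that port when the nmv setup step ran. I'm assuming (it isn't confirmed anywhere in the output) that 2001 is where NMS exposes its D-Bus socket. A quick Python sketch to check whether anything is accepting TCP connections there; `port_is_open` is just an illustrative name:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port taken from the error message above.
    print(port_is_open("127.0.0.1", 2001))
```

If it still prints False a few minutes after the upgrade, NMS probably never finished starting, which would fit the 180-second countdown in the log.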