NMS has disappeared
Added by Shane the sysadmin about 1 year ago
So NMS has been crashing repeatedly. It would often take 2 minutes to get anywhere using the web interface, and the command line wasn't any better.
First thing to try with anything is updating to the latest version so you're using all the latest bug fixes and go from there. So I tried upgrading the entire appliance but that failed. Then I tried updating just NMS.
It doesn't exist any longer.
# svcs -a | grep nm
disabled 15:20:23 svc:/network/device-discovery/printers:snmp
disabled 15:20:23 svc:/network/snmpd:default
online 15:21:16 svc:/application/nmdtrace:default
offline 15:20:22 svc:/application/nmv:default
offline 15:20:22 svc:/application/nmcd:default
# svcs svc:/application/nms:default
svcs: Pattern 'svc:/application/nms:default' doesn't match any instances
STATE STIME FMRI
I'd love to tell you which version of NexentaStor Community Edition the machine is running but that requires the NMS to be working. Closest I can figure is 3.0.something ....
Hardware:
Sun X4100
Twin quad core AMD's
32GB RAM
OS is on mirrored 2x 132GB 2.5" drives
There's a SAS HBA card in it that connects the thing to a Sun J4200 JBOD with 12x 500GB drives
Any ideas other than nuking it and starting again (that would be bad)??
Replies
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Nobody has any ideas at all?? :(
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
sorry this moved past me :)
so we should be able to see what version you are running by running this in nmc
show appliance version
i would definitely run upgrade though. i have seen this problem in the past but haven't for a while..
setup appliance upgrade
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
No problem Linda. The machine is running fine and hasn't affected anyone or any other machines so as long as that continues fixing the NMS isn't time critical.
Finding the version and running upgrade could be a problem though. Not sure what sysevent has to do with anything but the other two require NMS running.
shane@womble.office[10:57:53]:~$ ssh root@monster1.office
Password:
Last login: Thu Apr 5 10:49:29 2012 from womble.office
* * *
SYSTEM NOTICE
Failed to initialize NMC:
The name com.nexenta.nms was not provided by any .service files
Suggested possible recovery actions:
- Reboot into a known working system checkpoint
- Run 'svcadm clear nms'; then try to re-login
Suggested troubleshooting actions:
- Run 'svcs -vx' and collect output for further analysis
- Run 'dmesg' and look for error messages
- View "/var/log/nms.log" for error messages
- View "/var/svc/log/application-nms:default.log" for error messages
Entering UNIX shell. Type 'exit' to go back to NMC login...
root@monster1:~# svcs -xv
svc:/system/sysevent:default (system event notification)
State: maintenance since Mon Apr 02 09:28:04 2012
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: man -M /usr/share/man -s 1M syseventd
See: /var/svc/log/system-sysevent:default.log
Impact: 1 dependent service is not running:
svc:/system/fmd:default
svc:/application/nmcd:default (Nexenta Management Console Daemon)
State: offline since Mon Apr 02 09:27:50 2012
Reason: Dependency svc:/application/nms:default is absent.
See: http://sun.com/msg/SMF-8000-E2
See: man -M /usr/share/man -s 1 nmcd
Impact: This service is not running.
svc:/application/nmv:default (Nexenta Management Views - all in one management GUI)
State: offline since Mon Apr 02 09:27:50 2012
Reason: Dependency svc:/application/nms:default is absent.
See: http://sun.com/msg/SMF-8000-E2
See: man -M /usr/share/man -s 1 nmv
Impact: This service is not running.
root@monster1:~#
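For anyone reading along, the svcs -xv output above can be condensed to one line per broken service, which makes the dependency chain easier to see. This is only a rough sketch: summarise_svcs_xv is a made-up helper name, and it assumes the State:/Reason: layout shown in this thread.

```shell
# Hypothetical helper: condense `svcs -xv` output (read from stdin)
# into "FMRI|state|reason" lines, one per reported service.
summarise_svcs_xv() {
    awk '
        # A service header starts at column 0 with its FMRI.
        /^svc:\//          { fmri = $1 }
        # "State: maintenance since ..." -> keep just the state word.
        /^[ \t]*State:/    { state = $2 }
        # The Reason line closes the part we care about; print the summary.
        /^[ \t]*Reason:/   { sub(/^[ \t]*Reason:[ \t]*/, ""); print fmri "|" state "|" $0 }
    '
}
```

It would be used as: svcs -xv | summarise_svcs_xv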
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
So none of this will stop services (nfs, iscsi, cifs) from being served out, but syseventd is a core service and nms is dependent on it.
I have seen this problem a number of times and know that the patch for it is in the repo. If you can get downtime in the future, run the upgrade then.
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Yeah that's not a problem, I just need to notify people that it and some virtual machines are going down on a weekend.
Question is, how am I going to upgrade it without the NMS and its dependents running?? Will it be an apt-get upgrade or something??
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Weekend soon (we get the weekend before everyone else in the world) so I can have a go at fixing this. What do you suggest I do to sort it??
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
I would reboot and when nms comes up you should be able to run an upgrade.
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
The service doesn't exist anymore so rebooting won't bring it up (I know this because I tried it already)
root@monster1:~# svcs -a | grep nm
disabled Apr_02 svc:/network/device-discovery/printers:snmp
disabled Apr_02 svc:/network/snmpd:default
online Apr_02 svc:/application/nmdtrace:default
offline Apr_02 svc:/application/nmv:default
offline Apr_02 svc:/application/nmcd:default
Before I can do an upgrade the nms service has to be reinstalled.
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
can you try a svcadm enable nms?
do you have a support contract? i would really like to have support take a look at that.. Can you get to the nms.log? This also looks like production and they can help walk you through the procedures.
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
so let me have you try a few things
what does this say?
ps -ef | grep nm
nms's log file should be
/var/svc/log/application-nms:default.log
can i see that?
what does this say
svcs -x
can i also see
svcs -l nbs
and the contents of
/var/svc/log/system-sysevent:default.log
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
one more what does this show?
svcadm clear nms
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Ok. I've had a look and all of the files for the nms appear to be present and correct, it's just the service has been uninstalled. So I reinstalled the service
root@monster1:~# svccfg import /var/svc/manifest/application/nms.xml
Kicked it out of maintenance state but it went straight back into it.
root@monster1:~# svcadm clear svc:/application/nms:default
root@monster1:~# svcs -xv
svc:/application/nms:default (Nexenta Management Services and API daemon)
State: maintenance since Thu Apr 12 09:45:14 2012
Reason: Start method failed repeatedly, last exited with status 146.
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 1 nms
See: /var/svc/log/application-nms:default.log
Impact: 2 dependent services are not running:
svc:/application/nmcd:default
svc:/application/nmv:default
svc:/system/sysevent:default (system event notification)
State: maintenance since Thu Apr 12 09:44:26 2012
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: man -M /usr/share/man -s 1M syseventd
See: /var/svc/log/system-sysevent:default.log
Impact: This service is not running.
Output from ps -ef
root@monster1:~# ps -ef | grep nm
root 593 1 0 Apr 02 ? 3:54 /usr/bin/perl /lib/svc/method/nmdtrace -d
root 471 1 0 Apr 02 ? 0:00 dbus-daemon --fork --config-file=/var/lib/nza/nmdtrace.conf --print-address
root 3529 3521 0 Apr 05 pts/3 0:02 /usr/bin/perl /usr/bin/nmc
root 9052 9044 0 08:43:43 pts/4 0:02 /usr/bin/perl /usr/bin/nmc
root 12962 3536 0 13:58:36 pts/3 0:00 grep nm
root@monster1:~# svcs -l nbs
fmri svc:/application/nbs:default
name Nexenta Boot Services
enabled true
state online
next_state none
state_time Mon Apr 02 09:28:35 2012
logfile /var/svc/log/application-nbs:default.log
restarter svc:/system/svc/restarter:default
contract_id 73
dependency require_all/restart svc:/system/dbus:default (online)
The contents of /var/svc/log/application-nms:default.log
[ Mar 31 15:09:24 Method "start" exited with status 0. ]
[ Mar 31 15:09:24 Stopping because all processes in service exited. ]
[ Mar 31 15:09:24 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Mar 31 15:09:25 Method "stop" exited with status 0. ]
[ Mar 31 15:09:25 Restarting too quickly, changing state to maintenance. ]
[ Mar 31 15:10:04 Leaving maintenance because clear requested. ]
[ Mar 31 15:10:04 Enabled. ]
[ Mar 31 15:10:04 Disabled. ]
[ Mar 31 15:13:47 Enabled. ]
[ Mar 31 15:13:47 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Mar 31 15:20:22 Enabled. ]
[ Mar 31 15:21:10 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Mar 31 15:33:49 Method "start" failed due to signal KILL. ]
[ Mar 31 15:33:49 Leaving maintenance because disable requested. ]
[ Mar 31 15:33:49 Disabled. ]
[ Apr 12 08:52:15 Enabled. ]
[ Apr 12 08:52:15 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:07:33 Method "start" exited with status 0. ]
[ Apr 12 09:07:33 Stopping because all processes in service exited. ]
[ Apr 12 09:07:33 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:07:35 Method "stop" exited with status 0. ]
[ Apr 12 09:07:35 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:17:55 Method "start" exited with status 0. ]
[ Apr 12 09:17:55 Stopping because service restarting. ]
[ Apr 12 09:17:55 Executing stop method ("/lib/svc/method/nms stop"). ]
Stopping NMS daemon (10083) ...
[ Apr 12 09:18:01 Method "stop" exited with status 0. ]
[ Apr 12 09:18:01 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:25:20 Method "start" exited with status 0. ]
[ Apr 12 09:33:58 Stopping because all processes in service exited. ]
[ Apr 12 09:33:58 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:33:59 Method "stop" exited with status 0. ]
[ Apr 12 09:33:59 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:35:40 Method "start" exited with status 0. ]
[ Apr 12 09:35:40 Stopping because all processes in service exited. ]
[ Apr 12 09:35:40 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:35:42 Method "stop" exited with status 0. ]
[ Apr 12 09:35:42 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:36:43 Method "start" exited with status 0. ]
[ Apr 12 09:36:43 Stopping because all processes in service exited. ]
[ Apr 12 09:36:43 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:36:44 Method "stop" exited with status 0. ]
[ Apr 12 09:36:44 Executing start method ("/lib/svc/method/nms -d"). ]
Looking for devices...
1. Logical Node: /dev/rdsk/c0t0d0p0
Physical Node: /pci@0,0/pci108e,cb84@2,1/storage@2/disk@0,0
Connected Device: HL-DT-ST DVDRAM GE20LU10 FE06
Device Type: DVD Reader/Writer
Bus: USB
Size:
Label:
Access permissions:
[ Apr 12 09:37:13 Method "start" exited with status 0. ]
[ Apr 12 09:37:13 Stopping because all processes in service exited. ]
[ Apr 12 09:37:13 Executing stop method ("/lib/svc/method/nms stop"). ]
[ Apr 12 09:37:15 Method "stop" exited with status 0. ]
[ Apr 12 09:37:15 Restarting too quickly, changing state to maintenance. ]
[ Apr 12 09:37:15 Leaving maintenance because disable requested. ]
[ Apr 12 09:37:15 Disabled. ]
[ Apr 12 09:44:14 Enabled. ]
[ Apr 12 09:44:14 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:15 Method "start" exited with status 146. ]
[ Apr 12 09:44:15 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:17 Method "start" exited with status 146. ]
[ Apr 12 09:44:17 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:44:19 Method "start" exited with status 146. ]
[ Apr 12 09:45:12 Leaving maintenance because clear requested. ]
[ Apr 12 09:45:12 Enabled. ]
[ Apr 12 09:45:12 Executing start method ("/lib/svc/method/nms -d"). ]
Uncaught exception from user code:
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "0:2001" Connection refused
at /usr/lib/perl5/Net/DBus/Binding/Bus.pm line 85
Net::DBus::Binding::Bus::new('Net::DBus::Binding::Bus', 'address', 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at /usr/lib/perl5/Net/DBus.pm line 240
Net::DBus::new('Net::DBus', 0, 'tcp:host=0,port=2001,guid=9b97f32d52d020bd31c4ec99000000e1;un...') called at NZA/Server.pm line 662
NZA::Server::new('NZA::Server', 'com.nexenta.nms', '/com/nexenta/nms', 'HASH(0x8629210)') called at /lib/svc/method/nms line 224
[ Apr 12 09:45:14 Method "start" exited with status 146. ]
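The nms log above is long, so a tally of how the start method exited helps show where it goes from status 0 to status 146. A minimal sketch, assuming the SMF log line format seen in this thread; start_exit_statuses is a hypothetical name.

```shell
# Hypothetical helper: read an SMF service log on stdin and print
# "<exit status> <count>" for every start-method exit it records.
start_exit_statuses() {
    # Pull the numeric status out of lines such as:
    #   [ Apr 12 09:44:15 Method "start" exited with status 146. ]
    sed -n 's/.*Method "start" exited with status \([0-9][0-9]*\)\..*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

It would be used as: start_exit_statuses < /var/svc/log/application-nms:default.log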
The contents of /var/svc/log/system-sysevent:default.log (chopped out part cos it's huge but I can put that bit back in. It's just repeating over and over again though)
[ Jan 3 15:17:41 Enabled. ]
[ Jan 3 15:17:51 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 3 15:17:51 Method "start" exited with status 0. ]
[ Jan 4 14:59:00 Stopping because process dumped core. ]
[ Jan 4 14:59:00 Executing stop method ("/lib/svc/method/svc-syseventd stop 20"). ]
[ Jan 4 14:59:00 Method "stop" exited with status 0. ]
[ Jan 4 14:59:00 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 4 14:59:00 Method "start" exited with status 0. ]
[ Jan 4 14:59:02 Stopping because process dumped core. ]
[ Jan 4 14:59:02 Executing stop method ("/lib/svc/method/svc-syseventd stop 163"). ]
[ Jan 4 14:59:02 Method "stop" exited with status 0. ]
[ Jan 4 14:59:02 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 4 14:59:02 Method "start" exited with status 0. ]
[ Jan 4 14:59:03 Stopping because process dumped core. ]
[ Jan 4 14:59:03 Executing stop method ("/lib/svc/method/svc-syseventd stop 165"). ]
[ Jan 4 14:59:03 Method "stop" exited with status 0. ]
[ Jan 4 14:59:03 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 4 14:59:04 Method "start" exited with status 0. ]
[ Jan 4 14:59:05 Stopping because process dumped core. ]
[ Jan 4 14:59:05 Executing stop method ("/lib/svc/method/svc-syseventd stop 167"). ]
[ Jan 4 14:59:05 Method "stop" exited with status 0. ]
[ Jan 4 14:59:05 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 4 14:59:05 Method "start" exited with status 0. ]
[ Jan 4 14:59:07 Stopping because process dumped core. ]
[ Jan 4 14:59:07 Executing stop method ("/lib/svc/method/svc-syseventd stop 169"). ]
[ Jan 4 14:59:07 Method "stop" exited with status 0. ]
[ Jan 4 14:59:07 Restarting too quickly, changing state to maintenance. ]
[ Jan 6 23:48:52 Enabled. ]
[ Jan 6 23:49:02 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 6 23:49:02 Method "start" exited with status 0. ]
[ Jan 6 23:49:25 Rereading configuration. ]
[ Jan 6 23:49:25 No 'refresh' method defined. Treating as :true. ]
[ Jan 7 14:44:12 Enabled. ]
[ Jan 7 14:44:20 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 7 14:44:20 Method "start" exited with status 0. ]
[ Jan 18 21:48:44 Enabled. ]
[ Jan 18 21:48:52 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 21:48:53 Method "start" exited with status 0. ]
[ Jan 18 21:59:32 Enabled. ]
[ Jan 18 21:59:39 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 21:59:39 Method "start" exited with status 0. ]
[ Jan 18 22:31:19 Enabled. ]
[ Jan 18 22:31:27 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 22:31:27 Method "start" exited with status 0. ]
[ Jan 18 22:38:03 Disabled. ]
[ Jan 18 22:53:47 Enabled. ]
[ Jan 18 22:53:56 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 18 22:53:56 Method "start" exited with status 0. ]
[ Jan 22 15:14:55 Enabled. ]
[ Jan 22 15:15:03 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Jan 22 15:15:04 Method "start" exited with status 0. ]
[ Mar 31 13:13:33 Enabled. ]
[ Mar 31 13:13:41 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:41 Method "start" exited with status 0. ]
[ Mar 31 13:13:43 Stopping because process dumped core. ]
[ Mar 31 13:13:43 Executing stop method ("/lib/svc/method/svc-syseventd stop 21"). ]
[ Mar 31 13:13:43 Method "stop" exited with status 0. ]
[ Mar 31 13:13:43 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:43 Method "start" exited with status 0. ]
[ Mar 31 13:13:45 Stopping because process dumped core. ]
[ Mar 31 13:13:45 Executing stop method ("/lib/svc/method/svc-syseventd stop 44"). ]
[ Mar 31 13:13:45 Method "stop" exited with status 0. ]
[ Mar 31 13:13:45 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:13:45 Method "start" exited with status 0. ]
[ Mar 31 13:13:46 Stopping because process dumped core. ]
[ Mar 31 13:13:46 Executing stop method ("/lib/svc/method/svc-syseventd stop 50"). ]
[ Mar 31 13:14:47 Method or service exit timed out. Killing contract 51. ]
[ Mar 31 13:14:47 Method "stop" failed due to signal KILL. ]
[ Mar 31 13:14:47 Executing stop method ("/lib/svc/method/svc-syseventd stop 50"). ]
[ Mar 31 13:15:08 Method "stop" exited with status 0. ]
[ Mar 31 13:15:08 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Mar 31 13:15:09 Method "start" exited with status 0. ]
[ Mar 31 13:15:10 Stopping because process dumped core. ]
.....
[ Apr 12 09:00:37 Stopping because process dumped core. ]
[ Apr 12 09:00:37 Executing stop method ("/lib/svc/method/svc-syseventd stop 1004"). ]
[ Apr 12 09:00:37 Method "stop" exited with status 0. ]
[ Apr 12 09:00:37 Restarting too quickly, changing state to maintenance. ]
[ Apr 12 09:44:14 Leaving maintenance because clear requested. ]
[ Apr 12 09:44:14 Enabled. ]
[ Apr 12 09:44:14 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:15 Method "start" exited with status 0. ]
[ Apr 12 09:44:19 Stopping because process dumped core. ]
[ Apr 12 09:44:19 Executing stop method ("/lib/svc/method/svc-syseventd stop 1042"). ]
[ Apr 12 09:44:19 Method "stop" exited with status 0. ]
[ Apr 12 09:44:19 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:19 Method "start" exited with status 0. ]
[ Apr 12 09:44:21 Stopping because process dumped core. ]
[ Apr 12 09:44:21 Executing stop method ("/lib/svc/method/svc-syseventd stop 1048"). ]
[ Apr 12 09:44:21 Method "stop" exited with status 0. ]
[ Apr 12 09:44:21 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:22 Method "start" exited with status 0. ]
[ Apr 12 09:44:24 Stopping because process dumped core. ]
[ Apr 12 09:44:24 Executing stop method ("/lib/svc/method/svc-syseventd stop 1050"). ]
[ Apr 12 09:44:24 Method "stop" exited with status 0. ]
[ Apr 12 09:44:24 Executing start method ("/lib/svc/method/svc-syseventd start"). ]
[ Apr 12 09:44:25 Method "start" exited with status 0. ]
[ Apr 12 09:44:26 Stopping because process dumped core. ]
[ Apr 12 09:44:26 Executing stop method ("/lib/svc/method/svc-syseventd stop 1052"). ]
[ Apr 12 09:44:26 Method "stop" exited with status 0. ]
[ Apr 12 09:44:26 Restarting too quickly, changing state to maintenance. ]
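Since the sysevent log mostly repeats itself, counting the core dumps per day is a quick way to see when syseventd started falling over. This is a sketch under the assumption that every crash is logged as "Stopping because process dumped core", as it is above; core_dumps_per_day is a made-up name.

```shell
# Hypothetical helper: read an SMF service log on stdin and print
# "<Mon> <day> <count>" for each day on which the process dumped core.
core_dumps_per_day() {
    awk '
        /Stopping because process dumped core/ {
            # Fields are: "[", month, day, time, message...
            day = $2 " " $3
            if (!(day in count)) order[++n] = day   # remember first-seen order
            count[day]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}
```

It would be used as: core_dumps_per_day < /var/svc/log/system-sysevent:default.log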
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
So it looks like this whole trail starts with dbus failing to connect.
So if you can disable the services and then start them one at a time.. dbus is finding a failure and taking a long time to start and nms can't seem to start
svcadm disable nmv
svcadm disable nmcd
svcadm disable nms
svcadm disable dbus
check ps to see if anything is lingering...
then startup dbus
svcadm enable dbus
wait to make sure it comes up..
then enable the rest of them in order.
once they can come up, then run upgrade
check for failures
fmadm faulty
fmdump -V
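The stop-everything, start-dbus-first procedure above could be scripted roughly as follows. Treat it as a sketch with assumptions: the service names (nmv, nmcd, nms, dbus) are taken from the svcs listings earlier in the thread, the enable order is assumed to be the reverse of the disable order, and restart_in_order and the SVCADM variable are hypothetical names. The -s flag makes svcadm wait for each state change before moving on.

```shell
# Set SVCADM=echo for a dry run that only prints the commands.
SVCADM=${SVCADM:-svcadm}

# Hypothetical helper: stop the management stack, then bring it back
# one service at a time with dbus first.
restart_in_order() {
    # Stop dependents first, dbus last.
    for svc in nmv nmcd nms dbus; do
        $SVCADM disable -s "$svc" || return 1
    done
    # Start dbus first, then nms, then the services that depend on nms.
    for svc in dbus nms nmcd nmv; do
        $SVCADM enable -s "$svc" || return 1
    done
}
```

Running it with SVCADM=echo first shows the exact commands without touching anything. On a real box it is still worth checking ps for lingering processes between the disable and enable halves, as suggested above.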
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Ah, some progress!!
Done as you've said and nms etc are up but the upgrade has failed
nmc@monster1:/$ setup appliance upgrade
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ? Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed? Yes
Checking repository sources. Please wait...
Found new upgrades! Verifying upgrades...
Verification failed. Could not download all needed packages.
Show detailed execution log? Yes
Reading package lists...
nmc@monster1:/$
Taking a look at fmdump ....
root@monster1:/# fmadm faulty
root@monster1:/# fmdump -V
TIME UUID SUNW-MSG-ID
Jan 02 2012 15:47:00.224380000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 ZFS-8000-D3
nvlist version: 0
version = 0x0
class = list.suspect
uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
code = ZFS-8000-D3
diag-time = 1325472420 198022
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = Sun-Fire-X4100-M2
chassis-id = 0904BD25DB
server-id = myhost
(end authority)
mod-name = zfs-diagnosis
mod-version = 1.0
(end de)
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end asru)
resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end resource)
(end fault-list[0])
fault-status = 0x3
severity = Major
__ttl = 0x1
__tod = 0x4f011aa4 0xd5fc460
TIME UUID SUNW-MSG-ID
Feb 02 2012 16:15:30.240858000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 FMD-8000-4M Repaired
nvlist version: 0
version = 0x0
class = list.repaired
uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
code = FMD-8000-4M
diag-time = 1325472420 198022
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = Sun-Fire-X4100-M2
chassis-id = 0904BD25DB
server-id = myhost
(end authority)
mod-name = zfs-diagnosis
mod-version = 1.0
(end de)
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end asru)
resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end resource)
(end fault-list[0])
fault-status = 0x6
severity = Minor
__ttl = 0x1
__tod = 0x4f29ffd2 0xe5b3390
TIME UUID SUNW-MSG-ID
Feb 02 2012 16:15:30.256981000 c09ce577-19ef-6c7a-82b8-b0ae10a57c40 FMD-8000-6U Resolved
nvlist version: 0
version = 0x0
class = list.resolved
uuid = c09ce577-19ef-6c7a-82b8-b0ae10a57c40
code = FMD-8000-6U
diag-time = 1325472420 198022
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = Sun-Fire-X4100-M2
chassis-id = 0904BD25DB
server-id = myhost
(end authority)
mod-name = zfs-diagnosis
mod-version = 1.0
(end de)
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end asru)
resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x9639a5d41783204d
vdev = 0xefa28e7e7bb5a090
(end resource)
(end fault-list[0])
fault-status = 0x6
severity = Minor
__ttl = 0x1
__tod = 0x4f29ffd2 0xf513808
root@monster1:/#
sysevent is still broken (is that why the upgrade is failing??)
root@monster1:/# svcs -xv
svc:/system/sysevent:default (system event notification)
State: maintenance since Fri Apr 13 21:36:04 2012
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: man -M /usr/share/man -s 1M syseventd
See: /var/svc/log/system-sysevent:default.log
Impact: This service is not running.
root@monster1:/#
root@monster1:/# svcadm clear svc:/system/sysevent:default
root@monster1:/# svcs -xv
svc:/system/fmd:default (Solaris Fault Manager)
State: offline since Fri Apr 13 21:46:44 2012
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: man -M /usr/share/man -s 1M fmd
See: /var/svc/log/system-fmd:default.log
Impact: This service is not running.
root@monster1:/#
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
Can you show me your zpool status -v?
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Here you go
root@monster1:/# zpool status -v
pool: mpool
state: ONLINE
scan: resilvered 359G in 53h47m with 0 errors on Thu Feb 9 01:20:56 2012
config:
NAME STATE READ WRITE CKSUM
mpool ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
c1t0d0p0 ONLINE 0 0 0
c1t1d0p0 ONLINE 0 0 0
c1t2d0p0 ONLINE 0 0 0
c1t3d0p0 ONLINE 0 0 0
c1t4d0p0 ONLINE 0 0 0
c1t11d0p0 ONLINE 0 0 0
c1t6d0p0 ONLINE 0 0 0
c1t7d0p0 ONLINE 0 0 0
c1t8d0p0 ONLINE 0 0 0
c1t9d0p0 ONLINE 0 0 0
logs
mirror-1 ONLINE 0 0 0
/dev/ramdisk/zil1 ONLINE 0 0 0
/dev/ramdisk/zil2 ONLINE 0 0 0
spares
c1t5d0 AVAIL
c1t10d0 AVAIL
errors: No known data errors
pool: syspool
state: ONLINE
scan: resilvered 30.3G in 0h10m with 0 errors on Wed Apr 4 14:50:19 2012
config:
NAME STATE READ WRITE CKSUM
syspool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t0d0s0 ONLINE 0 0 0
c2t1d0s0 ONLINE 0 0 0
errors: No known data errors
The mpool resilvered after I swapped out a dud drive for a new one. The syspool was originally on a single drive, but I eventually got around to mirroring it; that resilver was from adding the mirror.
I've been testing using ramdisks for the zil and it's incredibly fast and fabulous until the machine reboots, then the faecal matter gets into the air conditioning :(
Also, after trying all sorts of other ways to get decent iscsi performance out of the thing, I ended up having to disable the zfs write throttle. That was the last resort but nothing else came close to fixing the performance issues (it would average 5000ms for a write vs about 3ms with the write throttle disabled)
echo zfs_no_write_throttle/W 1 | mdb -kw
RE: NMS has disappeared - Added by Linda Kateley about 1 year ago
The dud drive is probably what started the problems.
Have you been able to get nms up and the upgrade running?
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
The NMS is running fine now, but the upgrade fails without telling me why
nmc@monster1:/$ setup appliance upgrade
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ? Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed? Yes
Checking repository sources. Please wait...
Found new upgrades! Verifying upgrades...
Verification failed. Could not download all needed packages.
To obtain detailed error information, please re-run this command with -v (verbose) option, or see usage (-h) for details.
Show detailed execution log? Yes
(Reading database ... 44335 files and directories currently installed.)
nmc@monster1:/$
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
I am unless nmc isn't root
shane@womble.office[09:57:22]:~$ ssh root@monster1.office
Password:
Last login: Tue Apr 17 09:48:43 2012 from womble.office
nmc@monster1:/$ setup appliance upgrade
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ? Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed? Yes
Checking repository sources. Please wait...
Found new upgrades! Verifying upgrades...
Verification failed. Could not download all needed packages.
To obtain detailed error information, please re-run this command with -v (verbose) option, or see usage (-h) for details.
Show detailed execution log? Yes
(Reading database ... 44335 files and directories currently installed.)
nmc@monster1:/$
RE: NMS has disappeared - Added by Shane the sysadmin about 1 year ago
Things degraded to the point where nobody was able to do any work so, after much swearing, I downloaded NexentaStor CE 3.1.2 and did a fresh install.
Things were fine and dandy until ....
shane@womble.office[20:03:31]:~$ ssh root@monster1.office
Password:
Last login: Thu Apr 19 00:57:47 2012 from womble.office
nmc@monster1:/$ setup appliance upgrade
Cleanup upgrade caches (note: cleanup is generally not required and can be skipped in most cases; if you say Yes, prepare to wait for software upgrade to complete a bit longer) ? Yes
You are about to upgrade the appliance software. Please be advised that by executing this operation you agree to be bound by the terms of the product license available at http://www.nexenta.com/nexentastor-licenses. This operation may take some time to check with the remote appliance's software repository. Proceed? Yes
Checking repository sources. Please wait...
Found new upgrades! Verifying upgrades...
Trying to gain exclusive access to the appliance. This operation may take up to 30 seconds to complete. Please wait...
Exclusive access granted.
Initiating appliance upgrade procedure. Please wait...
Success.
This upgrade will download approximately 8.54MB
Downloading upgrades. This may take a few minutes. Please wait...
Upgrade is in progress. Please DO NOT interrupt...
Creating Rollback Checkpoint...
Rollback Checkpoint has been created: rootfs-nmu-002
Use NMC 'show appliance checkpoint' command to list all available system checkpoints
Reading package lists...
Building dependency tree...
Reading state information...
dpkg is already the newest version.
apt is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
The following packages will be upgraded:
base-files nmc nms nms-dev nmv nmv-theme-nexenta
6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/8750kB of archives.
After this operation, 20.3MB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 43814 files and directories currently installed.)
Preparing to replace base-files 4.0.2nexenta9 (using .../base-files_4.0.2nexenta11_solaris-i386.deb) ...
Unpacking replacement base-files ...
Processing triggers for man-db ...
Setting up base-files (4.0.2nexenta11) ...
(Reading database ... 43814 files and directories currently installed.)
Preparing to replace nms 3.1.1-7231-r9546 (using .../nms_3.1.2-8147-r9697_solaris-i386.deb) ...
Stopping NMS service... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 done
Unpacking replacement nms ...
Preparing to replace nms-dev 3.1.1-7231-r9546 (using .../nms-dev_3.1.2-8147-r9697_all.deb) ...
Unpacking replacement nms-dev ...
Preparing to replace nmc 3.1.1-7231-r9549 (using .../nmc_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmc ...
Preparing to replace nmv 3.1.1-6829-r9491 (using .../nmv_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmv ...
Preparing to replace nmv-theme-nexenta 3.1.1-6829-r9491 (using .../nmv-theme-nexenta_3.1.2-8147-r9697_solaris-i386.deb) ...
Unpacking replacement nmv-theme-nexenta ...
Setting up nms (3.1.2-8147-r9697) ...
Starting NMS service... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 179 180 done
Setting up nms-dev (3.1.2-8147-r9697) ...
Setting up nmc (3.1.2-8147-r9697) ...
Setting up nmv-theme-nexenta (3.1.2-8147-r9697) ...
Setting up nmv (3.1.2-8147-r9697) ...
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket "127.0.0.1:2001" Connection refused
nmc@monster1:/$
But hey, at least sysevent is fixed eh!!
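The DBus error in the log above says the connection to 127.0.0.1:2001 was refused, which usually means nothing was listening on that port when nmc tried to connect. The sketch below is one rough way to check that from the appliance's shell. It assumes Python 3 is available there (it may not be on a stock install) and that 2001 is the port NMS is expected to listen on, which is only inferred from the error message. The function name `port_accepts_connections` is made up for this example.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when nothing answers on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port are taken from the DBus error message in the log;
    # treating this as the NMS listener is an assumption.
    host, port = "127.0.0.1", 2001
    if port_accepts_connections(host, port):
        print("{}:{} is accepting connections".format(host, port))
    else:
        print("{}:{} refused the connection or is unreachable".format(host, port))
```

A refused connection here would agree with the error above and point at the NMS service not having come up after the upgrade. It does not say why, so the service log would still be the next place to look.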