Does not work NMS&NMV

Added by Roman Dr over 2 years ago

Hi, please help with my problem. After rebooting Nexenta, NMS, NMV, and NMC do not work!

After console login as root, I see:

SYSTEM NOTICE

 Failed to initialize NMC:
 no introspection data available for method 'get_props' in object '/Root/Appliance', and object is not cast to any interface

 Suggested possible recovery actions:
    - Reboot into a known working system checkpoint
    - Run 'svcadm clear nms'; then try to re-login
 Suggested troubleshooting actions:
    - Run 'svcs -vx' and collect output for further analysis
    - Run 'dmesg' and look for error messages
    - View "/var/log/nms.log" for error messages
    - View "/var/svc/log/application-nms:default.log" for error messages

Entering UNIX shell. Type 'exit' to go back to NMC login...

log:

less /var/log/nms.log

Jan 14 09:11:51 (1:) Nexenta Management Server is ready (1:872)
Jan 14 09:12:04 (2:) Nexenta Management Server is ready (2:1144)
Jan 14 09:17:06 (hosts-check, 722) The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:19:07 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:24:07 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:24:22 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:29:22 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:29:37 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:34:37 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:34:52 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:39:52 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:40:07 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:45:07 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:45:22 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:50:22 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:50:37 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 09:55:37 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 09:55:53 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 10:00:53 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 10:01:08 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 10:06:08 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 10:06:23 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 10:11:23 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 10:11:38 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 10:16:38 (hosts-check, 722) FATAL: retry_svcs: org.freedesktop.DBus.Error.NoReply: The reply timeout expired. Server is taking a long time to respond.
Jan 14 10:16:53 (hosts-check, 722) 192.168.8.248 is now reachable: svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run
Jan 14 10:24:17 (:1.8) Cannot initialize job type 'runner_job' child: RemoteCallFailure: failed to communicate to remote object '/Root/Job', host 'localhost'. Local Nexenta Management Server is down or busy. Please re-try the operation later. Hint: you may also login as admin and execute 'svcadm restart nms' from Unix shell. at NZA/Common.pm line 1948.
Jan 14 11:30:22 (:1.8) Cannot initialize job type 'runner_job' child: RemoteCallFailure: failed to communicate to remote object '/Root/Job', host 'localhost'. Local Nexenta Management Server is down or busy. Please re-try the operation later. Hint: you may also login as admin and execute 'svcadm restart nms' from Unix shell. at NZA/Common.pm line 1948.
Jan 14 12:36:27 (:1.8) Cannot initialize job type 'runner_job' child: RemoteCallFailure: failed to communicate to remote object '/Root/Job', host 'localhost'. Local Nexenta Management Server is down or busy. Please re-try the operation later. Hint: you may also login as admin and execute 'svcadm restart nms' from Unix shell. at NZA/Common.pm line 1948.
Jan 14 13:42:32 (:1.8) Cannot initialize job type 'runner_job' child: RemoteCallFailure: failed to communicate to remote object '/Root/Job', host 'localhost'. Local Nexenta Management Server is down or busy. Please re-try the operation later. Hint: you may also login as admin and execute 'svcadm restart nms' from Unix shell. at NZA/Common.pm line 1948.
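
For anyone reading a log like this where the entries have run together, here is a minimal Python sketch that splits the text back into entries and measures how long after each "is now reachable" message the next FATAL appears. The sample text is copied from the log above; the function names and the placeholder year are assumptions made for illustration, not anything NMS provides.

```python
import re
from datetime import datetime

# Sample copied from the log above; in practice, read /var/log/nms.log instead.
SAMPLE = (
    "Jan 14 09:19:07 (hosts-check, 722) 192.168.8.248 is now reachable: "
    "svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run "
    "Jan 14 09:24:07 (hosts-check, 722) FATAL: retry_svcs: "
    "org.freedesktop.DBus.Error.NoReply: The reply timeout expired. "
    "Server is taking a long time to respond. "
    "Jan 14 09:24:22 (hosts-check, 722) 192.168.8.248 is now reachable: "
    "svc:/system/filesystem/zfs/auto-tier:Pool1-BackUp-001 can be run "
    "Jan 14 09:29:22 (hosts-check, 722) FATAL: retry_svcs: "
    "org.freedesktop.DBus.Error.NoReply: The reply timeout expired. "
    "Server is taking a long time to respond."
)

# Assumption: every entry starts with a stamp like "Jan 14 09:24:07 (".
STAMP = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \(")


def split_entries(text):
    """Cut the text at each timestamp so every entry stands alone."""
    starts = [m.start() for m in STAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def entry_time(entry, year=2000):
    """Parse the leading timestamp. The log omits the year, so `year`
    is an arbitrary placeholder that only allows same-day arithmetic."""
    stamp = " ".join(entry.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def fatal_delays(entries):
    """Seconds from each 'is now reachable' entry to the FATAL after it."""
    delays = []
    last_reachable = None
    for entry in entries:
        if "is now reachable" in entry:
            last_reachable = entry_time(entry)
        elif "FATAL" in entry and last_reachable is not None:
            gap = entry_time(entry) - last_reachable
            delays.append(int(gap.total_seconds()))
            last_reachable = None
    return delays


entries = split_entries(SAMPLE)
print(len(entries), fatal_delays(entries))
```

On this excerpt each delay is 300 seconds. That is consistent with hosts-check waiting on a fixed five-minute reply timeout before it reports FATAL, though the log alone does not prove it.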


Replies

RE: Does not work NMS&NMV - Added by Pavel Strashkin over 2 years ago

  1. Could you please attach the "/var/log/nms.log" file here?
  2. What does the "svcs nms" shell command show?

RE: Does not work NMS&NMV - Added by Roman Dr over 2 years ago

  1. Attached.
  2. svcs nms

STATE          STIME    FMRI
online          9:11:30 svc:/application/nms:default

nms.log (208.6 KB)

RE: Does not work NMS&NMV - Added by Roman Dr over 2 years ago

Any ideas?

RE: Does not work NMS&NMV - Added by Pavel Strashkin over 2 years ago

Hi Roman,

I apologize for the delay. I've looked at your logs, and it seems the problem is in the auto services.

  1. Do you have any auto services (auto-sync, auto-tier, ...)?
  2. Why did you reboot your appliance? What was the reason?
  3. Is it a freshly installed appliance or an upgraded one?
  4. Did you try to run "svcadm enable nms" from the shell (after "svcadm clear nms" if it's in maintenance mode)? Did it help?

P.S. Could you please edit all of your messages and quote all copy/paste outputs into preformatted blocks? Tip: use toolbar icons to do it.

RE: Does not work NMS&NMV - Added by James Flatten over 2 years ago

I am having the same issue. I have found that if I disable all of the runners the NMS stops freezing. The following error shows up in these logs:

/var/log/nms.log:

Jan 27 06:06:02 (:1.2521) Cannot initialize job type 'runner_job' child: RemoteCallFailure: failed to communicate to  remote object '/Root/Job', host 'localhost'. Local  Management Server is down or busy. Please re-try the operation later. Hint: you may also login as admin and execute 'svcadm restart nms' from Unix shell. at NZA/Common.pm line 1725.

/var/log/nms-down.log:

+===================================================
+ nms [863] Jan 26 09:19:09
+===================================================
863   /usr/bin/perl /lib/svc/method/nms -d
20386 <defunct>
20387 wget -O /var/lib/nza/.reminder.txt.863 -q -c http://www.nexenta.com/rem
863:    /usr/bin/perl /lib/svc/method/nms -d
feec4f45 read     (8, aadf35c, 1400)
fee8d81c _filbuf  (813bb60, 1, 8045a17, fee92b76) + d3
fee92be2 getc     (813bb60, fef53000, 8045a78, fee900a4, fef540e0) + 7a
fee900b6 fgetc    (813bb60, aadf35c, a9bd269, a9c6b48, 61, 1) + 1e
08112e79 PerlIOStdio_read (813df88, 81d1694, 8045ab7, 1, 81d1694, 0) + 26
08112b81 Perl_PerlIO_read (813df88, 81d1694, 8045ab7, 1, 813df88, 81d1694) + 3b
08112be5 PerlIO_getc (81d1694, 81d1694, 0, 0, 4d4034b1, 0) + 31
080d189d Perl_sv_gets (813df88, aababd4, 81d1694, 0, a572780, 84d03c8) + 69d
080bbc6b Perl_do_readline (813df88, 813df88, 8047bd8, 80b7ce3, 813df88, 813dfbc) + 656
080be93d Perl_pp_readline (813df88, 8047e78, 8047e18, 8063b11, 813df88, 0) + 120
080b6c28 Perl_runops_standard (813df88, 0, 1, 1, 813e17c, 813e17c) + 28
08063b11 perl_run (813df88, 8060324, 3, 8047e78, 0, 8121d1a) + 19a
080602d0 main     (80600a0, 3, 8047e78) + c4
080600a0 _start   (3, 8047f10, 8047f1e, 8047f32, 0, 8047f35) + 80
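
A minimal Python sketch for reading a trace like the one above, which looks like pstack-style output. It pulls out each frame's function name and arguments so the call chain can be read at a glance. A few frames are copied from the trace; the regular expression and the helper name `frames` are assumptions made for illustration.

```python
import re

# A few frames copied from the trace above (innermost call first).
TRACE = """\
feec4f45 read     (8, aadf35c, 1400)
fee8d81c _filbuf  (813bb60, 1, 8045a17, fee92b76) + d3
080bbc6b Perl_do_readline (813df88, 813df88, 8047bd8, 80b7ce3, 813df88, 813dfbc) + 656
080602d0 main     (80600a0, 3, 8047e78) + c4
"""

# Assumption: each frame is "<8 hex digits> <function> (<hex args>) [+ offset]".
FRAME = re.compile(r"^([0-9a-f]{8})\s+(\S+)\s*\(([^)]*)\)", re.MULTILINE)


def frames(trace):
    """Return (function_name, [args]) for every frame, innermost first."""
    result = []
    for match in FRAME.finditer(trace):
        args = [a.strip() for a in match.group(3).split(",") if a.strip()]
        result.append((match.group(2), args))
    return result


calls = frames(TRACE)
top_name, top_args = calls[0]

# Arguments are printed in hex; the first argument of read() is the descriptor.
print(top_name, "fd", int(top_args[0], 16))
print(" <- ".join(name for name, _ in calls))
```

In this trace the innermost frame is read() on descriptor 8, reached through Perl's readline, so the nms process appears to be blocked waiting for input. The trace does not say what descriptor 8 is connected to. Running "pfiles 863" on the appliance should list it, and one possibility worth checking is a pipe from the wget child shown in the process list.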

When I look at NZA/Common.pm, it appears that your team has obfuscated the code. The failure occurs 5-10 minutes after we restart the nms service.

This is running on an HP DL180 G6 server with a P410i RAID controller.

This server is in an environment that does not allow it to connect to the Internet. Any help would be appreciated.

RE: Does not work NMS&NMV - Added by Nick Brown over 2 years ago

Did you get any resolution to this? I'm seeing the same thing myself (and my server also does not have direct access to the Internet). I see you said that disabling all runners appears to be a workaround. How did you do this from a Bash shell?

I'd really like to put this into production, but can't unless this bug is resolved.

Cheers, Nick.

RE: Does not work NMS&NMV - Added by James Flatten over 2 years ago

Unfortunately, no. I was able to turn off all runners from the web console in the brief minutes it is running before it crashes.

I disabled all runners from Data Management -> Runners -> Disable all runners.

I would expect there is a command shell equivalent somewhere.

This is not an ideal situation though since many of these runners provide health information vital to proper operation in a production environment.

I have heard nothing from Nexenta or the community on a fix. I could look into the issue, however it appears that all the Nexenta code in Perl has been obfuscated. Not really "open" in my mind. Maybe I am missing something on how to work with their code.

-Davin

RE: Does not work NMS&NMV - Added by Nick Brown over 2 years ago

Thanks for the quick reply. I managed to do the same and so far so good, although as you say it's not really a fix. I'm going to start enabling each runner one at a time over the next few days to identify the problematic ones.

I was attracted to this product because of its ZFS/deduplication functionality but now have my concerns as to how well supported it is. There doesn't really seem to be an active community with deep knowledge of the product. May have to live without the dedupe and go with OpenFiler.

Cheers, Nick.

RE: Does not work NMS&NMV - Added by Roman Dr over 2 years ago

Pavel Strashkin wrote:

Hi Roman,

I apologize for the delay. I've looked at your logs, and it seems the problem is in the auto services.

  1. Do you have any auto services (auto-sync, auto-tier, ...)?
  2. Why did you reboot your appliance? What was the reason?
  3. Is it a freshly installed appliance or an upgraded one?
  4. Did you try to run "svcadm enable nms" from the shell (after "svcadm clear nms" if it's in maintenance mode)? Did it help?

P.S. Could you please edit all of your messages and quote all copy/paste outputs into preformatted blocks? Tip: use toolbar icons to do it.

Hi, Pavel.

  1. Yes! But after disabling the auto services, the problem remained.
  2. Because after a reboot it works for some time.
  3. At first I upgraded, then reinstalled completely. The result was the same.
  4. Yes, see the output:

# svcadm clear nms
svcadm: Instance "svc:/application/nms:default" is not in a maintenance or degraded state.

# svcadm enable nms

In the web browser:

Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /.

Reason: Error reading from remote server

Apache/2.2.8 Ubuntu DAV/2 mod_ssl/2.2.8 OpenSSL/0.9.8k Server at 192.168.8.5 Port 2000

P.S.

The appliance is now running in production. To start NMV, I run "svcadm restart nms" in the console; after that it works for anywhere from 5 minutes to several days. What is the problem? Please solve it!