XAPI does not start in XenServer 6.5 SP1 (after upgrade). What could be the cause?


Xen

oni__ino, 2015-07-29 18:13:12

Given: Dell R720
Problem: I installed XenServer 6.5 from an image, installed some utilities, and rebooted the server; everything worked. Then I installed all the SP1 updates, and everything broke.
I read the threads on discussions.citrix.com, but everywhere they say that xapi fails to start because of old logs or a lack of disk space. I tried those fixes and nothing helped: I checked and there is plenty of free space, and I cleaned out and deleted the logs.
I looked through the Xen logs but found nothing particularly suspicious. They contain a lot of information; what should I pay attention to?
tail -f /var/log/messages
tail -f /var/log/xensource.log
I tried restarting the toolstack:
xe-toolstack-restart

The XCP RRDD daemon won't start.
As a result, naturally, commands such as xe vm-list do not work.
What should I look at? Thank you.
UPD:
I have posted the suspicious parts of the logs below.
dmesg:
All hardware initializes without serious errors; I have omitted minor things such as the mouse/keyboard not being found.
I was unable to trace the cause of the segfaults.

[ 102.267427] warning: `ntpdate' uses 32-bit capabilities (legacy support in use)
[ 104.942270] squeezed[7056]: segfault at 0 ip (null) sp 00007f0dd3a9abe8 error 14 in squeezed[400000+13e000]
[ 107.214621] xapi[7665]: segfault at 0 ip (null) sp 00007fff08f816c8 error 14 in xapi[400000+8ed000]
[ 108.321911] xapi[8372]: segfault at 0 ip (null) sp 00007fffb28152f8 error 14 in xapi[400000+8ed000]
[137.080868] kjournald starting. Commit interval 5 seconds
[ 137.080917] EXT3-fs (sdc1): warning: checktime reached, running e2fsck is recommended
[ 137.084231] EXT3-fs (sdc1): using internal journal
[ 137.084237] EXT3-fs (sdc1): mounted filesystem with ordered data mode
[ 566.876612] squeezed[26151]: segfault at 0 ip (null) sp 00007f6ef6a72be8 error 14 8ed000]
[570.875968] xapi[26257]: segfault at 0 ip (null) sp 00007fffb6d2add8 error 14 in xapi[400000+8ed000]

Note: sdc1 is a USB drive.
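For anyone reading these segfault lines: on x86 Linux the "error" field is the page-fault error code printed in hexadecimal, so "error 14" is 0x14. Below is a minimal sketch of a decoder for that field; the function name decode_pf_error is made up for illustration and is not part of XenServer or any standard tool.

```shell
#!/bin/sh
# Hypothetical helper (not a XenServer tool): decode the hexadecimal
# "error" field of an x86 Linux segfault line into its flag bits.
decode_pf_error() {
    code=$((0x$1))   # the kernel prints this field in hex
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(($code & 1)) -ne 0 ]; then printf 'protection-violation'; else printf 'page-not-present'; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $(($code & 2)) -ne 0 ]; then printf ' write'; else printf ' read'; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $(($code & 4)) -ne 0 ]; then printf ' user-mode'; else printf ' kernel-mode'; fi
    # bit 3: reserved bit set in a page-table entry
    if [ $(($code & 8)) -ne 0 ]; then printf ' reserved-bit'; fi
    # bit 4: the fault was an instruction fetch
    if [ $(($code & 16)) -ne 0 ]; then printf ' instruction-fetch'; fi
    printf '\n'
}

decode_pf_error 14
```

For 14 this reports a user-mode instruction fetch from a page that is not present. Together with "segfault at 0 ip (null)" that would be consistent with the process jumping to address 0, for example through a null function pointer, although it does not by itself say which component is at fault.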
Excerpt from /var/log/messages:

xapi: [ info|***|0 thread_zero||watchdog] (Re)starting xapi...
xapi: [ info|***|0 thread_zero|Loading DHCP leases D:4cc31c067426|xapi_udhcpd] Caught exception Unix.Unix_error( 20, "open", "/var/xapi/dhcp-leases.db") loading /var/xapi/dhcp-leases.db: creating new empty leases database
mpathroot: This system is not running a multipath root, so no status update required
xenstored: A9 watch /vss 140012209594416
xenstored: A9 w event /vss 140012209594416
xapi: [ info|***|0 thread_zero|Registering SMAPIv1 plugins D:7562814530bf|sm] Registered SMAPIv1 plugins: lvm, iscsi, ext, file, dummy, hba, nfs, lvmoiscsi, lvmohba, iso, udev
xapi: [ info|***|0 thread_zero|Initialising SM state D:1515c75a0616|storage_impl] Loading storage state from: /var/run/nonpersistent/xapi/storage.db
xapi: [ info|***|0 thread_zero| Listening unix socket D:a45c6ceda7d9|xapi] Successfully bound socket to: UNIX /var/xapi/xapi
kernel: [ 107.214621] xapi[7665]: segfault at 0 ip (null) sp 00007fff08f816c8 error 14 in xapi[400000+8ed000]
fe : 7665 (/opt/xensource/bin/xapi -nowatchdog -writereadyfile /var/run/xapi_startup.coo...) exited with signal: SIGSEGV
xapi: [ info|***|0 thread_zero||watchdog] received signal: SIGSEGV
xapi: [ info|***|0 thread_zero||watchdog] xapi died with signal -10: restarting

Excerpt from /var/log/xcp-rrdd-plugins.log:

xcp-rrdd-gpumon: [ warn|***|0||xcp-rrdd-gpumon] NVML interface not loaded: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
xcp-rrdd-gpumon: [ info|***|0||xcp-rrdd-gpumon] Sleeping for 5 minutes
xcp-rrdd-iostat: [ warn|***|0||xcp-rrdd-iostat] The xcp-rrdd daemon seems installed. but not started. Try 'service xcp-rrdd start' Connection to the server is not available, sleeping for 10 seconds...
xcp-rrdd-squeezed: [ warn|***|0||xcp-rrdd-squeezed] The xcp-rrdd daemon appears installed. but not started. Try 'service xcp-rrdd start' Connection to the server is not available, sleeping for 10 seconds...
...
xcp-rrdd-iostat: [ info|***|0||xcp-rrdd-iostat] Received signal -11: deregistering plugin xcp-rrdd-iostat...
xcp-rrdd-squeezed: [ info|***|0||xcp-rrdd-squeezed] Received signal -11: deregistering plugin xcp-rrdd-squeezed...
xcp-rrdd-gpumon: [ info|***|0||xcp-rrdd-gpumon] Received signal -11: deregistering plugin xcp-rrdd-gpumon...
xcp-rrdd-gpumon: [ warn|***|0||xcp-rrdd-gpumon] NVML interface not loaded: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
xcp-rrdd-gpumon: [ info|***|0||xcp-rrdd-gpumon] Sleeping for 5 minutes
xcp-rrdd-iostat: [ info|***|0||xcp-rrdd-iostat] Obtained hdr=DATASOURCES , path=/dev/shm/metrics/xcp-rrdd-iostat
xcp-rrdd-squeezed: [ info|***|1|xenstore|xenstore_watch] Couldn't read path /local/domain/0/memory/dynamic-max; forgetting last known value for domain 0
xcp-rrdd-squeezed: [ info|***|1|xenstore|xenstore_watch] Couldn't read path /local/domain/0/memory/dynamic-min; forgetting last known value for domain 0
xcp-rrdd-squeezed: [ info|***|0||xcp-rrdd-squeezed] Obtained hdr=DATASOURCES , path=/dev/shm/metrics/xcp-rrdd-squeezed
xcp-rrdd-squeezed: [ info|***|1|xenstore|xenstore_watch] Couldn't read path /local/domain/0/memory/target; forgetting last known value for domain 0
xcp-rrdd-xenpm: [ warn|***|0||xcp-rrdd-xenpm] Found 24 pCPUs
xcp-rrdd-xenpm: [ info|***|0||xcp-rrdd-xenpm] Obtained hdr=DATASOURCES , path=/dev/shm/metrics/xcp-rrdd-xenpm
xcp-rrdd-squeezed: [ warn|***|0||xenstore_watch] Couldn't find cached target value for domain 0, using 0
xcp-rrdd-squeezed: [ warn|***|0||xenstore_watch] Couldn't find cached dynamic-min value for domain 0, using 0
xcp-rrdd-squeezed: [ warn|***|0||xenstore_watch] Couldn't find cached dynamic-max value for domain 0, using 0
xcp-rrdd-iostat: [ info|***|0||xcp-rrdd-iostat] No data sources exported

UPD2:
Now, after xe-toolstack-restart, it complains about:
Stopping the memory ballooning daemon: [FAILED]
I checked the memory with the standard diagnostic tools in the Dell Lifecycle Controller.
UPD3:
The problem is still not solved. I'm busy with other tasks for now, but I welcome further discussion.
UPD4:
The server has been reinstalled, since not much had been migrated to it yet.
Perhaps the cause of this behavior was an ordinary glitch, a bug, or something I did; I never figured it out.
Thank you all for your help.


4 answers

Puma Thailand, 2015-07-29
@opium

Look at the logs in /var/log.
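As a starting point for going through the logs, here is a rough sketch that counts kernel segfault lines per process, which can show at a glance whether only xapi is crashing or several daemons are. The function name segfault_tally is made up for illustration, and the sketch assumes the usual "name[pid]: segfault at ..." line format seen in the dmesg output in the question.

```shell
#!/bin/sh
# Hypothetical helper: count "segfault at" lines per process name.
# Reads kernel log text on stdin (dmesg output or /var/log/messages).
segfault_tally() {
    awk '/segfault at/ {
        # The first field shaped like name[pid]: identifies the process.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]:$/) {
                name = $i
                sub(/\[.*/, "", name)   # strip "[pid]:"
                count[name]++
                break
            }
        }
    }
    END { for (n in count) print n, count[n] }' | sort
}

dmesg | segfault_tally
```

The same function can be fed a log file instead, for example: segfault_tally < /var/log/messages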

Argenon, 2015-07-29
@Argenon

My guess is that the problem lies here: "installed some utilities" :)

_chrome_, 2015-08-11
@chrome0520

Have you tried the obvious and looked at df -h? I ran into something similar when the Xen partition ran out of space.
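A small sketch of that check, in case it helps: it filters df output down to filesystems at or above a given usage percentage, and runs the same filter on inode usage, since a filesystem can be out of inodes while still showing free space. The function name full_filesystems and the 90% threshold are arbitrary choices for illustration; mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage is at or above
# a threshold (in percent). Reads `df -P` style output on stdin.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "97%" -> "97"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

df -P  | full_filesystems 90   # block usage
df -Pi | full_filesystems 90   # inode usage
```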

office378, 2015-10-06
@office378

Did you move /var/log to a separate disk?