Quantum Forum V

Quantum Forum for DXi V5000

I've been getting this error on a regular basis and cannot seem to find any reference to it in these forums, nor in Quantum's documents. Anyone know what causes this and how to resolve it?

The complete error is "SmartMotion failed: Error: Exception('Request timed out, caught alarm override',) (Q-1001)".

My environment: 7 ESX 5.0 hosts, 6 vmPro nodes, DXi 4600

Thanks,
James


Replies to This Discussion

Is the SmartMotion backup failing on all the VMs, a few, or just one?

If it fails on just a few or a single VM, check the logs of the VM(s) and see if anything is happening that would cause the VM(s) and vmPro to lose communication (anti-virus software for example) during the time frame that vmPro is trying to create and move a snapshot off the VM(s).

Also, as a test, see if you can create a manual snapshot using VMware. If that process times out as well, the issue will need to be addressed from the VMware side.

Typically a single VM will fail, but it is not always the same VM or the same ESX host. It seems to be random. A VM that has been backing up fine for weeks will suddenly fail with this error.

Manual snapshots complete successfully. 

Are the VMs that fail all managed by the same vmPro node? If so, is that node responsible for pushing more data than the other nodes?

Do the VMs fail on just full export jobs or are the failures also seen on the partial CBT backups?

I am wondering if load balancing is an issue. You may be seeing this on a single node that is becoming too busy when the full export jobs are scheduled to run.
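One way to answer these questions is to keep a small record of each failure and tally the records by node and job type. The following is only a rough sketch: the record fields (`vm`, `node`, `job_type`) and the sample values are hypothetical and would be filled in by hand from the job history, not read from any vmPro API.

```python
from collections import Counter


def tally_failures(failures):
    """Count failures per vmPro node and per job type.

    `failures` is a list of dicts with hypothetical keys
    "vm", "node" and "job_type" (for example "full" or "cbt").
    """
    by_node = Counter(f["node"] for f in failures)
    by_job_type = Counter(f["job_type"] for f in failures)
    return by_node, by_job_type


# Made-up sample records, for illustration only.
sample = [
    {"vm": "web01", "node": "node3", "job_type": "full"},
    {"vm": "db02", "node": "node3", "job_type": "full"},
    {"vm": "app05", "node": "node1", "job_type": "cbt"},
]

by_node, by_job_type = tally_failures(sample)
print(by_node.most_common())
print(by_job_type.most_common())
```

If one node or one job type dominates the counts, that points toward load on that node. An even spread is more consistent with the random pattern described above.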

Hi James,

Is the ESX host healthy? I just ran into a case where a customer was getting the same error. We found that the ESX host the vmPro was installed on was in a not-responding state and the vmPro was in a disconnected state, although access to the GUI was still allowed.

Even with the ESX server having issues, rebooting vmPro via the CLI with the command [system reboot] resolved the Exception('Request timed out, caught alarm override',) (Q-1001) errors and allowed backups to proceed.

I just recently patched my ESX hosts to the current version and rebooted. None of my ESX hosts are taxed for resources.

Yes, managed by a single master node which only exports 250GB. The rest of the 6 nodes are exporting no more than 900GB, so I think I have my load balancing correct. Should I have more than 1 master and split the folders across them?

The load balancing seems to be fine. Typically you would only run one master and join the additional vmPro nodes to the master's group, no need for more than one master.

The next time a VM fails with the error, go to Help-->vmPro System-->Log Files in the GUI. Then look for the VM name and the time frame of the failure in the controller, vm_proxy_fs, and smartmotion logs. Post the logs to this discussion and maybe we can get a better idea of why you are getting this error.

The SmartMotion logging around this exception could be helpful in understanding the error.
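If the downloaded log files are large, a short script can pull out just the lines that mention the VM within the failure window. This is a minimal sketch under a stated assumption: that each log line begins with a timestamp of the form `YYYY-MM-DD HH:MM:SS`. The actual vmPro log layout may differ, so `TS_PATTERN` and `TS_FORMAT` would need adjusting, and the sample lines below are invented for illustration.

```python
import re
from datetime import datetime

# ASSUMPTION: lines start with a timestamp like "2024-01-31 02:15:07".
# Adjust both values to match the real log files.
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_log_lines(lines, vm_name, start, end):
    """Return lines that mention vm_name and are stamped within [start, end]."""
    matches = []
    for line in lines:
        m = TS_PATTERN.match(line)
        if m is None:
            continue  # skip lines without a recognizable timestamp
        stamp = datetime.strptime(m.group(1), TS_FORMAT)
        if start <= stamp <= end and vm_name.lower() in line.lower():
            matches.append(line.rstrip("\n"))
    return matches


# Invented sample lines, not real vmPro output.
sample_lines = [
    "2024-01-31 02:15:07 export web01: Request timed out\n",
    "2024-01-31 02:16:00 export db02: completed\n",
    "2024-01-31 05:00:00 export web01: completed\n",
]

window_start = datetime(2024, 1, 31, 2, 0, 0)
window_end = datetime(2024, 1, 31, 3, 0, 0)
for hit in filter_log_lines(sample_lines, "web01", window_start, window_end):
    print(hit)
```

To run it against a real file, pass an open file object as `lines`, for example `filter_log_lines(open(path), vm_name, start, end)`, where `path` is wherever the log was saved.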
