I use multiple vmPRO servers with multiple DXi V1000 servers.
I frequently get the error "SmartMotion had stalled while copying /export/XXXX/XXXX for over 1 hour and may be hung. Please check your target storage or abort the running task. "
I can manually browse the CIFS share on the DXi.
1) How can I troubleshoot this?
2) How can I cancel an individual stalled backup without aborting the whole backup policy?
3) How can I decrease this stall timeout? 1 hour seems excessive. Something like 30 minutes without contact with the backup device would be plenty of time for a DXi to reboot and come back up.
This is often triggered when the storage mount that vmPRO uses on the DXi/NAS does not respond to an 'ls' command within the default 6-second timeout. The upcoming vmPRO release 3.1.2 increases this timeout to 45 seconds to account for conditions where the target storage becomes unresponsive.
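To see whether the share is slow to list, you could time an 'ls' against it from a Linux host that mounts the same export. The sketch below is only an illustration of that idea, not how vmPRO performs its own check: the function name probe_listing and the mount path are made up, and the 6-second default simply mirrors the timeout mentioned above.

```python
import subprocess
import time


def probe_listing(path, timeout_s=6.0):
    """Run 'ls' on path and return (ok, elapsed_seconds).

    ok is False if 'ls' fails or does not finish within timeout_s.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        ok = proc.returncode == 0
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for the child: a process blocked on
        # a hung network mount may not exit until the mount recovers.
        proc.kill()
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical mount point; replace it with the share you mounted.
    ok, elapsed = probe_listing("/mnt/dxi_share")
    print(f"ok={ok} elapsed={elapsed:.2f}s")
```

Running this every minute or so during a backup window should show whether listings get slow or fail around the time SmartMotion reports a stall.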
This condition can be caused by heavy network utilization on the link/vSwitch between the vmPRO and the DXi, by the number of SmartMotion tasks/policies and streams you have active, or by the V1000 running near its maximum ingest capability for the host and datastores it runs on.
If multiple DXi appliances share the same datastore/host, it may help to move them to separate hosts and distribute the DXi disks across different datastores.
Could you try reducing the number of simultaneous SmartMotion policies/tasks you run against the same DXi and see whether the condition resolves? I would also try switching some of the jobs to the NFS protocol to see whether it behaves differently, depending on the underlying condition causing the timeout.
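If you want numbers to compare before and after reducing concurrency, or between a CIFS and an NFS mount of the same DXi, a rough approach is to sample directory-listing latency from a Linux host that has the shares mounted. This is a minimal sketch under assumptions: the function names and both mount paths are hypothetical, and os.listdir will itself block if the mount is completely hung, so it is only useful for spotting slow responses, not dead ones.

```python
import os
import statistics
import time


def sample_listing_latency(path, samples=5, pause_s=1.0):
    """Time os.listdir(path) several times; return the elapsed seconds per call."""
    latencies = []
    for _ in range(samples):
        start = time.monotonic()
        os.listdir(path)  # blocks for as long as the mount takes to answer
        latencies.append(time.monotonic() - start)
        time.sleep(pause_s)
    return latencies


def summarize(latencies, threshold_s=6.0):
    """Report the max, the median, and how many samples exceeded threshold_s."""
    return {
        "max_s": max(latencies),
        "median_s": statistics.median(latencies),
        "over_threshold": sum(1 for t in latencies if t > threshold_s),
    }


if __name__ == "__main__":
    # Hypothetical mount points for the same DXi share over each protocol.
    for label, path in [("cifs", "/mnt/dxi_cifs"), ("nfs", "/mnt/dxi_nfs")]:
        if not os.path.isdir(path):
            print(f"{label}: {path} is not mounted here, skipping")
            continue
        print(label, summarize(sample_listing_latency(path)))
```

The threshold_s default of 6.0 matches the current timeout described above; passing 45.0 instead shows how many of the same samples would still exceed the longer timeout.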