
TOPIC:

Failed to spawn 7 years 2 weeks ago #1247

Well, I'm back. I started getting sporadic messages from one of my servers: "iscsi-ha failed to spawn new instance after 6 attmepts. MAX_STARTS is set to 5. Check Host: grgcxen1 for possible hung process."

It looks like it happened last night around 1 AM (right when some of the VMs were doing some housecleaning?), then again just now as I was configuring some VMs and setting up a new one, promoting it to DC, etc.

I tried researching this but I didn't get very far. Everything looks OK with iSCSI-HA, and I don't see any other issues anywhere. So what is going on there?


Failed to spawn 7 years 2 weeks ago #1248

  • Salvatore Costantino
This means that an iscsi-ha thread took longer than it should have to complete running its various checks of processes and states. It should take < 1 second. For this alert to be triggered, it was still running after 50 seconds. Chances are that one of the checks took a very long time to respond due to the ongoing maintenance. You can check the logs around the timestamp where the error occurred to find out what could have caused the slowdown.
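To make the relationship between the slow thread and the alert concrete, here is a minimal Python sketch. It is not HA-Lizard's actual code. The function name is hypothetical, and the 10-second spawn interval is only a guess derived from the 50 seconds mentioned above divided by the MAX_STARTS value of 5 in the alert.

```python
def first_alert_attempt(check_seconds, spawn_interval=10, max_starts=5):
    """Return the attempt number that would raise the alert, or None.

    Hypothetical model only: attempt 1 starts the thread at t=0, and a new
    attempt follows every `spawn_interval` seconds (assumed value) for as
    long as the original thread is still running.
    """
    attempt = 1
    elapsed = 0
    while elapsed < check_seconds:      # the original thread is still busy
        if attempt > max_starts:
            return attempt              # e.g. attempt 6 with MAX_STARTS = 5
        attempt += 1
        elapsed += spawn_interval
    return None                         # thread finished in time, no alert


print(first_alert_attempt(0.5))   # a normal sub-second run
print(first_alert_attempt(55))    # a run that is still busy past 50 seconds
```

Under these assumptions a sub-second run never gets near the limit, while a check that hangs for more than about 50 seconds reaches attempt 6 and matches the wording of the alert.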


Failed to spawn 7 years 2 weeks ago #1250

Thanks, Sal. Uh, which logs should I be looking at? All of the logs in /var/log around the time this happens?

I just rebooted two Windows Server 2016 VMs at the same time and got a bunch of those alerts. I was also watching iscsi-cfg status and saw it die out on the master, then come back. I didn't see any problems on the slave. It looks like the problem shows up when there's more activity happening. Is there anything obvious that you've seen with issues like this? Could it be underpowered servers or slow drives? I'm not really seeing any performance issues on the VMs.


Last edit: by Bill.

Failed to spawn 7 years 2 weeks ago #1251

I'm also seeing "Host: grgcxen1: MONITOR has reached configured threshhold - 5 - MONITOR_KILLALL is enabled. - failed to kill all existing processes."


Failed to spawn 7 years 2 weeks ago #1253

More info, if it helps. I went into manual mode and swapped primary/secondary storage, then rebooted 3 Windows servers and...got the same error emails from the slave this time. Which was pretty much what I expected. Interestingly, though, there does not seem to be any impact on the VMs or the clients accessing the servers. If I wasn't paying attention I wouldn't even realize anything was wrong.

It looks like the system as I've built it can't handle the load. The only options I see are to possibly tune it better (I'm going to need some feedback there) or abandon HA-Lizard and just go back to normal local storage on 2 servers with no HA.


Failed to spawn 7 years 2 weeks ago #1254

  • Salvatore Costantino
All the errors you are seeing are the result of pending tasks which are killed off due to a timeout being reached. You can try increasing the timeout to avoid seeing the alerts.

Same goes for the iscsi-cfg status display. It allows for only a few seconds of tolerance when displaying the various statuses and will display a message stating that the threshold has been reached if data is not retrieved in a timely manner.

As you pointed out, there appears to be an underlying performance issue with the hosts.

Investigating /var/log/user.log may shed some light on which process is slow to respond.
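As a rough aid for that search, the Python sketch below pulls out the log lines within a couple of minutes of an alert. It assumes each line begins with a classic syslog timestamp such as "Mar  4 01:00:30", which may not match how a given host is configured. The function name, the sample lines, and the date are made up for illustration and are not real HA-Lizard output.

```python
from datetime import datetime, timedelta


def lines_near(lines, when, minutes=2, year=None):
    """Yield log lines whose timestamp is within `minutes` of `when`.

    Assumes a leading syslog-style stamp ("Mar  4 01:00:30"), which carries
    no year, so one is supplied. Lines that do not parse are skipped.
    """
    year = year or when.year
    window = timedelta(minutes=minutes)
    for line in lines:
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        if abs(stamp - when) <= window:
            yield line


# Made-up placeholder lines; with a real file, pass open("/var/log/user.log").
sample = [
    "Mar  4 00:55:10 grgcxen1 example: placeholder line outside the window",
    "Mar  4 01:00:30 grgcxen1 example: placeholder line inside the window",
    "a line without a timestamp",
]
for line in lines_near(sample, datetime(2017, 3, 4, 1, 0)):
    print(line)
```

Running it with the time of an alert email as `when` narrows the log down to the entries worth reading first. Widen `minutes` if nothing stands out.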
