|
We are running Jenkins 1.478.
The master node is running on Windows 2003 (xp).
It has 3 slaves - 2 other Windows machines and 1 Mac.
The Mac machine was working fine - then when I attempted to upgrade the O/S (from Snow Leopard to Lion) it failed due to disk errors.
I've since reconstituted the machine from scratch - so all of the hardware is the same but all of the software (and configurations) are brand new (Mountain Lion).

Something appears to be causing one of our slave nodes (on Mac OSX) to take longer and longer to respond.
It's currently at ~1000ms response time.
It has gotten up to 3000ms response time.

I have added two things to the slave's launch JVM options to help in diagnosing and resolving the problem:
1) -Dcom.sun.management.jmxremote (so I can monitor the performance of the slave process via jconsole)
2) -Xmx2048m (to use 2GB of the 3GB of physical memory available on the machine)

The timeouts have apparently caused jobs to fail with errors about channel closing:

Started by upstream project "ScapeFolio" build number 83
[EnvInject] - Loading node environment variables.
[EnvInject] - [ERROR] - SEVERE ERROR occurs: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
Archiving artifacts
ERROR: Publisher hudson.tasks.Mailer aborted due to exception
hudson.remoting.ChannelClosedException: channel is already closed
    at hudson.remoting.Channel.send(Channel.java:492)

Does anyone have any recommendations on how to diagnose and resolve these problems?

Thanks,
Chuck
|
Just to rule out the obvious culprits:
- The Mac hardware is in good general health?
- There is plenty of free memory? The system is not swapping?
- There isn't some process running and taking a lot of CPU? Spotlight indexing, Time Machine backup, some anti-virus real-time scanner?

Even though Macs are great machines, even they can get messed up and become slow.

-- Sami

Chuck Doucette <[hidden email]> wrote on 24.8.2012 at 20:19:
> Something appears to be causing one of our slave nodes (on Mac OSX) to take longer and longer to respond.
> It's currently at ~1000ms response time.
> It has gotten up to 3000ms response time.
> [...]
|
Yes, I believe the Mac hardware is in good general health.
The machine has 3GB of physical memory, so I believe it has plenty of free memory.
I don't believe it is swapping - but I'm not sure how to tell.
I have tried running Activity Monitor and JConsole.
As far as I can tell, there is no other software running.
There is no Time Machine backup set up, nor has any anti-virus software been installed.

As I said in my first message, I had to wipe the disk and reinstall everything from scratch.
So, it has: Mountain Lion, Java, Xcode.
That's about it.
Nobody else is logged on except the jenkins user over ssh.

Now builds that should take a few minutes are taking multiple hours, and I see that time synchronization is off by a few minutes. I will try to fix the latter right now.

Chuck

On Aug 24, 2012, at 4:54 PM, Sami Tikka <[hidden email]> wrote:
> Just to rule out the obvious culprits:
> - The Mac hardware is in good general health?
> - There is plenty of free memory? The system is not swapping?
> - There isn't some process running and taking a lot of cpu? Spotlight indexing, Time Machine backup, some anti-virus real-time scanner?
> [...]
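On the "not sure how to tell" swapping question: besides Activity Monitor, OS X reports swap totals from a terminal via `sysctl vm.swapusage`, which is handy over ssh. Below is a minimal Python sketch, not something from the thread. The sample output layout shown in the comment and the helper names (`parse_swapusage`, `is_swapping`, `read_swapusage`) are assumptions for illustration, and the exact layout may differ between OS X releases.

```python
import re
import subprocess

# Assumed layout of `sysctl vm.swapusage` output on OS X, for example:
#   vm.swapusage: total = 64.00M  used = 0.00M  free = 64.00M  (encrypted)
_FIELD = re.compile(r"(total|used|free) = ([0-9.]+)([KMG])")
_SCALE = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_swapusage(line):
    """Return the swap figures found in `line`, converted to megabytes."""
    figures = {}
    for name, value, unit in _FIELD.findall(line):
        figures[name] = float(value) * _SCALE[unit]
    return figures


def is_swapping(line):
    """True when the reported 'used' swap is greater than zero."""
    return parse_swapusage(line).get("used", 0.0) > 0.0


def read_swapusage():
    """Run sysctl on the slave itself (hypothetical helper; OS X only)."""
    return subprocess.check_output(["sysctl", "vm.swapusage"]).decode()
```

On the Mac slave, `is_swapping(read_swapusage())` would give a yes/no answer; the parsing functions can be tried anywhere by passing in a captured line.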
|
Here is more information, I just saw this message on the Manage Jenkins screen (from the master node, about the mac slave with problems):
There are more SCM polling activities scheduled than handled, so the threads are not keeping up with the demands. Check if your polling is hanging, and/or increase the number of threads if necessary <http://cruisecontrol.office.everyscape.com:8080/descriptor/hudson.triggers.SCMTrigger/>.

When I clicked on the link, I saw this:

Current SCM Polling Activities
There are more SCM polling activities scheduled than handled, so the threads are not keeping up with the demands. Check if your polling is hanging, and/or increase the number of threads if necessary.

The following polling activities are currently in progress:

Project            Running for    SCM poll log
ESSDK              2 days 21 hr   <http://cruisecontrol.office.everyscape.com:8080/job/ESSDK/scmPollLog/>
Uscapeit-Android   2 days 21 hr   <http://cruisecontrol.office.everyscape.com:8080/job/Uscapeit-Android/scmPollLog/>
ScapeFolio         2 days 21 hr   <http://cruisecontrol.office.everyscape.com:8080/job/ScapeFolio/scmPollLog/>

These are all projects that only run on the Mac slave node.

I'm not sure how to kill these SCM polling jobs.
I do know how to kill regular build jobs.
Perhaps I can try SCM notification instead (notify Jenkins to rebuild upon checkin).

Chuck

On Aug 27, 2012, at 10:11 AM, Chuck Doucette <[hidden email]> wrote:
> Now builds that should take a few minutes are taking multiple hours, and I see that time synchronization is off by a few minutes. I will try to fix the latter right now.
> [...]
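To spot hung polls like the three listed in this message without eyeballing the page, the "Running for" values can be converted to durations and compared with a threshold. This is a rough Python sketch under stated assumptions: only `days` and `hr` appear in the thread, so the other unit spellings (`min`, `sec`) are guesses at how Jenkins abbreviates shorter spans, the function names are made up, and the one-hour limit is arbitrary.

```python
import re
from datetime import timedelta

# Seconds per unit. Only "days" and "hr" appear in the thread; the other
# spellings are assumptions about how Jenkins abbreviates time spans.
_UNIT_SECONDS = {
    "day": 86400, "days": 86400,
    "hr": 3600,
    "min": 60,
    "sec": 1,
}
_PART = re.compile(r"(\d+)\s*([a-z]+)")


def parse_running_for(text):
    """Convert a string such as '2 days 21 hr' into a timedelta."""
    total = 0
    for amount, unit in _PART.findall(text.lower()):
        if unit not in _UNIT_SECONDS:
            raise ValueError("unknown unit: %r" % unit)
        total += int(amount) * _UNIT_SECONDS[unit]
    return timedelta(seconds=total)


def stuck_polls(rows, limit=timedelta(hours=1)):
    """Return project names whose poll has run longer than `limit`."""
    return [name for name, running in rows if parse_running_for(running) > limit]


# The three rows reported on the SCM polling page.
rows = [
    ("ESSDK", "2 days 21 hr"),
    ("Uscapeit-Android", "2 days 21 hr"),
    ("ScapeFolio", "2 days 21 hr"),
]
print(stuck_polls(rows))
```

A poll that has been running for days rather than seconds is almost certainly hung rather than slow, which is why a coarse threshold is enough here.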
|
I rebooted the machine and reran every job destined to be run on that Mac slave node.
They all completed quickly and successfully.
Now, as I monitor the slave node, here's what I see for it:

Clock Difference: 3.5 sec ahead
Response Time: 3410 ms

When I log in to the machine (as myself, not the Jenkins user), start Activity Monitor, and review All Processes, I see:
a) highest CPU usage is <10% - averaging around 2% for ScreenSharingAgent and 1% for Activity Monitor
b) free memory: 1.8GB
c) used memory: 1.2GB
d) swap used: 0 bytes

So, I believe I have definitively answered your question about plenty of free memory and not swapping.

When I run jconsole to review the Jenkins slave process, I see:
a) 6MB of heap memory usage
b) 15 live threads
c) 2834 classes loaded
d) 0.1% CPU usage

As I continue to review memory usage of the Jenkins slave process, I see, after an initial drop from ~20MB to ~4MB, a steady climb. It's already back up to 15MB.

Chuck

On Aug 27, 2012, at 10:22 AM, Chuck Doucette <[hidden email]> wrote:
> Here is more information, I just saw this message on the Manage Jenkins screen (from the master node, about the mac slave with problems):
> [...]
|
The 3.5 sec clock drift is not yet alarming but it does cause me to raise an eyebrow. You are not synchronizing clocks between your Jenkins master, slave and SCM server?
Try to get the clocks in sync. Install an ntp daemon where it is missing and configure each machine to sync to your organization's ntp server or pool.ntp.org.

If that does not help, go to http://YOURJENKINS/threadDump, save it to pastebin/gist and ask someone on the list to take a look.

-- Sami

Chuck Doucette <[hidden email]> wrote on 27.8.2012 at 18:36:
> Clock Difference:
> 3.5 sec ahead
>
> Response Time
> 3410 ms
> [...]
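Before posting a full thread dump, it can help to pull out just the threads of interest, for instance the SCM polling ones. The following is a minimal Python sketch, not an official Jenkins tool. It assumes the dump has been saved as plain text with one blank-line-separated block per thread and that polling threads carry "SCM polling" in their name; both are assumptions, as are the function names. `YOURJENKINS` is the placeholder from the message above, and the raw page may include HTML markup around the dump.

```python
from urllib.request import urlopen


def fetch_thread_dump(base_url):
    """Download the /threadDump page (may require login on secured instances)."""
    with urlopen(base_url.rstrip("/") + "/threadDump") as response:
        return response.read().decode("utf-8", errors="replace")


def threads_matching(dump_text, keyword):
    """Return the blank-line-separated blocks whose first line mentions keyword."""
    blocks = [b.strip() for b in dump_text.split("\n\n") if b.strip()]
    return [b for b in blocks if keyword.lower() in b.splitlines()[0].lower()]


# Synthetic two-thread dump, made up purely to illustrate the filter.
sample = (
    '"SCM polling for ESSDK" RUNNABLE\n'
    "    at java.net.SocketInputStream.read(SocketInputStream.java)\n"
    "\n"
    '"Ping thread for channel" TIMED_WAITING\n'
    "    at java.lang.Thread.sleep(Native Method)\n"
)
for block in threads_matching(sample, "SCM polling"):
    print(block)
```

Against a real instance, something like `threads_matching(fetch_thread_dump("http://YOURJENKINS"), "SCM polling")` would be the starting point; the stack frames under each matching thread show where the poll is blocked.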
