Polling hung


Polling hung

Paul Weiss-2
I am seeing the following message on my Hudson:

There are more SCM polling activities scheduled than handled, so the  
threads are not keeping up with the demands. Check if your polling is  
hanging, and/or increase the number of threads if necessary.

Unfortunately it doesn't tell how to increase the number of threads.

However, there appear to be a number of polling sessions that are hanging.
When I do a threadDump I see lots of threads that look like this:

SCM polling for hudson.model.FreeStyleProject@1995c9a[QRES-CCL-SUPERSAFE-RPM] / waiting for hudson.remoting.Channel@18af2b1:dops-afras

"SCM polling for hudson.model.FreeStyleProject@1995c9a[QRES-CCL-SUPERSAFE-RPM] / waiting for hudson.remoting.Channel@18af2b1:dops-afras" Id=12349772 TIMED_WAITING on hudson.remoting.UserRequest@e8c7b6
        at java.lang.Object.wait(Native Method)
        -  waiting on hudson.remoting.UserRequest@e8c7b6
        at hudson.remoting.Request.call(Request.java:122)
        at hudson.remoting.Channel.call(Channel.java:547)
        at hudson.scm.SubversionSCM.pollChanges(SubversionSCM.java:1054)
        at hudson.model.AbstractProject.pollSCMChanges(AbstractProject.java:1067)
        at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:317)
        at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:344)
        at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

        Number of locked synchronizers = 1
        - java.util.concurrent.locks.ReentrantLock$NonfairSync@6fc689


Unfortunately the slave dops-afras was offline, so it is going to wait a  
long time.  Even bringing it back online does not cause these waits to end.

Is this a bug?  It looks like there is something in the polling that is  
prepared to wait an infinite amount of time.  That can't be good.

-P


Re: Polling hung

Dean Yu
Hi Paul,
  Your email is timely. We've been seeing instances of hung SCM polls as
well, and I've been looking into it. First, to answer the easy part:

> Unfortunately it doesn't tell how to increase the number of threads.

  This is configured in Hudson's global configuration. Hudson tries to be
smart and only shows this option if you have more than a certain number of
jobs (10, I think).
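
  For what it's worth, here is a minimal script-console style sketch of the
kind of setting that option maps to. This is my own illustration, not
anything from Hudson's docs: it assumes core is on the classpath, that
SCMTrigger.DescriptorImpl exposes the pool size via getPollingThreadCount /
setPollingThreadCount (as I recall it does), and the value 20 is just an
example.

    // Hedged sketch: inspect and raise the SCM polling thread pool size.
    // Assumes a Hudson script console or plugin code with core on the classpath.
    import hudson.model.Hudson;
    import hudson.triggers.SCMTrigger;

    SCMTrigger.DescriptorImpl d =
            Hudson.getInstance().getDescriptorByType(SCMTrigger.DescriptorImpl.class);
    System.out.println("Current polling threads: " + d.getPollingThreadCount());
    d.setPollingThreadCount(20);  // example value; 0 is treated as unlimited, as far as I know
    d.save();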

  Polling operations used to be done on the master, but with version 1.5 of
the Subversion plugin they were moved to run on the slave that performed the
last build of a project. Remote polling is done synchronously, which means
that the channel between the master and slave is blocked until the polling
operation is complete.
  While a remote operation is running, the local Channel tracks the proxy
for the remote operation. If the connection between the master and slave
breaks down, the code should release all of its locks, allowing the next
operation to be sent. Of course, that next operation will fail because the
connection went down...
  It seems like in your case, you're stuck waiting for the response from the
slave, but you said the slave is offline? So somehow the slave went offline
without the Java code detecting that.
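
  As a blunt workaround sketch (my own illustration, nothing Hudson ships),
you could interrupt the stuck polling threads from a script, matching on the
thread names that show up in your dump. Request.call() is parked in
Object.wait(), which is interruptible, so an interrupt should make the poll
fail instead of waiting forever:

    // Hedged sketch: interrupt SCM polling threads stuck waiting on a dead channel.
    // The name pattern ("SCM polling for ... / waiting for hudson.remoting.Channel...")
    // is copied from the thread dump above, not from any documented API.
    for (Thread t : Thread.getAllStackTraces().keySet()) {
        String name = t.getName();
        if (name.startsWith("SCM polling for")
                && name.contains("waiting for hudson.remoting.Channel")) {
            System.out.println("Interrupting: " + name);
            t.interrupt();
        }
    }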

  -- Dean




RE: Re: Polling hung

Matthew.Webber
We also had a similar failure this morning. At that point we were
running 1.341 (I've since upgraded to 1.342); this is all Linux, with a
Subversion 1.4 (old) repository.

I killed all the slave processes, but the slaves still showed as active.
I ended up killing the master as well, and then restarting.

Your comments are useful, as they give me some pointers on what to look
for, should this happen again.

Thanks,
Matthew




Re: Polling hung

Dean Yu
Hi Matthew,
  I was hoping that restarting the slaves would be enough to get the polling
unstuck -- we have thousands of jobs so restarting the master can take over
an hour. It sounds like you're saying it doesn't help? How did you kill the
slaves?

  -- Dean




RE: Polling hung

Matthew.Webber
I have a script that SSHes to each slave in turn and kills the slave
process (my slaves run in a detached screen session). I was able to kill
the slave processes OK, but the master still thought they were active. I
ended up killing the master as well, and then restarting everything.

Over an hour to start sounds wrong to me (though I don't have thousands
of jobs). There have been some threads lately on improving start-up
times; are you aware of those?

Matthew



Re: Polling hung

Dean Yu
> I have a script that ssh's to each slave in turn, and kills the slave
> process (my slaves run on a detached screen session). I was able to kill
> the slave processes ok, but the master still thought they were active. I
> ended up killing the master as well, and then restarting everything.

So you don't disconnect the slave from the Hudson UI? I wonder if that's why
the master still thinks they're active. Although you'd think an IOException
would bubble up because you killed the process...
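
If it happens again, a cruder option might be to force the disconnect from the
master side rather than only killing the remote process, so the Channel gets
torn down even if the slave dies silently. A minimal sketch, assuming
Computer.disconnect() is available in this Hudson version and using the node
name from Paul's dump purely as an example:

    // Hedged sketch: forcibly disconnect a slave from the master side.
    // "dops-afras" is just the example node name from the earlier thread dump.
    import hudson.model.Computer;
    import hudson.model.Hudson;

    Computer c = Hudson.getInstance().getComputer("dops-afras");
    if (c != null) {
        c.disconnect();  // closes the channel from the master's point of view
        System.out.println("Disconnected " + c.getName());
    }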

> Over an hour to start sounds wrong to me (though I don't have thousands
> of jobs). There have been some threads lately on improving start-up
> times; are you aware if those?

Yeah, I am. They don't seem to help. Getting a better profile of our
bottlenecks is on my to-do list, but that would be the topic of another
thread. :)

  -- Dean



Re: Polling hung

Dean Yu
In reply to this post by Matthew.Webber
After doing some more investigation, I think it is possible for the channel
object to get locked forever, blocking any further operations.
I've filed this as http://issues.hudson-ci.org/browse/HUDSON-5413

  -- Dean




Re: Polling hung

remya
Hi All,

I am also facing the same issue. In my build system I have 3 Linux machines as Jenkins slaves, and I am using Git as the source control manager. There are 4 jobs monitoring the Git repository at a regular interval, but in one job the polling is failing, and the polling log shows:

"Started on May 14, 2013 9:26:20 PM
Using strategy: Default
[poll] Last Built Revision: Revision 276d3cfa12aeabaded582ee29c47d652f52f90e8 (origin/master)
Fetching changes from the remote Git repositories
Fetching upstream changes from origin
Polling for changes in
Using strategy: Default
[poll] Last Built Revision: Revision 844c8e025537a057c4c5d42b5d81c52d28fdcb03 (origin/master)
Fetching changes from the remote Git repositories
Fetching upstream changes from origin
Polling for changes in
FATAL: remote file operation failed: /home/reguser1/build_node/cirrus_embedded_inc_build/nimble at hudson.remoting.Channel@6ce6f53e:build_labxvm0119
hudson.util.IOException2: remote file operation failed: /home/reguser1/build_node/cirrus_embedded_inc_build/nimble at hudson.remoting.Channel@6ce6f53e:build_labxvm0119
        at hudson.FilePath.act(FilePath.java:900)
        at hudson.FilePath.act(FilePath.java:877)
        at hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:706)
        at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:628)
        at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:356)
        at hudson.scm.SCM.poll(SCM.java:373)
        at org.jenkinsci.plugins.multiplescms.MultiSCM.compareRemoteRevisionWith(MultiSCM.java:91)
        at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:356)
        at hudson.scm.SCM.poll(SCM.java:373)
        at hudson.model.AbstractProject.pollWithWorkspace(AbstractProject.java:1542)
        at hudson.model.AbstractProject._poll(AbstractProject.java:1514)
        at hudson.model.AbstractProject.poll(AbstractProject.java:1448)
        at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:439)
        at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:468)
        at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Remote call on build_labxvm0119 failed
        at hudson.remoting.Channel.call(Channel.java:681)
        at hudson.FilePath.act(FilePath.java:893)
        ... 20 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuffer.append(StringBuffer.java:237)
        at java.io.StringWriter.write(StringWriter.java:101)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.showRevision(CliGitAPIImpl.java:332)
        at hudson.plugins.git.GitSCM.isRevExcluded(GitSCM.java:1799)
        at hudson.plugins.git.GitSCM.access$200(GitSCM.java:57)
        at hudson.plugins.git.GitSCM$1.invoke(GitSCM.java:734)
        at hudson.plugins.git.GitSCM$1.invoke(GitSCM.java:706)
        at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2387)
        at hudson.remoting.UserRequest.perform(UserRequest.java:118)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at hudson.remoting.Engine$1$1.run(Engine.java:58)
        ... 1 more
Done. Took 34 sec
No changes"


When I checked the Manage Jenkins page, it shows:
"There are more SCM polling activities scheduled than handled, so the threads are not keeping up with the demands. Check if your polling is hanging, and/or increase the number of threads if necessary."

Please let me know if there is any solution for this...