|
Hi all,
Lately we have been seeing quite a lot of jobs (~10%) that abort without any intervention. Has anybody else had similar problems?

No error message in the console output:

  [...]
  [partest] testing: [...]/run/reflection-constructormirror-nested-good.scala [ OK ]
  [partest] testing: [...]/files/run/viewtest.scala [ OK ]
  [partest] testing: [...]/files/run/reify_newimpl_20.scala [ OK ]
  Build was aborted
  Archiving artifacts
  Checking console output
  Email was triggered for: Aborted
  Sending email for trigger: Aborted

The abort is not because of a timeout (build timeout plugin). The Jenkins logs say that the abort is due to an uncaught InterruptedException, stack trace below. It always looks the same.

I think the reason is an InterruptedException in master-slave communication. The slaves are connected over SSH using the "SSH Slaves Plugin".

I don't think that the exception is caused by our testing tool: it runs on the client in another (JVM) process, so even if it quits with an InterruptedException, that should not abort the Jenkins build.

Thanks for any pointers!
Lukas

Jenkins Log:

  INFO: scala-checkin #6609 aborted
  java.lang.InterruptedException
      at java.lang.Object.wait(Native Method)
      at hudson.remoting.Request.call(Request.java:146)
      at hudson.remoting.Channel.call(Channel.java:663)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy36.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
      at hudson.Launcher$ProcStarter.join(Launcher.java:345)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
      at hudson.model.Build$BuildExecution.build(Build.java:199)
      at hudson.model.Build$BuildExecution.doRun(Build.java:160)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
      at hudson.model.Run.execute(Run.java:1488)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:236)
|
|
Further observation: it seems to happen only when running multiple concurrent builds of the same job on the same slave (but not when running multiple builds on separate slaves, at least it seems that way currently).

On Saturday, July 28, 2012 3:04:41 PM UTC+2, Lukas Rytz wrote:
> Hi all,
|
|
Well, that's unfortunately not the case. I changed our setup to never run builds of the same job on the same machine in parallel, but the aborts still happen, just less often.

The aborts always come in batches. The last batch was 48 aborts at the same time, each producing the same message in the Jenkins log (see first post).

I'm mostly wondering whether nobody else has ever experienced this problem.

Lukas

On Sunday, July 29, 2012 11:55:37 AM UTC+2, Lukas Rytz wrote:
> Further observation: it seems to happen only when running multiple concurrent builds
|
|
Wild guess, but are the builds happening on a Windows-based slave, and is someone logging out whilst the builds are running? I've had problems with this in the past (it's a thing you can get around by passing the right argument to the JVM: -Xrs, I think, from memory).

Might be nowhere near the issue but just in case :)

Cheers
Richard.

On Tue, Aug 14, 2012 at 8:41 PM, Lukas Rytz <[hidden email]> wrote:
> Well, that's unfortunately not the case. [...]
|
|
On Tuesday, August 14, 2012 10:45:38 AM UTC+2, Richard Bywater wrote:
> Wild guess but are the builds happening on a Windows based slave and

Thanks for the pointer! But that cannot be it: they are all Linux slaves running with the SSH Slaves Plugin, and they are dedicated machines; nobody is interacting with them.
|
|
Maybe the machines are running out of
memory? I have heard of Linux killing random processes to release
memory.
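
One way to check this guess is to look through the kernel log on the slave for OOM-killer entries. The following is only a rough sketch: the function name `find_oom_kills` is made up for illustration, and it assumes the kernel writes a line containing "Killed process <pid> (<name>)", which may be worded differently depending on the kernel version.

```python
import re

# Assumption: an OOM kill is logged with a line containing
# "Killed process <pid> (<name>)"; the wording varies between kernels.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for each OOM-kill line found."""
    kills = []
    for line in log_lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The lines could come from the output of `dmesg` or from a saved kernel log file on the slave. If the list stays empty around the time of an abort batch, memory pressure becomes a less likely explanation.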
On 14-08-12 10:48, Lukas Rytz wrote:
|
|
Hi. As far as I can see, no solution has been found yet? I have the same problem with job interruption. Has anyone found a solution?
Thanks
|
|
Have you tried the -Xrs parameter for the JVM?

On Thursday, 11 October 2012 11:13:49 UTC+2, Evgeny Makarov wrote:
> Hi. As I can see the solution not found yet? I have same problem with job interruption. Have anyone found the solution?
|
|
This is embarrassing, but let me post it for reference.
It seems the reason was a simple mis-configuration (we allowed anonymous users to cancel builds). The search engine crawlers were probably causing the aborts by visiting the "job/id/stop" links.
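
For anyone wanting to check for the same cause, here is a rough sketch that scans an HTTP access log for requests to build "stop" links and flags crawler-like user agents. Everything in it is an assumption for illustration: it expects a front end that writes the common "combined" access log format (Jenkins itself may log differently), it assumes stop links end in `/stop` under `/job/`, and the names `find_stop_requests` and `CRAWLER_HINTS` are made up.

```python
import re

# Assumption: requests are logged in the "combined" access log format,
# e.g. by a reverse proxy in front of Jenkins.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: stop links are paths under /job/ that end in /stop.
STOP_PATH = re.compile(r"/job/[^?]+/stop/?(\?.*)?$")

# Heuristic only: substrings that often appear in crawler user agents.
CRAWLER_HINTS = ("bot", "crawler", "spider")


def find_stop_requests(lines):
    """Return (time, ip, path, looks_like_crawler) for each stop request."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not STOP_PATH.search(match.group("path")):
            continue
        agent = match.group("agent").lower()
        looks_like_crawler = any(hint in agent for hint in CRAWLER_HINTS)
        hits.append(
            (match.group("time"), match.group("ip"),
             match.group("path"), looks_like_crawler)
        )
    return hits
```

If stop requests from crawler-like agents line up with the times of the aborts, removing the permission to cancel builds from anonymous users should stop them.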
-- https://groups.google.com/d/topic/scala-internals/BaM3kqyJKu4/discussion

It would have helped to have some information about the cause of the abort in the build log output or in the Jenkins log.
|
