recampbell created JENKINS-12271:
------------------------------------

Summary: Jenkins DoS's itself with ajax checks from job/*/configure with large workspace
Key: JENKINS-12271
URL: https://issues.jenkins-ci.org/browse/JENKINS-12271
Project: Jenkins
Issue Type: Bug
Components: core
Reporter: recampbell
Priority: Critical

Summary: A job with greedy glob expressions for archives, and a workspace with lots of directories, can bring Jenkins to its knees because all the background AJAX checks recursively scan the filesystem for matching files. The only resolution is to kill Jenkins and ask users to write less greedy expressions.

Details: We had a Jenkins instance become unavailable due to 100% CPU usage. There were several dozen requests that looked like this:

"Handling GET /job/torquebox-2x-incremental/descriptorByName/hudson.tasks.ArtifactArchiver/checkArtifacts : http-8097-21" daemon prio=10 tid=0x00007ff3f001d800 nid=0x39ae runnable [0x00007ff3e2ae5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.tools.ant.util.VectorSet.doAdd(VectorSet.java:64)
    - locked <0x00007ff4415240b8> (a org.apache.tools.ant.util.VectorSet)
    at org.apache.tools.ant.util.VectorSet.addElement(VectorSet.java:75)
    - locked <0x00007ff4415240b8> (a org.apache.tools.ant.util.VectorSet)
    at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1236)
    at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1259)
    at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1259)
    ...
The user was configuring a job which archived artifacts using the following glob expressions:

integration-tests/target/rubygems, integration-tests/target/integ-dist/jboss/standalone/log/*.log,integration-tests/apps/**/log/development.log,**/target/surefire-reports/*.txt, **/target/rspec-report.html

The workspace for this job has 37k subdirectories, each of which was being checked for a match against **/target/surefire-reports/*.txt and **/target/rspec-report.html.

The real problem is that these Ajax threads pile up if you navigate in and out of the field multiple times. Eventually they eat up all available CPU and the instance becomes unavailable.

It seems like ArtifactArchiver.doCheckArtifacts() and JUnitResultsArchiver.doCheckTestResults() should use a Callable with Future.get(30, seconds). Maybe it should be 15 seconds; I can't imagine someone waiting longer for a validation. Perhaps the thread can also have a lower priority?
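The Callable/Future.get idea suggested above might look roughly like the sketch below. This is not Jenkins code: the class name BoundedValidation, the check() helper, the pool size of 2 and the returned messages are all made up for illustration, and the real doCheckArtifacts() would have to plug its own scan in as the Callable. It only shows the shape: run the scan on a small pool of low-priority daemon threads and give up after a timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Jenkins: bounds how long a validation scan may run.
public class BoundedValidation {

    // Small fixed pool (size 2 is an arbitrary assumption) so that repeated
    // AJAX checks queue up instead of each starting another scanning thread.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "validation-check");
        t.setDaemon(true);                   // never keeps the JVM alive
        t.setPriority(Thread.MIN_PRIORITY);  // the "lower priority" suggestion
        return t;
    });

    // Runs the scan and waits at most `timeout`; time spent queued counts too.
    public static String check(Callable<String> scan, long timeout, TimeUnit unit) {
        Future<String> result = POOL.submit(scan);
        try {
            return result.get(timeout, unit);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; the scan itself must
            // react to the interrupt for the CPU to actually be freed.
            result.cancel(true);
            return "Validation timed out; the expression may be too greedy";
        } catch (InterruptedException e) {
            result.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return "Validation interrupted";
        } catch (ExecutionException e) {
            return "Validation failed: " + e.getCause();
        }
    }
}
```

One caveat with this approach: the timeout frees the HTTP request thread, but the scan keeps burning CPU unless it notices the interrupt, so the directory walk would probably also need an explicit interruption check to fully address the pile-up.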
In reply to this post by JIRA noreply@jenkins-ci.org

We have seen the exact same bug.
We haven't actually run out of CPU, since we found the bug
on a machine with lots of cores, but as soon as we go to the configure
page, the CPU for the slave process goes to 500%, then slowly stabilizes at 100%
and stays there indefinitely.