# [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton created JENKINS-15331
Issue Type: Improvement
Assignee: Unassigned
Components: core
Created: 27/Sep/12 10:09 AM

Description:

Please enhance the hudson.Util.deleteContentsRecursive method to:

- delete everything it can
- try several times to delete everything
- only throw an exception if it can't delete everything (listing everything that it can't delete)

Reasoning...

Unlike Unix, the Microsoft Windows OS does not allow a file to be deleted if something has that file open. This causes delete operations to fail. Furthermore, most installations of Windows have software that monitors the filesystem for activity and then inspects the contents of recently added/removed files (which means that it'll lock them, albeit temporarily), e.g. the Windows Search service & anti-virus software, to name but two (and Windows Vista & Windows 7 seem to have additional complications).

This means that builds which rely on cleaning a workspace before they start will sometimes fail (claiming that they couldn't delete everything because a file was locked), resulting in a build failing with the following output:

    Started by an SCM change
    Building remotely on jenkinsslave27 in workspace C:\hudsonSlave\workspace\MyProject
    Purging workspace...
    hudson.util.IOException2: remote file operation failed: C:\hudsonSlave\workspace\MyProject at hudson.remoting.Channel@6f0564d7:jenkinsslave27
        at hudson.FilePath.act(FilePath.java:835)
        at hudson.FilePath.act(FilePath.java:821)
        at hudson.plugins.accurev.AccurevSCM.checkout(AccurevSCM.java:331)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
        at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
        at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
        at hudson.model.Run.run(Run.java:1434)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:239)
    Caused by: java.io.IOException: Unable to delete C:\hudsonSlave\workspace\MyProject\...\src\...\foo - files in dir: [C:\hudsonSlave\workspace\MyProject\...\src\...\foo\bar]
        at hudson.Util.deleteFile(Util.java:236)
        at hudson.Util.deleteRecursive(Util.java:287)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.Util.deleteRecursive(Util.java:278)
        at hudson.Util.deleteContentsRecursive(Util.java:198)
        at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:28)
        at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:11)
        at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2161)
        at hudson.remoting.UserRequest.perform(UserRequest.java:118)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:287)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at hudson.remoting.Engine$1$1.run(Engine.java:60)
        at java.lang.Thread.run(Unknown Source)

What's needed is a retry mechanism, i.e. the equivalent of using Ant's `<retry>`, but with a (small) delay between attempts (and maybe a call to the garbage collector, just in case the process holding the file open is the build slave process itself).

Environment: Microsoft Windows
Project: Jenkins
Priority: Minor
Reporter: pjdarton

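The retry-with-delay idea requested above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in hudson.Util: the class `RetryingDelete`, the method `deleteWithRetries` and its parameters are hypothetical names, and it handles a single path rather than a directory tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Jenkins): retries one delete with a pause between attempts. */
final class RetryingDelete {
    private RetryingDelete() {}

    /**
     * Tries to delete {@code path} up to {@code maxAttempts} times, sleeping
     * {@code waitMillis} between failed attempts. A path that is already gone
     * counts as success. The last failure is attached as the cause if every
     * attempt fails.
     */
    static void deleteWithRetries(Path path, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(path);
                return; // deleted, or it did not exist in the first place
            } catch (IOException e) {
                last = e; // e.g. the file is locked by another process
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis); // give the lock holder time to let go
                }
            }
        }
        throw new IOException(
                "Unable to delete " + path + " after " + maxAttempts + " attempts", last);
    }
}
```

A caller might use `RetryingDelete.deleteWithRetries(file, 3, 500L)`; the right attempt count and delay depend on what is locking the files.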
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton commented on JENKINS-15331
Note: This file locking behavior also causes non-Jenkins issues, e.g. deleting multiple folders using Windows Explorer will sometimes leave one (usually empty) folder behind, and even a simple "RD /S /Q MyFolder" will sometimes fail to delete the folder on its first attempt. In these cases, simply retrying the operation will succeed. Personally, I think it's a Windows "feature".

As a workaround, I've wrapped most of my calls to Ant's `<delete>` task in `<retry>`, and this has eliminated this problem from any of my builds that manage to start, BUT this doesn't help if Jenkins doesn't get as far as running my builds. E.g. I'm using the accurev plugin for my SCM and it cleans the working directory before it grabs the source - I typically get about a 1% failure rate at this stage. Whilst 1% is not a blocking issue, it's not reliable, which is not what one wants from a build system.

Personally, I've found that excluding the build areas from Search & anti-virus helps reduce the problem, but it is insufficient to stop these failures completely (at least on Windows 7) - something, somewhere, will still lock files, sometimes, but any investigation (after the build has failed) shows that no process has the file "open".
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton commented on JENKINS-15331
Added pull request https://github.com/jenkinsci/jenkins/pull/583

Features:

- Added two new system properties that control behavior: "Util.deletionRetries" (an integer, defaults to 3) and "Util.deletionRetryWait" (an integer number of milliseconds, defaults to 500).
- Delete operations that affect directories now try to delete the entire contents of the directory, continuing on to subfolders etc. even after encountering files that wouldn't die, before eventually throwing an exception about what wouldn't die. I.e. if a folder has files "a", "b" and "c", and you can't delete "b", then "a" and "c" would get deleted (and you'll still get the exception about "b").
- Delete operations now have multiple attempts at deleting things, so if not everything could be deleted first time around, maybe it'll get deleted 2nd/3rd etc. time around. An exception is only thrown if all retry attempts are exhausted and there are still files/directories that won't delete.

I have high hopes that this will mean that I will no longer suffer from JENKINS-15331 (possibly with a higher retry count, as Windows can take a dozen attempts to delete a file sometimes - I've set the defaults to mimic existing behavior).

Disclaimers:

- It's undergone very little testing. Building the master branch failed consistently on my PC at the time, but it continued to fail in exactly the same way once I'd done my changes. It's entirely possible that the changes will fail to fix the problem, as I've been unable to build & run Jenkins from this code. To compensate, I have added some additional unit-tests that give me confidence that, at the very least, the changes should do no harm.
- Although the GitHub Windows client is very pretty, it looks like it has also asked the GitHub server to change all the end-of-line codes in a manner that makes GitHub think I've changed the entire file, which isn't true - when I pressed the publish button on my PC, it wasn't changing the entire file, it only looked like that immediately afterwards. As a result, it may be necessary to change the EOLs back again. (One day, I'll get the hang of using git from a Windows machine, but today was not that day...)

If further testing is required, I'd be happy to accept a built Jenkins (that included my changes) to test. Normally I'd do that before issuing a "pull request", but as it doesn't seem to want to build at all on my PC here, I don't really have that option (if there are other ways of getting a Jenkins built, e.g. by getting another server to do the build from my GitHub repo, do let me know by email at [hidden email]).
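To illustrate the behavior described in the feature list (delete what you can, retry, and only then report what is left), here is a minimal sketch. It is not the code from the pull request: the class `BestEffortDelete` and its methods are hypothetical, the property names are simply the ones quoted in the comment, and treating the retry count as a total number of attempts is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the pull request's code) of best-effort deletion with retries. */
final class BestEffortDelete {
    // Property names as quoted in the comment; Integer.getInteger falls back
    // to the given default when the system property is unset or unparsable.
    static final int ATTEMPTS = Integer.getInteger("Util.deletionRetries", 3);
    static final int WAIT_MILLIS = Integer.getInteger("Util.deletionRetryWait", 500);

    private BestEffortDelete() {}

    /** Deletes whatever it can beneath {@code dir} and returns the paths that survived. */
    static List<File> tryDeleteContents(File dir) {
        List<File> failed = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return failed; // not a directory, or it could not be listed
        }
        for (File child : children) {
            // Do not descend into symbolic links; delete the link itself instead.
            if (Files.isDirectory(child.toPath(), LinkOption.NOFOLLOW_LINKS)) {
                failed.addAll(tryDeleteContents(child));
            }
            if (!child.delete() && child.exists()) {
                failed.add(child); // keep going: siblings are still attempted
            }
        }
        return failed;
    }

    /** Empties {@code dir}, retrying after a pause; throws only if something still remains. */
    static void deleteContents(File dir, int attempts, long waitMillis)
            throws IOException, InterruptedException {
        List<File> failed = tryDeleteContents(dir);
        for (int attempt = 2; attempt <= attempts && !failed.isEmpty(); attempt++) {
            Thread.sleep(waitMillis);
            failed = tryDeleteContents(dir);
        }
        if (!failed.isEmpty()) {
            throw new IOException("Unable to delete: " + failed);
        }
    }
}
```

With files "a", "b" and "c" where only "b" is locked, `tryDeleteContents` removes "a" and "c" and returns a list containing "b", matching the example in the feature list.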
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton started work on JENKINS-15331
Change By: pjdarton (09/Nov/12 4:45 PM)
Status: Open → In Progress
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton updated JENKINS-15331
Uploaded git patch file; this was produced using the git command-line and isn't claiming to change the entire file. This will probably be a lot easier to merge.

This is my "New-and-improved" solution. In addition to retrying the deletes, this also calls System.gc() if it's on Windows (a tactic that's also used in Apache Ant's Delete task to work around the same problem).

Change By: pjdarton (09/Nov/12 4:56 PM)
Attachment: 0001-JENKINS-15331.patch
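The System.gc() tactic mentioned above could look roughly like the following. This is a sketch under assumptions rather than the contents of the attached patch: `GcAssistedDelete`, `isWindows` and `delete` are hypothetical names, and the OS check via the `os.name` system property is just one common way to detect Windows.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical sketch (not the attached patch): retry a delete, hinting the GC on Windows. */
final class GcAssistedDelete {
    private GcAssistedDelete() {}

    /** Assumption: detect Windows from the standard "os.name" system property. */
    static boolean isWindows() {
        return System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
    }

    /**
     * Returns true once {@code file} no longer exists, false if it survived
     * every attempt. Between attempts on Windows it requests a garbage
     * collection, in case an unreferenced but not yet finalized stream in
     * this JVM is what still holds the file open.
     */
    static boolean delete(File file, int attempts, long waitMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= attempts; attempt++) {
            if (file.delete() || !file.exists()) {
                return true;
            }
            if (attempt < attempts) {
                if (isWindows()) {
                    System.gc(); // only a hint; the JVM may ignore it
                }
                Thread.sleep(waitMillis);
            }
        }
        return false;
    }
}
```

The garbage collection only helps when the lock is held by the same JVM; locks held by other processes (search indexers, anti-virus) are covered by the pause and retry alone.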
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton commented on JENKINS-15331
 Have re-done my GitHub pull request to reflect the new changes (and to fix the CRLF issue with the previous pull request). New pull request is https://github.com/jenkinsci/jenkins/pull/615
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton commented on JENKINS-15331
 I've now been running the LTS Jenkins build (1.480.1) with this patch applied at work for a while. I've not seen any builds failing due to "file in use" since. I would therefore recommend that this patch / pull-request be incorporated into the main branch ASAP, and to the next LTS release.
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

Change By: Daniel Kirkdorffer (29/Jan/13 9:38 PM)
Priority: Minor → Major
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 Daniel Kirkdorffer commented on JENKINS-15331
 I believe this is also the root cause of JENKINS-15852. The Git Plugin has a call in GitAPI to FilePath.deleteRecursive(), which in turn calls Util.deleteRecursive(). It is almost immediately trying to delete a workspace that has just been created. Additionally, we have encryption and McAfee software monitoring files that could be locking them.
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton commented on JENKINS-15331
File-locking is the bane of anyone running any kind of automated system on Windows, so I'd agree that this might well solve the problem, although you may find that what you actually want the Git plugin to do is to just delete the contents of the workspace, not the workspace folder itself, i.e. to call deleteContentsRecursive (which will give you an empty directory) instead of deleteRecursive (which will delete the folder itself).

I've also got anti-virus stuff running on my build slaves (I've managed to keep them free from whole-disk encryption), and despite that I've not noticed any builds fail due to file-locking issues since I started running a custom build of Jenkins LTS that has this fix in it. I think that this amounts to a fair amount of circumstantial evidence that this fix works.
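The distinction between the two calls can be shown with a small standalone sketch. The method names mirror the hudson.Util methods discussed above, but the class `WorkspaceCleanup` and these bodies are hypothetical illustrations written against java.nio, not the Jenkins implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical illustration of "empty the directory" versus "remove the directory". */
final class WorkspaceCleanup {
    private WorkspaceCleanup() {}

    /** Deletes everything inside {@code dir} but leaves {@code dir} itself in place. */
    static void deleteContentsRecursive(Path dir) throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                deleteRecursive(child);
            }
        }
    }

    /** Deletes {@code path} itself, after emptying it if it is a real directory. */
    static void deleteRecursive(Path path) throws IOException {
        // Symbolic links are removed as links, never followed.
        if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            deleteContentsRecursive(path);
        }
        Files.deleteIfExists(path);
    }
}
```

After `deleteContentsRecursive(workspace)` the workspace directory still exists and is empty, so nothing needs to recreate it; after `deleteRecursive(workspace)` the directory is gone.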
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton edited a comment on JENKINS-15331
Features:

- Added two new system properties that control behavior: "Util.deletionRetries" (an integer, defaults to 3) and "Util.deletionRetryWait" (an integer number of milliseconds, defaults to 500).
- Delete operations that affect directories now try to delete the entire contents of the directory, continuing on to subfolders etc. even after encountering files that wouldn't die, before eventually throwing an exception about what wouldn't die. I.e. if a folder has files "a", "b" and "c", and you can't delete "b", then "a" and "c" would get deleted (and you'll still get the exception about "b").
- Delete operations now have multiple attempts at deleting things, so if not everything could be deleted first time around, maybe it'll get deleted 2nd/3rd etc. time around. An exception is only thrown if all retry attempts are exhausted and there are still files/directories that won't delete.

I have high hopes that this will mean that I will no longer suffer from JENKINS-15331 (possibly with a higher retry count, as Windows can take a dozen attempts to delete a file sometimes - I've set the defaults to mimic existing behavior).

Disclaimers:

- I've not tested every possible place this is called from. To compensate, I have added some additional unit-tests that give me confidence that, at the very least, the changes should do no harm.

Since posting this comment in October 2012, I've been running a version of Jenkins LTS with this patch applied. I've not had file locking problems since (i.e. for about 4 months now). I'm therefore pretty sure that it fixes the problem.
## [JIRA] (JENKINS-15331) Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

 pjdarton edited a comment on JENKINS-15331
Features:

- Added two new system properties that control behavior: "Util.deletionRetries" (an integer, defaults to 3) and "Util.deletionRetryWait" (an integer number of milliseconds, defaults to 500).
- Delete operations that affect directories now try to delete the entire contents of the directory, continuing on to subfolders etc. even after encountering files that wouldn't die, before eventually throwing an exception about what wouldn't die. I.e. if a folder has files "a", "b" and "c", and you can't delete "b", then "a" and "c" would get deleted (and you'll still get the exception about "b").
- Delete operations now have multiple attempts at deleting things, so if not everything could be deleted first time around, maybe it'll get deleted 2nd/3rd etc. time around. An exception is only thrown if all retry attempts are exhausted and there are still files/directories that won't delete.
- Added some unit tests for these methods.

After posting this back in October 2012, I built a version of Jenkins LTS with this patch applied. I've been using it at work for all our development stuff and I've not had file locking problems since. I'm pretty confident that it fixes the problem.

Disclaimers:

- I've not tested this (or the unit-tests) on Linux. It should be harmless (behavioral changes are conditional on being on Windows), but it'd be worth running the unit-tests on Linux just to verify that.
