|
Subject: git: reduce clones' disk space
Sent: Tuesday, February 14, 2012 1:23 PM

Hi Jenkins gurus,

I have a load of jobs (50+, I think) which clone the same repository, but different branches, for the build, unit test and functional test stages.

Also, it's a special application of the "job splitting pattern" (https://wiki.jenkins-ci.org/display/JENKINS/Splitting+a+big+job+into+smaller+jobs): the tarball that downstream jobs receive is much smaller than the entire workspace. It only contains the files unknown to git (git ls-files -oz: the build artifacts), which is "just" 400 MB vs 1.8 GB. Downstream jobs unpack this on top of a pristine clone to get up to speed. This is quite fast (most files are there already) and also seems to do better change tracking.

However, it costs space - each workspace is ~4-5 GB, half of which is the git clone.
While git has a good reason to clone everything with all the branches, I don't need that duplicated 50 times on the Jenkins box. So I am wondering if there is a way to optimise this. I guess I'd rather have one single full clone, and let the jobs have only the work directories (+ index?).

Any insights or alternative ideas are appreciated.
Thanks,
Gergo |
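
For readers who want to try the artifact tarball step described above, here is a minimal sketch. It is not Gergo's actual job configuration: the directory names, the file names and the use of tar with --null -T - (supported by GNU tar) are assumptions made for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
up="$work/upstream-ws"      # hypothetical upstream job workspace
down="$work/downstream-ws"  # hypothetical downstream job workspace

# A tiny repository so the sketch is self-contained.
git init -q "$up"
echo 'int main(void) { return 0; }' > "$up/main.c"
git -C "$up" add main.c
git -C "$up" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Pretend the build produced an artifact that git does not track.
mkdir "$up/build"
echo 'object code' > "$up/build/main.o"

# Pack only the untracked files; -z plus --null keeps odd file names intact.
(cd "$up" && git ls-files -oz | tar --null -T - -czf "$work/artifacts.tar.gz")

# Downstream: pristine clone, then unpack the artifacts on top of it.
git clone -q "$up" "$down"
tar -xzf "$work/artifacts.tar.gz" -C "$down"

ls "$down/build"
```

Note that git ls-files -o without --exclude-standard also lists ignored files, which is usually what you want when the build output is covered by .gitignore.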
|
The git plugin rework discussions mentioned the possibility of including the "--reference <existing-repository>" argument to git clone, so that the pack files for a single repository could be reused by multiple repositories on the same machine. Then you could clone to a single directory on the slave, and reference that clone rather than copying the pack files into each of the workspace copies.

I don't think it has been implemented yet, but the plugin developers may be willing to share their ideas in case they have an even better idea than using the --reference argument to git clone.

Mark Waite
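
On the plain git command line, the --reference idea could look roughly like the sketch below. This only shows git itself, not how the Jenkins git plugin wires it up, and all paths are hypothetical stand-ins created in a temporary directory.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
upstream="$work/upstream.git"  # stands in for the real remote (hypothetical)
cache="$work/cache.git"        # one shared clone per slave (hypothetical)
job_ws="$work/job-a"           # one Jenkins workspace (hypothetical)

# Build a tiny upstream so the sketch is self-contained.
git init -q "$work/seed"
echo 'hello' > "$work/seed/README"
git -C "$work/seed" add README
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
git clone -q --bare "$work/seed" "$upstream"

# One full mirror on the machine...
git clone -q --mirror "$upstream" "$cache"

# ...and each workspace borrows objects from it instead of storing its own copy.
# The file:// URL makes git use the normal transport rather than a local copy.
git clone -q --reference "$cache" "file://$upstream" "$job_ws"

# The borrowed object store is recorded here.
cat "$job_ws/.git/objects/info/alternates"
```

One caveat with this approach: the workspace depends on the reference repository, so deleting the cache, or pruning objects from it, can break the clones that borrow from it.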
|
|
Thanks Mark,
that's great info - to me it sounds like the way to go.
Gergo
On Wed, Feb 15, 2012 at 3:03 AM, Mark Waite <[hidden email]> wrote:
|
|
You can already achieve the same benefit by making a local clone of the git repo (use --bare for this) and then configuring each job to have 2 repos: the first should be /path/to/local/repo and the second can be the location where you usually clone from.

This way most git objects will be shared, because a local git clone uses hard links.

My build slaves at work have small but fast SSD disks, and we use this trick (plus running git clean -fxd as a post-task step) to keep disk space usage under control.

-- Sami
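
A command-line approximation of this setup, for anyone who wants to see the hard links for themselves. In Jenkins the two repositories are configured in the job rather than typed by hand; the paths and the remote name "upstream" below are assumptions for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
remote="$work/remote"         # stands in for the usual clone location (hypothetical)
local_repo="$work/local.git"  # plays the role of /path/to/local/repo
job_ws="$work/job-a"          # one Jenkins workspace (hypothetical)

git init -q "$remote"
echo 'hello' > "$remote/README"
git -C "$remote" add README
git -C "$remote" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# One bare clone per machine.
git clone -q --bare "$remote" "$local_repo"

# Workspace: clone the local repo first (objects are hard-linked),
# then add the usual location as a second repository.
git clone -q "$local_repo" "$job_ws"
git -C "$job_ws" remote add upstream "$remote"
git -C "$job_ws" fetch -q upstream

# A link count above 1 shows the object file is shared, not copied.
object=$(find "$job_ws/.git/objects" -type f -path '*/objects/??/*' | head -n 1)
links=$(ls -l "$object" | awk '{ print $2 }')
echo "hard links on one object: $links"

# Post-build cleanup: drop untracked and ignored files.
touch "$job_ws/build.log"
git -C "$job_ws" clean -q -fxd
```

Only objects that already exist in the local repo are shared this way; anything fetched later from the second repository is stored in the workspace as usual.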
|
|
That is brilliant! I wish I'd thought of that before, since that will likely save me lots of disk space on our slave nodes.

Mark Waite
|
|
In reply to this post by Sami Tikka
Sami,

Yes, I had been thinking about that too - but I actually already have a gitolite mirror repo (+ git-svn clone) on the master, so I was already using "local remote URLs". However, that mirror is on a different partition (LV), which breaks the hard links, I assume.

So thanks for the reminder - that will help on the master and Linux slaves. BTW, you'd have one clone for each slave, right? Also, is it a recommended practice to set up a Jenkins job solely to maintain this clone?
If so, it would be nice to have plugin support (e.g. a wizard to add "mirrored git clones" across master/slaves, and an SCM type "Git local clone") - maybe I could start something like that, unless someone already has :)

Still, a more general underlying mechanism would be great, e.g. one that even works on Windows slaves. The object sharing would not work the same way there (at least not with hard links, I assume).

Thank you,
Gergo
On Mon, Feb 20, 2012 at 11:51 PM, Sami Tikka <[hidden email]> wrote:
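
Since the hard-link trick depends on the mirror and the workspaces sharing a file system, a quick check like the following may help. It is a rough sketch: the function names and paths are made up for illustration, and it compares mount points as reported by df -P, so mount points containing spaces are not handled.

```shell
#!/bin/sh
set -eu

# Print the mount point that holds the given path (POSIX df output format).
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed when both paths live on the same mounted file system.
same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Hypothetical stand-ins for the mirror and a job workspace.
work=$(mktemp -d)
mirror="$work/mirror.git"
workspace="$work/workspace/job-a"
mkdir -p "$mirror" "$workspace"

if same_filesystem "$mirror" "$workspace"; then
    echo "same file system: a local clone can hard-link objects"
else
    echo "different file systems: hard links are not possible here"
fi
```

When the two paths turn out to be on different file systems, git clone --reference is still an option, because it records a path to the other object store instead of linking files.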
|
|
In reply to this post by Gergely N
Actually the --reference flag is supported by the git plugin (v1.1.15) and so far it is working nicely!
|
|
In reply to this post by Gergely N
Hard links do not work across file systems, and yes, I need to have a local clone on every slave.

I do not actually maintain this clone. My slaves are virtual machines which I recreate regularly, and the final step of the VM setup is to clone the git repo. The local clone doesn't really need to be very up to date to provide big savings. If your slaves are persistent, then sure, you'd want to update the local clone every now and then, maybe once a week or so.

I believe I've heard that git on Windows supports hard links, but don't take my word for it. I've never used it myself.

-- Sami
2012/2/21 Gergely Nagy <[hidden email]>: |
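
For persistent slaves, the occasional update of the local clone could be a small script run from cron or from a dedicated Jenkins job. The sketch below is only one way to do it; the paths are hypothetical, and the explicit refspec is there because a clone made with plain --bare has no fetch refspec configured.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
upstream="$work/upstream"  # stands in for the real remote (hypothetical)
cache="$work/cache.git"    # the per-slave local clone (hypothetical)

# Helper for the demo only: record one more commit upstream.
commit() {
    echo "$1" >> "$upstream/notes.txt"
    git -C "$upstream" add notes.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

git init -q "$upstream"
commit "initial"
git clone -q --bare "$upstream" "$cache"

# Upstream moves on after the cache was created.
commit "later work"

# The refresh step itself: bring all branches in the cache up to date.
git -C "$cache" fetch -q --prune origin '+refs/heads/*:refs/heads/*'

git -C "$cache" log --oneline -n 1
```

Existing workspaces do not gain anything from a refresh by themselves; the benefit shows up the next time a workspace is cloned from the updated local repo.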
|
In reply to this post by Patrick Renaud
Patrick Renaud mentioned that git plugin version 1.1.15 has a feature I want. That sounds great. Unfortunately, I can't see that version in the update center on my server.
I can see it in the update center on another server within the same network, and I can use wget from the problem server to download the update center JSON file. Any hints on what might be preventing my Jenkins 1.451 installation on Debian from seeing the latest plugin updates?

Thanks,
Mark Waite |
