Imagine this scenario. You're part of a small team that's been following the CDT closely and has adopted it as the IDE for your commercial platform. You grab the CDT source at times convenient to your product delivery schedule and work on a local copy, fixing bugs you find as you go through product testing. You're not a committer, but you do submit patches from time to time and hope that the CDT team picks them up. But they're often busy with their own delivery schedules, and the patches often grow stale and fall off everyone's radar.
So you live with your CDT fork and struggle every time you have to update to a new CDT version, so you don't do that very often. And since you're busy struggling in that environment, you really don't end up with time to get more involved with the CDT. You are a small team and you only have so much time in the day. You run into Doug once in a while at the Eclipse conferences and talk about what you do and promise you'll figure out some way to get more involved, but he knows your story too well and doesn't put much faith in it despite his appreciation for your intentions.
Sounds like I have experience with this, don't I? This scenario is all too real, and I'd bet it is very common across all open source projects. Relying on CVS and Subversion at Eclipse, with access controls limited to the select few committers, makes it very difficult for those on the fringes to get more involved. It truly is a have/have-not environment. The committers have it easy, checking in their changes whenever they want, while those who aren't committers struggle to keep up, or simply fork and go their own direction.
I've learned that the new Symbian Foundation has selected Mercurial as its source control system. Along with Linus's git, it's one of the new breed of distributed source control systems. These systems allow for multiple repositories and provide mechanisms to pull and push changes between them. The introduction chapter of the Mercurial on-line book provides a great description of why this architecture works well for large globally distributed projects.
I invite everyone to read it, especially the Eclipse community. Because I think we need this kind of capability now. CDT needs an infusion of new blood and I know there are a lot of people who work with the CDT code base but have only a limited time to contribute back. If we had the infrastructure to better support them and make it easier to pull their changes into the CDT main line, and easier for them to keep up with everyone else's changes, it could be the formula we need to grow.
Saturday, December 13, 2008

15 comments:
Are you sure the version control system is not just a minor detail? We have 60 local CDT patches, a sizeable fraction of which are generally applicable, and we'll be happy to contribute them. Can you promise timely review for them? I guess the answer does not depend much on what version control we're using internally.
I am glad that this issue is gaining visibility among Eclipse committers.
Vladimir, the main advantage of DVCSs is that they help you with merges so that the price you pay for waiting for reviews and integration is smaller.
Ismael,
in our case, we don't work off CDT HEAD, so mechanisms to quickly grab and merge 100 changes from upstream won't help. For each upgrade, we reapply our patches. Trivial patches are trivial to reapply, and even if I use some tool often described as 'distributed', that would be trivial using 'patch', too. Non-trivial patches require thinking and rework, and that's not something DVCS can help with.
This is a resources/community problem; a technical solution won't help.
Vladimir, I am not familiar with your code or process so it's hard to say.
What I can say is that people in the Linux kernel (using Git) and OpenJDK (using Mercurial) have to deal with various trees where each tree gets a lot of commits on a daily basis, and having a DVCS is what makes it possible. Having to fix patches all the time would slow things down considerably.
But even as an individual contributor in open-source projects that are run by volunteers, it's not encouraging to submit a patch for something and then to have to resubmit again because it went stale a few weeks later. DVCSs mitigate this problem. They surely won't solve _all_ problems, but they help the individual contributors who are not paid to work on related projects.
As alluded to earlier, it also helps contributors that track a branch or the tip. Finally, it also helps the developers of the project itself. There are few drawbacks, so why not do something that helps so many people?
To Vladimir's point, no, we can't guarantee timely review of patches for the CDT. All the committers place a higher priority on their product deliveries than on the CDT itself. There isn't that much difference between what you do at CodeSourcery with no committers and what we do at Wind River with a small handful of committers. The only difference is that, having committers, we don't have to do patches, which gets us a better chance of getting our changes in.
There is only a certain window during the year when the stars align and we can get people looking at the patches. That's why something that allows us to apply patches using a 3-way merge is the best approach.
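For readers unfamiliar with the term, here is a minimal sketch of the decision rule behind a 3-way merge. It is an illustration only, not how hg, git, or any Eclipse tool is implemented: real tools align diff hunks first, whereas this sketch assumes all three versions have the same number of lines so they can be compared position by position. The function name `three_way_merge` and the sample lines are made up for the example.

```python
def three_way_merge(base, ours, theirs):
    """Merge two line lists that were both derived from `base`.

    Simplifying assumption (not true of real merge tools): all three
    versions have the same number of lines, so we compare by position.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither touched the line)
            merged.append(o)
        elif o == b:      # only the other side changed this line
            merged.append(t)
        elif t == b:      # only our side changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


# Hypothetical example: a local fix and an upstream change on different lines.
base = ["int a = 1;", "int b = 2;", "int c = 3;"]
ours = ["int a = 1;", "int b = 20;", "int c = 3;"]     # our local patch
theirs = ["int a = 10;", "int b = 2;", "int c = 3;"]   # upstream change

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

The point of keeping the common ancestor around is that a stale patch can still be applied automatically wherever only one side touched a line; only the lines both sides changed need a person.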
Doug,
it sounds like you're saying that patches get reviewed and applied once per year. If that's the case, then no version control system will help. Although no open-source project guarantees a review in a specific timeframe, in practice review is relatively quick. If, by contrast, I know that my patch *might* be reviewed in a year, and I'm not even sure it *will* be reviewed, it makes submitting a patch rather useless.
The second paragraph of your original post seems to put the blame on those who extend CDT.
At least, "promise... get more involved" sounds this way. What kind of involvement do you mean and what would you suggest? Dumping 50 patches into bugzilla? I can do that, but it would be hard to justify this effort to my management if I have no idea when those patches will be reviewed, if at all.
You do not have to wait for Eclipse. If the CDT repository were in Subversion, then people could use git-svn or hgsubversion to use Git/Mercurial to maintain their own forks and patches as well as push them back. Lots of OSS projects do this already.
I am not sure if it can be done for CVS, but it would not be surprising if it can.
Of the two, git-svn is probably more mature and gets more usage. You can even use places like github to post the git repository and make it easy for people to clone it.
Actually, it looks like someone has already done this for CDT:
http://github.com/cdt/mirror/tree/master
It looks out of date though. I guess that means that there is also a git-cvs tool that already exists.
Doug, I didn't even go to EclipseCon this year ... so I'm sure you're not talking about me =;-)
Although git and Mercurial are all the rage, I've lived a trouble-free life using Bazaar (bzr) on multiple platforms.
Vladimir, dump 50 patches into bugzilla, get 10 of them reviewed and committed, we'll make you a committer...
The time is definitely now.
Is it possible to trace that contributors' code agrees with the Eclipse Public License, though, when getting diffs through merge or rebase? Submitting patches on Bugzilla guarantees that right now.
I'm just asking because I requested support for Git for my project Glimmer, and that question was raised.
How about having this discussion on https://bugs.eclipse.org/bugs/show_bug.cgi?id=257706 ?
Certainly the first paragraph applies to us. We've got an internal CDT release which I support and maintain and it has a good handful of fixes/improvements.
What I've found is the merge effort (from CDT 4 to 5, for example) was painful. The result was I became more determined to push patches upstream. Yes there is a time cost in terms of following up bugzilla, cajoling committers into taking a look, etc. But in my opinion this is much less work than the deferred merge effort would have been. Moreover by taking an active interest in bugs and discussing fixes etc. we hopefully help prevent CDT from stagnating.
What I've also found is that, with fewer active committers, CDT HEAD moves quite slowly. As a result I follow it more closely which results in:
- immediate benefits when bugs are fixed
- our patches are closer to head
- the resulting product is better than if we stuck to released milestones.
But back to the topic of DVCS. I've recently discovered the wonders of git, and I can track the CDT repository with a very simple 'git cvsimport ...' The result is that I can keep larger patches -- such as the project storage work -- on a git branch, let git do the merge work for me and very easily regenerate patches for bugzilla. It's now much easier to ensure that patches don't go cold!
Perhaps I've been lucky in that as I've understood more of Eclipse/CDT, and my patches improved, I've not had much issue getting them applied. I think there's great value in having disparate contributors, some with a vested interest in commercial IDEs plus some who rely on the base cdt product. If many people are willing to invest a little effort, the sum is greater than the component parts. We all benefit from a platform used by many groups for different purposes.
Vladimir what the platform needs is more people like yourself getting involved. It'd be great if more people commented on bugs, proposed fixes, etc. Certainly the one way to guarantee your patches don't make it into CDT is to not submit them in the first place.
JamesB, I am sure you did search Eclipse bugzilla for all my bugs, and checked which have patches? A few have, and a reaction time of a year is typical. And these are fairly easy patches.
Doug,
this sounds like a plan. I'll work on submitting some further patches -- hopefully there will be better luck with them.
I don't have much experience with contributing to projects, but I've found that even DVCSs don't really help with committing patches upstream. While it's much easier to maintain the downstream version (just pull changes), once you need to push the changes upstream, you start running into problems: What changes did we make? How do we get them from the branch history and make them apply cleanly to the head? I only tried this once, generating a patch from an SVK mirror using svn diff, and all of the hunks got rejected by patch, so I had to reapply them manually anyway.
I haven't tried git (although I hear that its Windows support is getting better over time), but the only sensible thing I've encountered until now is Mercurial Queues (themselves not really a revolutionary idea, being inspired by Quilt); git probably has something similar.
The only thing that bothers me is that now that people actually start using DVCSs, the best thing still seems to be a patch queue on top of the VCS. I always thought that this was the problem that systems like git, hg or bzr wanted to solve... Or is it because many projects use bugzilla, and actually have to work through patches, thereby losing one of the advantages of DVCS, namely being able to directly pull changes from a branch?
@Vladimir: excellent. Thanks. And the good news is that we're about to enter that time of year after people are done their product releases and focus on the CDT.
@Annas: We do need to figure out the IP issues around submitting fixes with DVCSs, but I don't think it's insurmountable. At the very least, you can generate diffs with hg.
What I'd like to see is an area where contributors can put their repos without being subject to EPL restrictions while making them readily available to the upstream committers.