What does it take to gain commit access to Hadoop?

On 21 November 2012 15:03, Radim Kolar <[hidden email]> wrote:
> What does it take to gain commit access to Hadoop?

Good question. I've put some of my thoughts on the topic into a presentation I gave last month: http://www.slideshare.net/steve_l/inside-hadoopdev

That isn't so much about commit/non-commit status, because it was more focused on getting your code in, which is normally what matters. Even committers have to go through RTC (review-then-commit): you don't get any special privileges from commit rights, and you still have the task of keeping patches up to date and having to remind others to review them.

The main "feature" is that when you get the +1 vote, you yourself get to deal with the grunge work of applying patches to one or more svn branches and resyncing that with the git branches you inevitably do your own work on.

It also gives you more responsibility to review and commit others' work, which is something that some of us (I point to myself here) are lax at. I can't help wondering if we need to be a bit more formal about that too, and have one day a month, "review Sundays"(*), where we go through and review the outstanding work so it doesn't fall by the wayside.

-steve

(*) Yes, a Sunday. I know everyone is too busy to dedicate a weekday to this.

> The main "feature" is that when you get the +1 vote, you yourself get to
> deal with the grunge work of applying patches to one or more svn branches
> and resyncing that with the git branches you inevitably do your own work on.

No, the main feature is a major speed advantage. It takes forever to get something committed. I was annoyed with Apache Nutch last year and forked it; here is a snapshot from the forked codebase:
http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
It's now 160k LOC on top of Apache Nutch 1.4. If I had worked with these guys, it would never have been done, because it took them 4 months to get a 200-line patch reviewed.

Hadoop has a huge backlog of patches; you need way more committers than you have today. I simply could not assign a person to work on Hadoop full-time, because if he submits a mere 5 patches per day, you will never be able to process them. Your current development process fails to scale. What are your plans for moving development faster?

On 26 November 2012 21:25, Radim Kolar <[hidden email]> wrote:
>> The main "feature" is that when you get the +1 vote, you yourself get to
>> deal with the grunge work of applying patches to one or more svn branches
>> and resyncing that with the git branches you inevitably do your own work on.
>
> No, the main feature is a major speed advantage. It takes forever to get
> something committed. I was annoyed with Apache Nutch last year and forked
> it; here is a snapshot from the forked codebase:
> http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
> It's now 160k LOC on top of Apache Nutch 1.4. If I had worked with these guys,
> it would never have been done, because it took them 4 months to get a
> 200-line patch reviewed.

Review-then-commit is the same rule even if you are a committer. It's not like you can suddenly put changes in without having gone through the JIRA circuit.

I also tried to explain why the project is so rigorous: the value of Hadoop is the data stored in HDFS. Imagine someone could put some minor bit of tuning in there that sped up their cluster slightly but increased the risk of data loss. Or something in the MR layer that introduced enough of a performance overhead that someone like Facebook would have to buy an extra rack of machines. That's why there's a review process. Try getting a patch into ext4 or the Linux kernel scheduler and see if it's any easier.

> Hadoop has a huge backlog of patches; you need way more committers than you
> have today. I simply could not assign a person to work on Hadoop full-time,
> because if he submits a mere 5 patches per day, you will never be able to
> process them.

The bottleneck is not the number of committers; it is the number of people who understand Hadoop well enough to provide adequate reviews, and who have the time to review patches thoroughly, especially the big ones. I think that is a real problem.

> Your current development process fails to scale. What are your plans for
> moving development faster?

I don't disagree. Again, in my slides I tried to make some proposals.

1. Even if the source stays in SVN, we could use git-style pull requests and Gerrit/GitHub code reviewing.
2. Better distributed development events, where a group of people can go online via a Google+ hangout and work together on a specific problem in real time.
3. More rigorous "review Sundays" or similar, where we go through the review queue on a free weekend day and see what can be done about it.
4. Some kind of mentorship process to work with people on larger projects. Again, time is the constraint here.

If you've got some other ideas, it'd be good to know them.
