commit access to hadoop

commit access to hadoop

Radim Kolar
What does it take to gain commit access to Hadoop?

Re: commit access to hadoop

Steve Loughran-3
On 21 November 2012 15:03, Radim Kolar <[hidden email]> wrote:

> What does it take to gain commit access to Hadoop?
>


good question.

I've put some of my thoughts on the topic into a presentation I gave last
month:
http://www.slideshare.net/steve_l/inside-hadoopdev

That isn't so much about commit/non-commit status, because it was more
focused on getting your code in, which is normally what matters.

Even committers have to go through RTC (review-then-commit); you don't get
any special privileges from commit rights, and you still have the task of
keeping patches up to date and reminding others to review them. The main
"feature" is that when you get the +1 vote, you yourself get to deal with
the grunge work of applying the patch to one or more svn branches and
resyncing that with the git branches you inevitably do your own work on.
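
A minimal sketch of that grunge work, as a hypothetical Python helper
(the script, branch names and working-copy paths are made up for
illustration; this isn't project tooling):

    #!/usr/bin/env python
    """Hypothetical helper: once a patch has its +1, apply it to each
    active svn branch and commit it with the JIRA id in the message."""
    import os
    import subprocess
    import sys

    # Made-up svn working copies, one per branch the fix should land on.
    BRANCHES = {
        "trunk": os.path.expanduser("~/src/hadoop-trunk"),
        "branch-2": os.path.expanduser("~/src/hadoop-branch-2"),
    }

    def apply_patch(patch_file, jira_id):
        """Update each working copy, apply the patch, commit with the JIRA id."""
        patch_file = os.path.abspath(patch_file)
        for branch, wc in BRANCHES.items():
            subprocess.check_call(["svn", "update"], cwd=wc)
            subprocess.check_call(["patch", "-p0", "-i", patch_file], cwd=wc)
            subprocess.check_call(
                ["svn", "commit", "-m", "%s (applied to %s)" % (jira_id, branch)],
                cwd=wc)

    if __name__ == "__main__":
        apply_patch(sys.argv[1], sys.argv[2])

Resyncing a git-svn mirror afterwards is then typically just a "git svn
rebase" in each clone you do your own work in.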

It also gives you more responsibility to review and commit others' work,
which is something that some of us (I point at myself here) are lax about.
I can't help wondering if we need to be a bit more formal about that too:
have one day a month, "review Sundays"(*), where we do go through and
review the outstanding work so it doesn't fall by the wayside.

-steve

(*) Yes, a Sunday. I know everyone is too busy to dedicate a weekday to
this.

Re: commit access to hadoop

Radim Kolar

> The main "feature" is that when you get the +1 vote, you yourself get to
> deal with the grunge work of applying the patch to one or more svn
> branches and resyncing that with the git branches you inevitably do your
> own work on.
No, the main feature is a major speed advantage. It takes forever to get
something committed. I was annoyed with Apache Nutch last year and forked
it; here is a snapshot from the forked codebase:
http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
It is now 160k LOC on top of Apache Nutch 1.4. If I had worked with these
guys, it would never have been done, because it took them 4 months to get
a 200-line patch reviewed.

Hadoop has a huge backlog of patches; you need way more committers than
you have today. I simply could not assign a person to work on Hadoop
full-time, because if he submits a mere 5 patches per day, you will never
be able to process them.

Your current development process fails to scale. What are your plans for
making development move faster?

Re: commit access to hadoop

Steve Loughran-3
On 26 November 2012 21:25, Radim Kolar <[hidden email]> wrote:

>
>> The main "feature" is that when you get the +1 vote, you yourself get to
>> deal with the grunge work of applying the patch to one or more svn
>> branches and resyncing that with the git branches you inevitably do your
>> own work on.
>>
> No, the main feature is a major speed advantage. It takes forever to get
> something committed. I was annoyed with Apache Nutch last year and forked
> it; here is a snapshot from the forked codebase:
> http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
> It is now 160k LOC on top of Apache Nutch 1.4. If I had worked with these
> guys, it would never have been done, because it took them 4 months to get
> a 200-line patch reviewed.
>
>
I'm sorry you missed the bit in my slides where I emphasised that
review-then-commit is the same rule even if you are a committer. It's not
like you can suddenly put changes in without having gone through the JIRA
circuit. I also tried to explain why the project is so rigorous:

the value of Hadoop is the data stored in HDFS.

Imagine someone could put some minor bit of tuning in there that sped up
their cluster slightly but increased the risk of data loss. Or something
in the MR layer that introduced enough of a performance overhead that
someone like Facebook would have to buy an extra rack of machines. That's
why there's a review process. Try getting a patch into ext4 or the Linux
kernel scheduler and see if it's any easier.



> Hadoop has a huge backlog of patches; you need way more committers than
> you have today. I simply could not assign a person to work on Hadoop
> full-time, because if he submits a mere 5 patches per day, you will never
> be able to process them.
>
>
The bottleneck is not the number of committers, it is the number of people
who understand Hadoop well enough to provide adequate reviews, and who
have the time to review patches thoroughly, especially the big ones. I
think that is a real problem.


> Your current development process fails to scale. What are your plans for
> making development move faster?
>

I don't disagree; again, in my slides I tried to make some proposals.


   1. Even if the source stays in SVN, we could use a git-style workflow of
   pull requests and Gerrit/GitHub code reviewing.
   2. Better distributed development events, where a group of people can go
   online via a Google+ hangout and work together on a specific problem in
   real time.
   3. More rigorous "review Sundays" or similar, where we go through the
   review queue on a free weekend day and see what can be done about it.
   4. Some kind of mentorship process to work with people on larger
   projects. Again, time is the constraint here.

If you've got some other ideas, it'd be good to know them.