How to get involved in a big open source project?

x2bool, 2012-06-27 13:28:41

open source

So. The other day I was watching "Yandex.Kit" (recordings of lectures given by Yandex employees). In one of them the lecturer talks about Linux: he demonstrates something in the console, and at some point he opens the kernel sources, digs around for a short while and then exclaims: "Ahaaaaaaaa. Here. That's what's wrong!"
The accuracy of this retelling is questionable, since after that I was in ~~a coma~~ mild shock, but the point is that the lecturer knew exactly what to look for and where. It seemed as if the kernel sources held no blank spots for him. None at all. I'm sure he understands very well how the OS works.
Inspired, I went online, downloaded MySQL / PostgreSQL and looked at the "Developers" tabs on their official sites. But I still don't even know how to approach the source code. These are fairly large projects; they all have diagrams and dedicated sections for developers... However, that didn't make things any easier.
I'm not talking about the level of engineers at Yandex, Google and other Microsofts. Clearly that comes from years of theory and constant practice. But how do you approach these sources? Just read the code? How do you grasp the general principles, or at least the structure? Where do you start?

6 answer(s)

sev, 2012-06-27
@x2bool

I'll describe how I get newcomers started in our project.
The first task is to check out the latest version from the code repository and build it on the target platform, installing all the dependencies along the way.
Then, depending on the person's interests, I assign either fixing an existing bug or implementing a small feature from the TODO list.
The newcomer is expected to come to IRC or ping me on Skype, and I point out where to look and which code to edit.
Without this kind of coaching the process would be extremely slow: there are a lot of lines of code, even though the code is structured and logical.
Many years of experience with Google Summer of Code show that almost every large project has people willing to mentor, provided the newcomer does the work, reads the developer docs on their own and asks questions that either aren't covered by the documentation or are non-trivial. There are many communication channels available, from IRC to -devel mailing lists.

@resurtm, 2012-06-27

To begin with, it's worth learning all the capabilities, features and "back alleys" of the project at the user level, through the documentation and possibly the source code. That's the necessary foundation.
Then start setting yourself (or picking up at work) tasks that are not entirely ordinary, or tasks that could be solved more elegantly if you improved the code of the product you're using. Write your first patch and try to submit it upstream (the maintainers will point out the flaws and shortcomings in your patches and help you).
Fixing existing bugs is a bit harder, since it takes more perseverance, but the core maintainers often welcome such fixes much more than small feature additions.
Get into the habit of regularly digging into the project's source a couple of levels below the stable API.
(Possibly Captain Obvious speaking. Personal opinion of someone who has submitted a few modest patches to Yii.)

egorinsk, 2012-06-27
@egorinsk

For learning, I recommend Chrome/Chromium. Why? Because its code is well organized and it has design documents describing the architecture and the layout of individual components; see www.chromium.org/developers/design-documents/ . It also uses a lot of technical tricks.
If, for example, you're interested in how compositing (and hardware-accelerated rendering) works, you just read the relevant doc and look at the classes it mentions.
If you're not interested in any particular part of the product but in how it works as a whole, find the main application file and the main() function and go from there... a few days spent studying function call paths will give you an idea of how the software works.
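Reading call paths by hand can be complemented by letting the program report them. In a C++ codebase such as Chromium you would typically use a debugger for this; as a small language-neutral illustration, here is a minimal Python sketch that records the call path of a toy program through the standard sys.setprofile hook. The functions main, parse and render are hypothetical stand-ins for a real application's entry point and helpers.

```python
import sys


def trace_calls(entry, *args):
    """Run entry(*args) and return its Python call path as (depth, name) pairs."""
    path = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":            # a Python function was entered
            path.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":        # a Python function returned
            depth -= 1
        # "c_call"/"c_return" events (built-ins such as str.split) are ignored

    previous = sys.getprofile()        # keep any profiler that was already installed
    sys.setprofile(profiler)
    try:
        entry(*args)
    finally:
        sys.setprofile(previous)
    return path


# Hypothetical toy program standing in for a real application.
def parse(text):
    return text.split()


def render(words):
    return " ".join(words).upper()


def main():
    return render(parse("hello open source"))


for depth, name in trace_calls(main):
    print("  " * depth + name)
# main
#   parse
#   render
```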
Naturally, while working through the code you may need reference material: for example, MSDN for details of WinAPI functions, documentation for external libraries, and so on. If it's C++ (and Chrome uses it), it's also worth strengthening your knowledge of the language by leafing through Stroustrup and reading the C++ FAQ (I don't remember the links, google them yourself; but any self-respecting C++ developer should know what's written there, since C++ offers plenty of ways to shoot yourself in the foot).
All manuals and details are in English only; if you don't know English, you'll have to read the code itself and try to guess how it works and what each part is responsible for.
OK, let's say you're not interested in Chrome and want to get into the Linux kernel. Again, google for design docs and anything resembling them; in many projects this is called a Hacker's manual / hackers' reference, and it usually describes the overall structure of the project and which module is responsible for what.
In short, the Linux kernel consists of separate subsystems, each of which manages a set of tables and lists (the process table, the list of open files, the list of memory pages). The easiest way to study it is to take a system call such as open() (which the C library's fopen() calls under the hood), kill() or fork() and see which code runs as it executes. Usually that means checking the caller's permissions, checking the state of the process, calling hooks and finally, the most important part, modifying some table, for example adding a signal to the process's queue of pending signals.
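Before reading the kernel side, it helps to watch these calls from user space. Below is a minimal Python sketch (Linux/POSIX only; Python's os functions are thin wrappers over the C library calls of the same names) that forks a child, sends it SIGTERM with kill() and reaps it. Running the script under `strace -f` lists the system calls you would then follow into the kernel source; note that glibc implements fork() on top of the clone() system call, so that is typically what strace reports.

```python
import os
import signal
from typing import Optional


def fork_and_kill() -> Optional[int]:
    """Fork a child, send it SIGTERM with kill(), and return the signal that ended it."""
    pid = os.fork()                      # fork(2): duplicate the calling process
    if pid == 0:
        # Child: wait for signals forever; SIGTERM's default action terminates it.
        # os._exit() in finally guarantees the child never returns into the caller.
        try:
            while True:
                signal.pause()
        finally:
            os._exit(1)
    os.kill(pid, signal.SIGTERM)         # kill(2): ask the kernel to deliver SIGTERM to the child
    _, status = os.waitpid(pid, 0)       # waitpid(2): reap the child and collect its status
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)       # number of the signal that terminated the child
    return None


sig = fork_and_kill()
print(signal.Signals(sig).name)          # SIGTERM
```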
Naturally, the reader is assumed to know C well (pointers, structures, unions) along with basic data structures such as linked lists and hash tables; without that, you can hardly make sense of the kernel code.
But as with Chrome, after spending some time studying the code and documentation, an understanding of its structure will come on its own.
Here, for example, is what I found right away: tldp.org/LDP/tlk/tlk.html (an old document), www.kernel.org/doc/htmldocs/kernel-hacking.html (not very old). Also, the kernel source tree has a folder with the self-explanatory name Documentation, and README files are scattered around it as well.

Evgeny Yablokov, 2012-06-27
@Gular

If you're just starting out, there's no need to jump into a project like that, because the required skill level is high and the policies are well established. Such projects are worth joining once you feel confident, both in your technical skills and psychologically. It's better to pick a more or less mid-sized project and start with that.
Reading the project's code is usually not particularly difficult. But, to repeat, large projects have their own policies and coding styles, including naming conventions for files and even for classes and functions. So reading such code "at a glance" can be confusing: you need to know or study the project's style. As an example of a _large_ project, take Samba.

Yaroslav, 2012-07-01
@xenon

Take, for example, the Linux kernel itself. There's a lot of code, and even roughly understanding all the basic principles and how they are implemented inside (how filtering happens in netfilter, how a file is created in fs) is no easy task. But the code is divided into uniform pieces. Take network card drivers: hundreds of cards are supported, and all the drivers are very similar to each other. I read one driver carefully (or even wrote one), more or less understood them all, and got a general idea of what runs above them in the kernel. There are even tutorials on writing LKMs (loadable kernel modules) and some real and pseudo drivers, which is a great way to get started. The same goes for netfilter targets, file systems, queueing disciplines, etc. You start with a narrow piece, look for tutorials (if there are any, which isn't always the case) and read the source code of similar modules. In the end, you can make one small piece,
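Part of the reason that reading one driver helps you understand the rest is that the generic kernel layer talks to every driver through the same table of function pointers; for network cards that is struct net_device_ops, with slots such as ndo_open, ndo_stop and ndo_start_xmit. Real drivers are C kernel modules, so the following is only a Python analogy under simplifying assumptions: the drivers "acme" and "zeta" are hypothetical, and the signatures are heavily simplified compared with the kernel's.

```python
from dataclasses import dataclass
from typing import Callable, List

log: List[str] = []


@dataclass(frozen=True)
class NetDeviceOps:
    """Stand-in for a C struct of function pointers; slot names borrowed from
    the kernel's struct net_device_ops, signatures heavily simplified."""
    ndo_open: Callable[[], None]
    ndo_stop: Callable[[], None]
    ndo_start_xmit: Callable[[bytes], None]


# Hypothetical driver "acme": fills every slot with its own functions.
def acme_open() -> None:
    log.append("acme: reset chip")

def acme_xmit(frame: bytes) -> None:
    log.append(f"acme: send {len(frame)} bytes")

def acme_stop() -> None:
    log.append("acme: power down")

ACME_OPS = NetDeviceOps(ndo_open=acme_open, ndo_stop=acme_stop, ndo_start_xmit=acme_xmit)


# Hypothetical driver "zeta": different hardware details, same slots.
def zeta_open() -> None:
    log.append("zeta: load firmware")

def zeta_xmit(frame: bytes) -> None:
    log.append(f"zeta: queue {len(frame)} bytes")

def zeta_stop() -> None:
    log.append("zeta: halt DMA")

ZETA_OPS = NetDeviceOps(ndo_open=zeta_open, ndo_stop=zeta_stop, ndo_start_xmit=zeta_xmit)


def generic_send(ops: NetDeviceOps, frame: bytes) -> None:
    """The generic layer knows only the table, never which driver it is calling."""
    ops.ndo_open()
    ops.ndo_start_xmit(frame)
    ops.ndo_stop()


for ops in (ACME_OPS, ZETA_OPS):
    generic_send(ops, b"hello")
print(log)
```

Once you know which slots the generic layer calls and in what order, each new driver becomes a matter of reading how it fills in those same slots.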

kogemrka, 2012-07-08
@kogemrka

Speaking of the Linux kernel, there's a book by Bovet and Cesati, "The Linux Kernel" (English title: Understanding the Linux Kernel). IMHO it's a good introduction to the overall architecture, the main ideas and "where everything lives."
But of course, with the Linux kernel it's essential to first learn to program for Linux and get to grips with system calls, IPC mechanisms and so on, before getting into the kernel itself.
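A typical first exercise in that direction is two processes talking over a pipe. Below is a minimal Python sketch (Linux/POSIX only; os.pipe, os.fork, os.read and os.write are thin wrappers over the corresponding system calls); once it makes sense, you can rewrite it in C or trace it with strace before looking at how the kernel implements pipes.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send bytes from a forked child to its parent through a pipe."""
    read_fd, write_fd = os.pipe()          # pipe: one read end, one write end
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, send the message, then exit
        # immediately; exiting closes the write end.
        os.close(read_fd)
        try:
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent: keep only the read end and read until EOF,
    # which arrives once every write end is closed.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                     # reap the child so it doesn't linger as a zombie
    return b"".join(chunks)


print(pipe_roundtrip(b"hello from the child"))   # b'hello from the child'
```

Closing the unused end in each process matters: if the parent kept its own copy of write_fd open, its read loop would never see EOF.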