Libre Software People's Front

don't confuse it with People's Front of Open Source


Who will be the libre software developers by 2020?


As part of my master’s homework, I’ve just read an article, written by some remarkable colleagues a few years ago, about the geographic origin of libre software developers. The article was interesting and had some impact. The key question they tried to answer was how diverse the national origin of developers is, and the approach was also novel, as they relied on real data rather than surveys. The information they used in the study was the following:

  • A dump of the SourceForge database created in 2005, which included more than 1,180,000 registered users.
  • Mailing lists archives of the Debian, GNOME and FreeBSD projects. A total of more than one million different e-mail addresses.

The article focused both on users/contributors (contributors to the forge, mailing lists and source code repositories) and on developers (contributors to the source code repositories). Obviously, the second group is a subset of the first one. These were some of the results:

  • out of 1.1 million registered participants on SourceForge, just under 50,000 committed code to the development repositories (this is not strictly a result of the study, but I found it quite interesting)
  • most of the users came from Europe and North America, followed by Asia with less than 10% of the developer population. Taking into account that Europe has a larger population, the per-capita penetration of libre software development was higher in North America than in Europe.
  • there were more developers in the US and Canada than in most European countries or regions. On the other hand, the US had fewer libre software developers per million Internet users than most European countries.
  • when the total number of developers is adjusted for wealth (GDP), China, India, Russia, Brazil and even South Africa are among the top contributors.
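To make the wealth adjustment concrete, here is a toy sketch (all numbers invented, not taken from the article) of normalizing raw developer counts first per million Internet users and then by GDP per capita:

```python
# Hypothetical illustration (numbers are made up, not from the study):
# normalize raw developer counts per million Internet users, then weight
# by GDP per capita so poorer countries with many developers rank higher.
developers = {"US": 12000, "Germany": 4000, "India": 2500}
internet_users_millions = {"US": 200, "Germany": 50, "India": 60}
gdp_per_capita_k = {"US": 48, "Germany": 40, "India": 1.5}

per_million = {c: developers[c] / internet_users_millions[c] for c in developers}
wealth_adjusted = {c: per_million[c] / gdp_per_capita_k[c] for c in developers}

for country in sorted(wealth_adjusted, key=wealth_adjusted.get, reverse=True):
    print(country, round(per_million[country], 1), round(wealth_adjusted[country], 2))
```

With these invented figures the wealth-adjusted ranking puts India first, which mirrors the kind of shift the article reports.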

I wonder if the situation will be the same by 2020.


“Geographic origin of libre software developers”
by Jesus M. Gonzalez-Barahona, Gregorio Robles, Roberto Andradas-Izquierdo and Rishab Aiyer Ghosh

Written by sanacl

December 28, 2011 at 1:18 pm

80 columns limit in the XXI century


A couple of weeks ago some colleagues and I were discussing whether to follow the Style Guide for Python Code (PEP 8) and its 80-column limit. The origin of that limit is much older than I thought: it comes from IBM punch cards!

At first sight it seems nonsense to oblige people to write code in 80 columns when most screen resolutions are above 1024×768. On the other hand, it is better to agree on “standards” to be followed in your collaborative projects.

According to PEP 8, these are the main reasons for keeping the old limit:

  • there are still many devices around that are limited to 80 character lines
  • limiting windows to 80 characters makes it possible to have several windows side-by-side
  • shorter lines are easier to read

I agree with the second point, but the first and third are at least debatable. What is certain is that the 80-column limit will be updated sooner or later. Even the second argument will lose force once most displays are above 1280×800 pixels. The only question is when the 80-column limit will become a thing of the past.

Some big projects, like WebKit, have also faced the choice between following PEP 8 in full or throwing the 80-column rule out the window. The discussion is available here.

So, in the end, we decided to follow PEP 8 as far as possible. New contributions to bicho and cvsanaly follow it, and the legacy code should be updated over the following months.
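For what it’s worth, checking compliance with the column limit is trivial to automate. A minimal sketch (not the actual tooling we use) that flags lines over the PEP 8 limit:

```python
# Flag lines longer than 79 characters, as PEP 8 recommends.
MAX_LEN = 79

def long_lines(source):
    """Return (line_number, length) pairs for lines over MAX_LEN chars."""
    return [(n, len(line)) for n, line in enumerate(source.splitlines(), 1)
            if len(line) > MAX_LEN]

code = "x = 1\n" + "y = '" + "a" * 100 + "'\n"
print(long_lines(code))  # the second line is over the limit
```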

Written by sanacl

December 17, 2011 at 9:49 pm

Posted in Uncategorized


My last contribution to the OSOR platform


As you may know, OSOR and SEMIC will merge very soon to become Joinup. I have been working on OSOR since the very beginning, and today I helped Roberto Andradas to provide the last dump of the data that the new staff will use to feed Joinup.

More than three years ago, Juanjo Amor and Jesús M. González-Barahona offered me the challenge of coordinating a very young and passionate team to deploy, maintain and improve the platform. We made many mistakes, but I am quite happy to see how much we have learned. During the last year my contribution to OSOR was very small because Roberto Andradas took over the coordination, but I must confess that today I have a strange feeling (a mixture of happiness and sadness).

I would like to thank all the people who worked on the project during these years. It has been a pleasure. The OSOR community will now migrate to the Joinup platform with a different staff. The game is over for us; new adventures lie ahead.


Written by sanacl

December 2, 2011 at 6:56 pm

Posted in Uncategorized


Links collection about software forges: status, criticism and new ideas


Over the last two years it has become quite common to hear about new software forges, but I’m not going to talk about forge proliferation in this post. What I would like to discuss is what the next step in collaborative development environments is. To get the big picture I spent some hours looking for scientific and “informal” publications; I now think I have a good starting point, and it would be great if you could offer feedback or even improve it.

The list of links contains papers, blog entries, presentations and reports. Some of them point out very interesting problems like the “data jail”, others rethink the concept of software forges, like the paper named “The networked forge”, and finally, some of the developers of the main software forges present what they do and what they plan to do over the next few years.


Blog entries:



I’ll submit this information to the planetforge wiki; as soon as it is posted there I will include the link here.

[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

August 24, 2011 at 7:40 pm

Posted in Uncategorized


KESI: our first component for the ALERT project


As part of the work of URJC (LibreSoft) in the ALERT project, whose aim is to increase the efficiency of developers in libre software projects, we are about to present the first iteration of the Knowledge Extractor for Structured Information, aka KESI. This component with a very complicated name has a simple mission: to gather information from source code repositories and from issue/bug tracking systems and send it to the rest of the components of the ALERT platform.

The complexity of this component lies in the differing structure of the information offered by the sources. Every code repository and tracker uses its own format to export information, so in order to support a new type of repo/tracker in KESI it is necessary to create a new parser. As an example, a large percentage of the information extracted from a remote issue tracking system has to be parsed out of HTML, which is far from the ideal format.

Once KESI has gathered the information, it transforms it into a format known by the rest of the components of the platform and then publishes it on a dedicated bus.
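As a rough illustration of that gather/normalize/publish flow, here is a sketch in Python; the class and method names are hypothetical, not the real ALERT interfaces:

```python
# Hypothetical sketch of the KESI flow: parse a source-specific record,
# normalize it to a common event format, publish it on a bus.
import json

class FakeBus:
    """Stand-in for the dedicated bus the platform components listen on."""
    def __init__(self):
        self.events = []
    def publish(self, topic, payload):
        self.events.append((topic, payload))

def normalize_commit(raw):
    """Map a repository-specific record to the common event format."""
    return {"type": "commit",
            "author": raw.get("committer") or raw.get("author"),
            "revision": raw["rev"]}

bus = FakeBus()
raw_commit = {"rev": "r42", "committer": "jane"}  # e.g. parsed from Subversion
bus.publish("scm.events", json.dumps(normalize_commit(raw_commit)))
print(bus.events[0][0])  # -> scm.events
```

The per-source parsers mentioned above would each produce the input to a normalizer like this one, so the rest of the platform only ever sees the common format.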

These are the main KESI features:

  • supports CVS, Subversion and Git
  • supports Jira, SourceForge and Bugzilla
  • it performs an incremental analysis
  • it publishes the changes detected (events) to the rest of the platform

The information obtained by the KESI is critical in the following scenarios:

  • recommend a developer which bug to solve
  • detect duplicated bugs
  • let the developer know about buggy parts of the code
  • identify inactive developers and orphaned parts of the code

We are looking forward to seeing it live with the rest of the platform!

[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

August 8, 2011 at 5:01 pm

Posted in Uncategorized


Bicho 0.9 is coming soon!


Over the last few months we’ve been working to improve Bicho, one of our data mining tools. Bicho gets information from remote bug/issue tracking systems and stores it in a relational database.

How Bicho works


The next release, Bicho 0.9, will also include incremental support, something we’ve missed in FLOSSMetrics and in standalone studies with a huge number of bugs. We also expect new backends to be easier to create thanks to the improved backend model written by Santi Dueñas. So far we support JIRA, Bugzilla and SourceForge. For the first two we parse HTML + XML; for SourceForge all we have is HTML, so we are more dependent on the layout (to minimize that problem we use BeautifulSoup). We plan to add at least backends for FusionForge and Mantis (the latter partially written) during this year.
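To give an idea of why HTML-only sources tie a backend to the page layout, here is a stdlib-only sketch of that kind of scraping (Bicho actually uses BeautifulSoup, and the real markup is more involved; the `summary` cell below is invented):

```python
# Extract a field from an HTML page using only the standard library.
# The markup here is invented; any layout change breaks such a parser,
# which is exactly the fragility described above.
from html.parser import HTMLParser

class SummaryExtractor(HTMLParser):
    """Grab the text of a (hypothetical) cell holding the bug summary."""
    def __init__(self):
        super().__init__()
        self.in_summary = False
        self.summary = None
    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "summary") in attrs:
            self.in_summary = True
    def handle_data(self, data):
        if self.in_summary and self.summary is None:
            self.summary = data.strip()
    def handle_endtag(self, tag):
        if tag == "td":
            self.in_summary = False

page = '<table><tr><td class="summary">Crash on startup</td></tr></table>'
parser = SummaryExtractor()
parser.feed(page)
print(parser.summary)  # -> Crash on startup
```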

Bicho is currently being used in the ALERT project (still in its first months), where all the information offered by bug/issue reports will be related, through semantic analysis, to the information available in the source code repositories (extracted with CVSAnaly). That relationship will allow us to help developers through recommendations and other more proactive use cases. One of my favourites is recommending a developer to fix a bug by analysing the stack traces posted in the bug report. In libre software projects all the information is available on the Internet; the main problem (not a trivial one) is that it is spread across very different resources. Running Bicho against the BTS/ITS we can get the part of the code (function name, class and file) that probably contains the error, and the version of the application. That information can be related to what CVSAnaly obtains from the source code repository; in this case we would need to find out which developer edits that part of the code most often. This and other use cases are being defined in the ALERT project.

If you want to stay tuned to Bicho, have a look at the project page at or the mailing list libresoft-tools-devel _at__

[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

June 9, 2011 at 3:00 pm

Finding code clones between two libre software projects


Last month I’ve been working on the creation of a report with the aim of finding code clones between two libre software projects. The method we used was basically the one detailed in the paper “Code siblings: Technical and Legal Implications” by German, D., Di Penta, M., Guéhéneuc, Y. and Antoniol, G.

It is an interesting case, and I’m pretty sure this kind of report will become more and more relevant for entities that publish code under a libre software license. Imagine you are part of a big libre software project and your copyright, and even your money, is at stake: it would be very useful to know whether another project is using your code while respecting your copyright and the rights the license grants to users. With the aim of identifying these scenarios, our study did the following:

  • extraction of clones with CCFinderX
  • detection of license with Ninka
  • detection of the copyright with shell scripts

The CCFinderX tool used in the first phase gives you information about common parts of the code: it detects a common set of tokens (50 by default) shared between two files; this parameter should be tuned depending on what is being looked for. In the following example, the second and third columns contain information about the file and the common code. The syntax is (id of the file).(source file tokens), so the example shows that the file with id 1974 shares code with the files with ids 11, 13 and 14.

clone_pairs {
19108 11.85-139 1974.70-124
19108 13.156-210 1974.70-124
19108 14.260-314 1974.70-124
12065 17.1239-1306 2033.118-185
12065 17.1239-1306 2033.185-252
12065 17.1239-1306 2033.252-319
12065 17.1239-1306 2141.319-386
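As an illustration, those clone_pairs lines are easy to parse if all you want is which file ids share code. This sketch follows the format described above; it is not part of CCFinderX itself:

```python
# Parse clone_pairs lines of the form "<clone_id> <id>.<range> <id>.<range>"
# into a map from each file id to the set of file ids it shares tokens with.
from collections import defaultdict

def parse_clone_pairs(lines):
    """Map each file id to the set of file ids it shares clones with."""
    related = defaultdict(set)
    for line in lines:
        _clone_id, left, right = line.split()
        left_id = int(left.split(".")[0])
        right_id = int(right.split(".")[0])
        related[left_id].add(right_id)
        related[right_id].add(left_id)
    return related

pairs = ["19108 11.85-139 1974.70-124",
         "19108 13.156-210 1974.70-124",
         "19108 14.260-314 1974.70-124"]
print(sorted(parse_clone_pairs(pairs)[1974]))  # -> [11, 13, 14]
```

A real estimate would also need the token ranges, to avoid double-counting the 1:n overlaps discussed below.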

In our report we only wanted to estimate the percentage of code from the “original” project used in the derivative work, but there are some variables that need to be taken into account. First, code clones can appear among the files of the same project (by the way, this is a clear sign that refactoring is needed). Second, different parts of a file can have clones in different files (a 1:n relationship) in both projects. The ideal solution would be to study, file by file, the relationships with the others and to remove the repeated ones.

Once the relationships among files have been established, it is the turn of license and copyright detection. In this phase the method simply compares the output of the two detectors, and finally you get a matrix showing whether the copyright holders were respected and the license was used correctly.

Daniel German’s team found interesting things in their study of the FreeBSD and Linux kernels. They found GPL code in FreeBSD in the xfs file system. The trick that allows shipping this code with BSD-licensed FreeBSD is to distribute it disabled (it is not compiled into FreeBSD) and leave to the user the choice of compiling it or not. If a developer compiles the kernel with xfs support, the resulting kernel must be distributed under the terms of the GPL licence.

[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

May 11, 2011 at 7:49 pm