Libre Software People's Front

don't confuse it with People's Front of Open Source

Posts Tagged ‘mswl-comm’

Study of the Android development activity and its authors

Libre software is changing the way companies build applications. While the traditional software development model pays no attention to external contributions, libre software products developed by companies benefit from them. Companies promote these contributions by building communities around the project, and the contributions help the company create a superior product at a lower cost than traditional competitors can manage. In exchange, the company offers the product free to use under a libre software license.

Android is one of these products. It was created by Google a couple of years ago and follows a single-vendor strategy. As Dirk Riehle pointed out some time ago, it is something of an economic paradox that a company can earn money by making its product available for free as open source. But companies are not NGOs; they don’t give away money without expecting something in return. So where is the trick?

As a libre software project, Android did not start from scratch: it uses software that would be unavailable to non-libre projects. Besides that, it has a community of external stakeholders who improve and test the latest published version, help create new features and fix errors. It is true that Android is driven not by a community but by a single vendor, and Google runs it in a very restrictive way. For instance, external developers have to sign a Grant of Copyright License and they do not even have a roadmap. Google publishes the code after every release, so there are long intervals during which external developers have no access to the latest code. Even with these barriers, a significant part of the code comes from external people, either contributed directly to the project or reused from common dependencies (Git provides ways to reuse changes made in remote repositories).

[Figure: Commits by domain per month (proportional)]

[Figure: Commits by domain per month (total)]

The figures above show the monthly number of commits split into two groups. Green marks commits from the two mail domains that the study assumes belong to Google employees. Grey marks the remaining commits, made from other mail domains, which belong to other companies or to volunteers.

According to the first figure (on the left), which shows the proportion of commits, during the first very active months (March and April 2009) the number of commits from external contributors was similar to the number made by Google staff. The number of external commits is also high in October 2009, when the total number of commits reached its maximum. Since April 2009, the monthly share of external contributors seems to stay between 10% and 15%.

The figure on the right gives an interesting view of the total activity per month, with two notable facts. First, the highest peak of development was reached in late 2009 (more than 8K commits per month for two months). Second, activity is low during the last months: as mentioned before, Google staff work in private repositories, so we won’t see another peak of development until they publish the next version of Android. (Keep in mind that the Git history changes when the code is published, so the last months in the timeline will be overwritten by the next release.)

[Figure: Commits by domain]

More than 10% of the commits used by Google in Android were made from mail domains other than the two attributed to Google. At this point the question is: who made them?

(Since October 2008)

# Commits Domain
8815 (NULL)

Looking at the domain names, it is very surprising that Nokia is one of the most active contributors. This is a real paradox: the company that states that Android is its main competitor helps it! One effect of using libre software licenses for your work is that even your competitors can use your code. Currently there are Nokia commits in the following repositories:

  • git://
  • git://

This study is an ongoing process that should become a scientific paper; if you have feedback, please let me know.

CVSAnalY was used to get data from 171 Git repositories (the Linux kernel was not included). Our tool allows us to store the metadata of all the repositories in one SQL database, which helped a lot. The study assumes that people working for Google use a domain or


[This entry is part of the work I do in LibreSoft and it is also available in my blog at]

Written by sanacl

April 16, 2011 at 5:29 pm

How to get quantitative data from the Android source code (II)

(Have a look at the previous post if you haven’t yet.)

I recommend using the screen command while downloading the repos; it can take a couple of hours if your connection is slow. Use a log file to check that everything was downloaded properly, and the mail command to notify you when the downloads finish.

../ > ../log_git_clone.txt 2>&1; mail -s "git clone fin" < ../log_git_clone.txt 

After using git clone to get all the Git repositories used by Android, we can start using cvsanaly to analyze the code. Again we will use a log file.

for i in $list
do
echo "------ ANALYSING $i" >> ../log-cvsanaly.txt
~/repos/cvsanaly/cvsanaly2 -u **** -p **** -d cvsanaly_android_lcanas $i >> ../log-cvsanaly.txt 2>&1
done
mail -s "cvsanaly finished" < ../log-cvsanaly.txt
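With this many repositories it helps to know which ones failed without reading the whole log. The following is only a sketch of that idea, not part of the original workflow: the function name analyse_all, the FAILED marker and the wrapper run_cvsanaly (with its DB_USER and DB_PASS variables) are my own assumptions; only the cvsanaly2 options shown in the comment come from the loop above.

```shell
# Sketch: run one analyser command per repository, append all output
# to a log file, and print how many repositories failed.
# Hypothetical wrapper around the command used in the post:
#   run_cvsanaly() {
#     ~/repos/cvsanaly/cvsanaly2 -u "$DB_USER" -p "$DB_PASS" \
#       -d cvsanaly_android_lcanas "$1"
#   }
analyse_all() {
  log=$1; shift     # log file that receives all output
  cmd=$1; shift     # command or function called with one repository
  failed=0
  for repo in "$@"; do
    echo "------ ANALYSING $repo" >> "$log"
    # A non-zero exit status is recorded next to the repository name.
    if ! "$cmd" "$repo" >> "$log" 2>&1; then
      echo "------ FAILED $repo" >> "$log"
      failed=$((failed + 1))
    fi
  done
  echo "$failed"
}
```

Called as analyse_all ../log-cvsanaly.txt run_cvsanaly $list, it prints the number of failures, and grep FAILED on the log lists the repositories to retry.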

At this point we’ve got a single MySQL database with all the information from the 167 Android repositories. The next step is to use this information to answer some questions. In this introductory study we are going to examine the activity of the project over time (in terms of commits), split between Google staff and others. We will assume that Google employees use a user id with @google or @android; that’s how we will divide the committers into two groups.

The first R commands below create the connection to the MySQL database and build the variables comm and googlers, which hold the monthly number of commits for non-Google and Google domains respectively.

> library(RMySQL)
Loading required package: DBI
> con <- dbConnect( MySQL(), user="***", password="***", dbname="cvsanaly_android23_lcanas" )
> comm <- dbGetQuery(con, "select count( as comm from scmlog join people on ( 
where date >= '2008-10-21 00:00:00' and not like '' and not like '' 
group by date_format(, '%Y %m') order by date_format(, '%Y %m') asc;")

> googlers <- dbGetQuery(con, "select count( as googlers from scmlog join people on ( 
where date >= '2008-10-21 00:00:00' and ( like '' or like '') 
group by date_format(, '%Y %m') order by date_format(, '%Y %m') asc;")
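For a quick cross-check on a single repository, the same split can be approximated straight from git log, without the database. This is a minimal sketch and not the method used for the figures: the function name commits_by_month and the tab-separated input format are assumptions of mine, and it relies on the --date=format: option being available in your Git version.

```shell
# Sketch: count commits per month for Google and non-Google addresses.
# Input on stdin: one line per commit, "<email><TAB><YYYY-MM>".
# Output: "<YYYY-MM> <google commits> <other commits>", sorted by month.
commits_by_month() {
  awk -F'\t' '
    {
      # Same assumption as the post: @google or @android means Google staff.
      group = ($1 ~ /@(google|android)/) ? "google" : "other"
      count[$2 " " group]++
      months[$2] = 1
    }
    END {
      for (m in months)
        printf "%s %d %d\n", m, count[m " google"], count[m " other"]
    }' | sort
}

# Possible usage inside one cloned repository:
#   git log --pretty='%ae%x09%ad' --date=format:%Y-%m | commits_by_month
```

Note that commits with an empty address land in the "other" column here, just as they do in the queries above.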

We join the information from Google employees and the rest of the contributors. We also need the list of months, which will serve as the x axis of the chart we will generate.

> mymatrix2<-cbind(googlers,comm)

> months <- dbGetQuery(con, "select date_format(, '%m/%y') as month from scmlog join people 
on ( where date >= '2008-10-21 00:00:00' and not like '' and not like '' group by date_format(, '%Y %m') order by date_format(, '%Y %m') asc;")

The last step is to generate the chart and save it to a file.

> barplot(t(mymatrix2),names.arg=t(months),ylab="commits",legend.text=c("Google employees","Rest"),col=c("dark green","grey"))

> savePlot(filename="android-commits-domains.png", type="png")

Voilà: based on the software history of the Android project, we have generated a view of the activity around the code in terms of commits over time.

This basic process should be improved to obtain more accurate results. For instance, some Google employees committed code with an empty mail address, so the contribution from non-Google employees seems bigger than it is. It will also be necessary to analyze the Linux kernel together with the rest of the Android code to get a wider view of the effort invested by the Android community. Many different questions can shed light on how the different communities work; in the last two posts we’ve seen one method for starting a quantitative study aimed at answering some of them.
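One possible first refinement, sketched here under my own assumptions (the function name classify_committer and the label "unknown" are not part of the study), is to keep empty addresses in a group of their own instead of counting them as external contributions:

```shell
# Sketch: three-way classification of a committer address.
# Empty addresses are reported as "unknown" so that they do not
# inflate the count of non-Google contributors.
classify_committer() {
  case "$1" in
    "")                   echo unknown ;;
    *@google*|*@android*) echo google ;;
    *)                    echo other ;;
  esac
}
```

How the "unknown" commits should finally be attributed is a separate question; the sketch only makes them visible.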

Written by sanacl

December 31, 2010 at 2:07 am

How to get quantitative data from the Android source code (I)

One of my targets for 2011 is to make the process of obtaining quantitative data from open source projects as easy as possible. We have developed several tools for that purpose, but they still need a lot of love to be really user-friendly and stable. In the following two posts I’ll show you how to get basic data from FLOSS projects using the source code repository. In this example we will study the code provided by Android, using cvsanaly to get data from the repositories and R to create a couple of charts.

The Google developers created a tool called repo to deal with the different git repos that they are using in Android. I don’t like to install tools that I won’t use so I’ll bypass it with a couple of bash commands.

The repo command uses the git:// as its starting point, so after cloning this repository you’ll see that it contains an XML file called default.xml with the following content:

  <project path="system/bluetooth" name="platform/system/bluetooth" />
  <project path="system/core" name="platform/system/core" />
  <project path="system/extras" name="platform/system/extras" />
  <project path="system/netd" name="platform/system/netd" />
  <project path="system/vold" name="platform/system/vold" />
  <project path="system/wlan/ti" name="platform/system/wlan/ti" />

The XML code above shows only some of the 159 references to Git repositories. Without the repo command created by Google, developers would have to download them one by one or with a script. We will use awk and a simple bash script to extract them from the XML file and download them in one go.

$ list=`cat default.xml |awk -F '"' '{print $4}'|grep -v '^$'|grep -v "UTF-8"|grep -v "Makefile$"`
$ for i in $list
do
j=`echo $i|sed 's:/:_:g'`
echo git clone git://$i $j >>
done

Now just edit the file, add the following lines at the beginning, and we have a script to download Android’s repositories. Don’t forget to give it execute permission.

echo "getting android repos"
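The extraction and the script generation can also be folded into one function. What follows is a sketch under stated assumptions rather than the commands I actually ran: the function name make_clone_script, the output file name and the base URL argument (git://git.example.invalid is a placeholder, not the real server) are all hypothetical. It reads the name attribute explicitly instead of relying on its position in the line.

```shell
# Sketch: build an executable clone script from a repo manifest.
#   $1 = path to default.xml
#   $2 = base URL of the git server (placeholder in the example below)
#   $3 = script file to generate
make_clone_script() {
  manifest=$1
  base_url=$2
  out=$3
  {
    echo '#!/bin/bash'
    echo 'echo "getting android repos"'
    # Print the value of name="..." for every <project> element.
    sed -n 's/.*<project[^>]*[[:space:]]name="\([^"]*\)".*/\1/p' "$manifest" |
    while read -r name; do
      # Same naming rule as above: slashes become underscores.
      dir=$(echo "$name" | sed 's:/:_:g')
      echo "git clone $base_url/$name $dir"
    done
  } > "$out"
  chmod +x "$out"
}
```

For instance, make_clone_script default.xml git://git.example.invalid get_repos.sh writes one git clone line per project and already sets the execute permission.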

Easy, isn’t it? The next step is to run the script to download the 159 Git repositories and, in the meantime, install cvsanaly, which has to be installed from source. But do not panic, it is straightforward:

At this point you are ready to start playing with the raw data extracted from all the git repositories in a single relational database. Stay tuned, the second chapter is coming soon.

UPDATE: the new release of Android 2.3, which was published a couple of days ago, uses 167 Git repositories.

Read the second part

Written by sanacl

December 17, 2010 at 8:52 am