Libre Software People's Front

don't confuse it with People's Front of Open Source

Posts Tagged ‘mswl-cases’

How to get quantitative data from the Android source code (II)

(Have a look at the previous post if you haven’t already.)

I recommend running the download inside the screen command: it can take a couple of hours if your connection is slow. Send the output to a log file so you can check that everything was downloaded properly, and use the mail command to notify you when the downloads finish.

../get_repos.sh > ../log_git_clone.txt 2>&1; mail lcanas@libresoft.es -s "git clone fin" < ../log_git_clone.txt 
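Once the mail arrives, the log can be checked quickly. The following is only a minimal sketch: it assumes that git reports a failed clone with a line starting with "fatal:", and the function name count_clone_failures is made up for this example.

```shell
# Hypothetical helper: count the lines in a clone log that start with "fatal:".
# Assumption: git prefixes the errors that abort a clone with "fatal:".
count_clone_failures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -c '^fatal:' "$1" || true
}
```

For example, count_clone_failures ../log_git_clone.txt prints 0 when no such line is present in the log.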

After using git clone to get all the git repositories used by Android, we need to analyze the code with cvsanaly. Again, we will use a log file.

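# Run cvsanaly on every repository in the current directory,
# appending all of its output to a single log file
for i in *
do 
echo "------ ANALYSING $i" >> ../log-cvsanaly.txt
~/repos/cvsanaly/cvsanaly2 -u **** -p **** -d cvsanaly_android_lcanas "$i" >> ../log-cvsanaly.txt 2>&1
done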
mail lcanas@libresoft.es -s "cvsanaly finished" < ../log-cvsanaly.txt

At this point we’ve got a single MySQL database with all the information from the 167 Android repositories. The next step is to use this information to answer some questions. In this introductory study we are going to examine the project’s activity over time (in terms of commits), split between Google staff and everyone else. We will assume that Google employees commit with an address containing @google.com or @android.com; that is how we will divide the authors into two groups.
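The grouping rule itself can be sketched in a few lines of shell. This is only an illustration of the assumption described above, not part of the analysis; the function name classify_email is made up for the example.

```shell
# Hypothetical helper: print "google" for addresses that contain
# @google.com or @android.com, and "other" for everything else.
# Note that an empty address ends up in the "other" group.
classify_email() {
    case "$1" in
        *@google.com*|*@android.com*) echo "google" ;;
        *) echo "other" ;;
    esac
}
```

For example, classify_email jane@google.com prints google, while classify_email someone@example.org prints other.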

The first R commands below connect to the MySQL database and fill the variables comm and googlers, which contain the number of commits per month for each of the two groups.

> library(RMySQL)
Loading required package: DBI
> con <- dbConnect( MySQL(), user="***", password="***", dbname="cvsanaly_android23_lcanas" )
> comm <- dbGetQuery(con, "select count(scmlog.id) as comm from scmlog join people on (scmlog.author_id=people.id) 
where date >= '2008-10-21 00:00:00' and people.email not like '%@android.com%' and people.email not like '%@google.com%' 
group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")

> googlers <- dbGetQuery(con, "select count(scmlog.id) as googlers from scmlog join people on (scmlog.author_id=people.id) 
where date >= '2008-10-21 00:00:00' and (people.email like '%@android.com%' or people.email like '%@google.com%') 
group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")
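As a rough cross-check that does not need the database, the same per-month split can be sketched with awk. This is a different technique from the SQL queries: it assumes input lines of the form "YYYY-MM email" (for instance what git log --format='%ad %ae' --date=format:%Y-%m prints on a reasonably recent git), and the function name count_by_month is made up for the example.

```shell
# Hypothetical helper: read "YYYY-MM email" lines on stdin and print
# "YYYY-MM <google commits> <other commits>" for each month, sorted by month.
count_by_month() {
    awk '
        {
            month = $1; email = $2
            # Same rule as the queries: @google.com or @android.com means Google
            if (email ~ /@google\.com|@android\.com/) g[month]++; else o[month]++
            seen[month] = 1
        }
        END {
            # Months with no commits in one of the groups print 0 for it
            for (m in seen) printf "%s %d %d\n", m, g[m], o[m]
        }
    ' | sort
}
```

Each output line corresponds to one bar of the chart: the first number is the Google part and the second is the rest.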

We join the information from Google employees and the rest of the contributors. We also need the list of months, which will serve as the x axis of the chart we will generate.

> mymatrix2<-cbind(googlers,comm)

> months <- dbGetQuery(con, "select date_format(scmlog.date, '%m/%y') as month from scmlog join people 
on (scmlog.author_id=people.id) where date >= '2008-10-21 00:00:00' and people.email not like '%@android.com%' and 
people.email not like '%@google.com%' group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")

The last step is to generate the chart and save it to a file.

> barplot(t(mymatrix2),names.arg=t(months),ylab="commits",legend.text=c("Google employees","Rest"),col=c("dark green","grey"))

> savePlot(filename="android-commits-domains.png", type="png")

Voilà, based on the software history of the Android project we have generated a view of the activity around the code in terms of commits over time.

This basic process should be improved to obtain more accurate results. For instance, some Google employees committed code with an empty email address, so the contribution from non-Google employees looks bigger than it is. It will also be necessary to analyze the Linux kernel together with the rest of the Android code to get a wider view of the effort invested by the Android community. There are many different questions that can shed some light on how the different communities work; in the last two posts we’ve seen one method for starting a quantitative study aimed at answering some of them.

Written by sanacl

December 31, 2010 at 2:07 am

How to get quantitative data from the Android source code (I)

One of my targets for 2011 is to make the process of obtaining quantitative data from open source projects as easy as possible. We have developed several tools for that purpose, but they still need a lot of love to be really user-friendly and stable. In the following two posts I’ll show you how to get basic data from FLOSS projects using the source code repository. In this example we will study the code provided by Android, using cvsanaly to get data from the repositories and R to create a couple of charts.

The Google developers created a tool called repo to deal with the different git repos that they use in Android. I don’t like installing tools that I won’t use, so I’ll bypass it with a couple of bash commands.

The repo command uses git://android.git.kernel.org/platform/manifest.git as its starting point. After cloning this repository you’ll see that it contains an XML file called default.xml with the following content:

  <project path="system/bluetooth" name="platform/system/bluetooth" />
  <project path="system/core" name="platform/system/core" />
  <project path="system/extras" name="platform/system/extras" />
  <project path="system/netd" name="platform/system/netd" />
  <project path="system/vold" name="platform/system/vold" />
  <project path="system/wlan/ti" name="platform/system/wlan/ti" />

The XML code above shows only some of the 159 references to git repositories. Without the repo command created by Google, developers would have to download them one by one or with a script. We will use awk and a simple bash script to extract them from the XML file and download them in one go.

$ list=`cat default.xml |awk -F '"' '{print $4}'|grep -v '^$'|grep -v "UTF-8"|grep -v "Makefile$"`
$ for i in $list
do 
j=`echo $i|sed 's:/:_:g'`
echo git clone git://android.git.kernel.org/$i $j >> get_repos.sh
done
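The commands above pick the fourth double-quoted field and then filter out the lines that are not projects. A variant that selects the name attribute of the project elements directly could look like the sketch below. It is only an illustration: the function name manifest_to_clone_cmds is made up, and it assumes each project element sits on a single line, as in the excerpt shown earlier.

```shell
# Hypothetical helper: read a manifest on stdin and print one
# "git clone" command per <project ... name="..."> line.
manifest_to_clone_cmds() {
    # Print only the value of the name attribute of <project> elements
    sed -n 's/.*<project[^>]* name="\([^"]*\)".*/\1/p' |
    while read -r name; do
        # platform/system/core -> platform_system_core
        dir=$(echo "$name" | sed 's:/:_:g')
        echo "git clone git://android.git.kernel.org/$name $dir"
    done
}
```

With this, manifest_to_clone_cmds < default.xml >> get_repos.sh would write the same kind of lines as the loop above.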

Now just edit the file get_repos.sh, add the following lines at the beginning, and we have a script to download Android’s repositories. Don’t forget to give it execute permission.

#!/bin/bash
echo "getting android repos"

Easy, isn’t it? The next step is to execute the script to download the 159 git repositories and, in the meantime, install cvsanaly. It has to be installed from source, but do not panic, it is straightforward.

At this point you are ready to start playing with the raw data extracted from all the git repositories in a single relational database. Stay tuned, the second chapter is coming soon.

UPDATE: the new release, Android 2.3, which was published a couple of days ago, uses 167 git repositories.

Read the second part

Written by sanacl

December 17, 2010 at 8:52 am

Tools that use git in ways that were never intended

As part of my homework for the master on libre software I have to study some of the tools that are commonly used in FLOSS. Git is one of my favourites, and today I found several tools that use git for purposes other than managing source code: git-annex, etckeeper, mr and ikiwiki.

git-annex is a very interesting application for managing large files with git without checking the file contents into git. Its creator explains it with two sample use cases, the Archivist and the Nomad, both detailed on the web site at git-annex.branchable.com. The basic idea is that you can have a lot of information stored on several drives without having to keep them in sync manually. You can even build a simple backup method by requiring that more than one copy of a file exists, or just move file content between repositories because your laptop is running out of space and you still want the data to be as available as possible.

etckeeper is the kind of application that you miss when you are in trouble. It works with APT or YUM to let /etc be stored in a git, mercurial, bzr or darcs repository. How many times have you looked for an old version of a file related to your X Window configuration or your Apache virtual hosts? A great tool for system administrators.

The third tool is mr, a command to perform actions over a set of repositories as if they were one combined repository. This is very useful: with the arrival of git, big projects end up with hundreds of repositories, which complicates trivial actions like updating. For instance, have a look at the repos used by the Android project at android.git.kernel.org; Google had to create an ad-hoc script to check them out in one go.
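To give an idea of what such a tool automates, here is a plain-shell approximation of "run one command in every repository". It is not how mr works internally, just a minimal sketch; the function name for_each_repo is made up, and it assumes the repositories are direct subdirectories of one folder.

```shell
# Hypothetical helper: run a command inside every direct subdirectory
# of a folder that contains a .git directory.
# Usage: for_each_repo <folder> <command> [args...]
for_each_repo() {
    root=$1
    shift
    for d in "$root"/*/; do
        # Skip subdirectories that are not git working copies
        [ -d "$d/.git" ] || continue
        # The subshell keeps the cd from affecting the caller
        ( cd "$d" && "$@" )
    done
}
```

For example, for_each_repo . git pull would update every clone found in the current directory.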

Last but not least, let me present ikiwiki, a wiki compiler based on git. Besides using git to store pages and their history, it converts wiki pages into HTML pages suitable for publishing on a website, with support for blogging and a large array of plugins. Currently the site www.branchable.com offers ikiwiki “to create a website, wiki, or blog, easily, within a minute”.

If you know of more tools that use git as a backend, please let me know.

Written by sanacl

December 14, 2010 at 8:14 pm

SCMs market share

Two weeks ago, in the master on libre software, we studied the market share of Apache, a libre product that has dominated the web server niche for the last decade. Not bad at all! As part of my homework, and since I usually work with development tools, I’ve gathered some data about the current SCM market share. Because I work only with libre software I thought my view could be biased, so I looked for reports about this niche market, but I didn’t find a recent one.

A Forrester study from a year ago stated that open source SCM solutions were closing in on 50% total market share. Have a look at the bar chart below:

As you can see, modern libre SCMs like Git and Mercurial didn’t hold a remarkable position a little over a year ago, and Subversion had more than twice as many users as the runner-up. Have another look at the chart: there is something unbelievable at the bottom of the list. Did you see it? 6.5% of those 1020 people did not use a source code management system. More than 50 “application development professionals” do not use one of these tools. I would even replace the article’s title with “6 out of 100 application developers have discovered something better than source code management systems”. Or .. maybe they haven’t 😉
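A quick check of that figure, as a minimal sketch (the function name people_from_pct is made up for the example):

```shell
# Hypothetical helper: how many people a percentage of a sample represents,
# rounded to the nearest whole person.
people_from_pct() {
    awk -v pct="$1" -v total="$2" 'BEGIN { printf "%.0f\n", total * pct / 100 }'
}

people_from_pct 6.5 1020   # prints 66
```

So the "more than 50" above is, if anything, on the cautious side.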

Let me get back on track: I think that the next report will be very different for Git. Over the last year, the outstanding growth of platforms based on it, like github (with half a million git repositories created in half a year!) and gitorious, should have an impact on the market share. We could end up with Subversion giving up first place to Git; at least that is what the Google Trends data for these two applications suggests. On the other hand, Debian users still value Subversion over Git.

Written by sanacl

November 28, 2010 at 8:54 pm

Neither vim nor emacs, nano!

Since I’ve been on the bright side, working with libre software, I’ve read a lot of discussions about whether to use vim or emacs; there is even a term coined for this: the Editor war. Maybe this discussion is out of fashion because, according to popcon, the Debian popularity contest, there is a clear winner and it is neither of them.

Nano wins the popularity battle, we’ll see if it wins the war ..

UPDATE: According to the votes, nano is still winning

Written by sanacl

November 26, 2010 at 7:58 pm
