Libre Software People's Front

don't confuse it with People's Front of Open Source

How to get quantitative data from the Android source code (II)

(Have a look at the previous post if you haven't already.)

I recommend using the screen command when downloading the repos: the download can take a couple of hours if your connection is not quick. Redirect the output to a log file so you can check that everything was downloaded properly, and use the mail command to notify yourself when the downloads finish.

../get_repos.sh > ../log_git_clone.txt 2>&1; mail -s "git clone fin" lcanas@libresoft.es < ../log_git_clone.txt
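
Once the download finishes, the log can be scanned for failures instead of being read end to end. The sketch below rests on an assumption about what a failure looks like: it counts lines that start with "fatal:" or "error:", the prefixes git normally uses for its own error messages. The function name count_clone_errors is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log that look like git failures.
# Assumption: failures show up as lines starting with "fatal:" or "error:".
count_clone_errors() {
    grep -cE '^(fatal|error):' "$1"
}

# Example use after the download:
#   count_clone_errors ../log_git_clone.txt
```

A result of 0 is not proof that every repository is complete, but anything above 0 is worth a look before running the analysis.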

After using git clone to get all the git repositories used by Android, we can start analyzing the code with cvsanaly. Again, we will use a log file.

# Run cvsanaly on every cloned repository, appending its output to one log file
for i in *
do
    echo "------ ANALYSING $i" >> ../log-cvsanaly.txt
    ~/repos/cvsanaly/cvsanaly2 -u **** -p **** -d cvsanaly_android_lcanas "$i" >> ../log-cvsanaly.txt 2>&1
done
mail -s "cvsanaly finished" lcanas@libresoft.es < ../log-cvsanaly.txt
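
With this many repositories, a failed run is easy to miss in one long log. Here is a minimal sketch of the same loop body, wrapped in a function that also records which repositories failed. The function name analyse_repo and the failure-list file are hypothetical, and the sketch assumes that the analysis command exits with a non-zero status on failure, which is an assumption about cvsanaly2 rather than something taken from its documentation.

```shell
#!/bin/sh
# Hypothetical helper: run a command for one repository, append its output
# to a log file, and note the repository in a failure list if the command fails.
# Arguments: repository, log file, failure list, then the command and its options.
analyse_repo() {
    repo=$1
    log=$2
    failed=$3
    shift 3
    echo "------ ANALYSING $repo" >> "$log"
    # The repository is passed as the last argument of the command.
    if "$@" "$repo" >> "$log" 2>&1; then
        return 0
    fi
    echo "$repo" >> "$failed"
    return 1
}

# Example use inside the loop (USER and PASS are placeholders):
#   analyse_repo "$i" ../log-cvsanaly.txt ../failed-repos.txt \
#       ~/repos/cvsanaly/cvsanaly2 -u USER -p PASS -d cvsanaly_android_lcanas
```

After the loop, the failure list names the repositories that need a second look, without searching the whole log.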

At this point we have a single MySQL database with all the information from the 167 Android repositories. The next step is to use this information to answer some questions. In this introductory study we examine the activity of the project over time, measured in commits, split between Google staff and everyone else. We assume that Google employees commit with an email address at @google.com or @android.com, and that is how we divide contributors into two groups.
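
The grouping rule can be sketched as a small shell function. This is only an illustration of the assumption, not part of the analysis itself, and the name classify_email is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper mirroring the grouping assumption:
# an address containing @google.com or @android.com counts as Google staff.
classify_email() {
    # Lower-case first so that "Jane@Google.com" is matched as well.
    addr=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $addr in
        *@google.com*|*@android.com*) echo google ;;
        *) echo other ;;
    esac
}

classify_email "jane@google.com"    # prints "google"
classify_email "dev@example.org"    # prints "other"
classify_email ""                   # prints "other"
```

Note the last case: a commit with an empty address lands in the second group, whoever wrote it.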

The first R commands below open the connection to the MySQL database and build two variables: googlers, the number of commits per month from @google.com and @android.com addresses, and comm, the number of commits per month from everyone else.

> library(RMySQL)
Loading required package: DBI
> con <- dbConnect( MySQL(), user="***", password="***", dbname="cvsanaly_android23_lcanas" )
> comm <- dbGetQuery(con, "select count(scmlog.id) as comm from scmlog join people on (scmlog.author_id=people.id) 
where date >= '2008-10-21 00:00:00' and people.email not like '%@android.com%' and people.email not like '%@google.com%' 
group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")

> googlers <- dbGetQuery(con, "select count(scmlog.id) as googlers from scmlog join people on (scmlog.author_id=people.id)
where date >= '2008-10-21 00:00:00' and (people.email like '%@android.com%' or people.email like '%@google.com%')
group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")
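
One detail worth checking in queries like these is how the date filter combines with the two domain patterns. In SQL, AND binds more tightly than OR, so the two alternatives must be wrapped in parentheses for the date filter to apply to both of them. The toy example below shows the difference with awk, whose && and || follow the same precedence. The commit list is made-up data, not taken from the Android repositories.

```shell
#!/bin/sh
# Toy commit list, "date email" per line (made-up data).
commits='2008-01-15 alice@google.com
2009-03-02 bob@android.com
2009-04-10 carol@google.com
2009-05-20 dave@example.org'

# Ungrouped: read as (date AND android) OR google,
# so the google.com commit from before the cut-off date is counted too.
ungrouped=$(printf '%s\n' "$commits" |
    awk '$1 >= "2008-10-21" && $2 ~ /@android\.com/ || $2 ~ /@google\.com/ { n++ }
         END { print n + 0 }')

# Grouped: date AND (android OR google), which is what the analysis intends.
grouped=$(printf '%s\n' "$commits" |
    awk '$1 >= "2008-10-21" && ($2 ~ /@android\.com/ || $2 ~ /@google\.com/) { n++ }
         END { print n + 0 }')

echo "ungrouped=$ungrouped grouped=$grouped"    # prints "ungrouped=3 grouped=2"
```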

We then combine the counts for Google employees and for the rest of the contributors into one table. We also need the list of months, which will label the x axis of the chart we are going to generate.

> mymatrix2<-cbind(googlers,comm)

> months <- dbGetQuery(con, "select date_format(scmlog.date, '%m/%y') as month from scmlog join people 
on (scmlog.author_id=people.id) where date >= '2008-10-21 00:00:00' and people.email not like '%@android.com%' and 
people.email not like '%@google.com%' group by date_format(scmlog.date, '%Y %m') order by date_format(scmlog.date, '%Y %m') asc;")
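
Note that cbind pairs rows by position, so the chart is only correct if both queries return a row for exactly the same months. A month in which only one group committed would silently shift one column against the other. Below is a minimal sketch of an alternative that counts both groups in a single pass, so each month always gets one row with both counts. It is written in shell and awk over a plain "date email" listing rather than against the cvsanaly database, and the function name monthly_by_group and the input format are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: read "YYYY-MM-DD email" lines on standard input and
# print "YYYY-MM <google commits> <other commits>", one row per month.
monthly_by_group() {
    awk '
    {
        month = substr($1, 1, 7)    # keep "YYYY-MM"
        email = tolower($2)
        seen[month] = 1
        if (email ~ /@google\.com/ || email ~ /@android\.com/)
            google[month]++
        else
            other[month]++
    }
    END {
        # Every month seen gets a row; a missing count is printed as 0.
        for (m in seen)
            printf "%s %d %d\n", m, google[m], other[m]
    }' | sort
}

monthly_by_group <<'EOF'
2010-01-05 a@google.com
2010-01-09 b@example.org
2010-02-11 c@android.com
2010-03-01 d@example.org
EOF
# prints:
# 2010-01 1 1
# 2010-02 1 0
# 2010-03 0 1
```

Such a listing could come, for instance, from git log --date=short --format='%ad %ae' run inside a repository.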

The last step is to generate the chart and save it to a file.

> barplot(t(mymatrix2),names.arg=t(months),ylab="commits",legend.text=c("Google employees","Rest"),col=c("dark green","grey"))

> savePlot(filename="android-commits-domains.png", type="png")

Voilà, based on the software history of the Android project we have generated a view of the activity around the code in terms of commits over time.

This basic process should be improved to obtain more accurate results. For instance, some Google employees committed code using an empty email address, so the contribution from non-Google contributors looks bigger than it really is. It will also be necessary to analyze the Linux kernel together with the rest of the Android code in order to obtain a wider view of the effort invested by the Android community. Many different questions can shed some light on how the different communities work; in the last two posts we have seen one way to start a quantitative study aimed at answering some of them.

Written by sanacl

December 31, 2010 at 2:07 am