Friday, 3 September 2010

experimental freebase integration in tagfs

Today I ran a small experiment with tagfs and freebase. Freebase is a large database of structured data which contains around 12 million entities [1]. These entities describe subjects from the real world, such as movies, persons and places. That makes freebase well suited for enriching your own data.

I will show this with an example. I've got some movies on my hard drive which I enjoy watching. One of them is the movie Ink. I store these movies in a movies directory, and every movie gets its own subdirectory below it. So the movie Ink is stored on my hard drive at /movies/Ink/Ink.avi. This layout lets me tag my movies for tagfs. My tag file for Ink looks like this:

My Personal Rating: 8 stars
_freebase: type: /film/film, name: Ink, initial_release_date: None

In my experiment I extended tagfs to fetch taggings from freebase. tagfs reads the _freebase value for each item / movie and executes it as a query to freebase. The query delivers a result like this:

{
  "code": "/api/status/ok",
  "result": {
    "initial_release_date": [
      ...
    ],
    "name": "Ink",
    "type": "/film/film"
  },
  "status": "200 OK",
  "transaction_id": "cache;cache03.p01.sjc1:8101;2010-09-03T12:38:45Z;0028"
}

tagfs applies the fields initial_release_date, name and type as further taggings. So exporting the tagfs meta data as CSV looks like this:


Now I can filter my movies with tagfs using the metadata from the huge freebase community. I've released the experimental freebase integration in the freebase branch of tagfs.
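
As a rough sketch of the query step described above: the _freebase value is a comma-separated list of key: value pairs, and None presumably marks a field that freebase should fill in. The function name parse_freebase_tag is made up for this illustration and is not taken from the tagfs sources. Sending the query over the network is left out.

```python
import json


def parse_freebase_tag(value):
    """Parse 'key: value, key: value' into a query dict (hypothetical helper)."""
    query = {}
    for part in value.split(","):
        # Split on the first colon only, so values may contain colons.
        key, _, raw = part.partition(":")
        raw = raw.strip()
        # Assumption: the literal None asks freebase to fill in the field.
        query[key.strip()] = None if raw == "None" else raw
    return query


query = parse_freebase_tag("type: /film/film, name: Ink, initial_release_date: None")
print(json.dumps(query, sort_keys=True))
# {"initial_release_date": null, "name": "Ink", "type": "/film/film"}
```

The keys of the resulting dict match the fields of the result shown above, which tagfs then applies as taggings.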


Monday, 16 August 2010

using a file system for my bank account

the goal - organize your shared payments

I share a flat with my girlfriend, so we regularly buy things that we pay for in equal parts. Sometimes it's a little hard to keep track of the many shared payments, especially if you use cash, your giro account and visa.

the idea - use a directory

To track all your payments you need to organize them in one place. Normally this can't be your giro account, as your bank doesn't care about your cash payments. So my approach is to gather my payment data on my computer. Fortunately my bank supports a CSV export of my giro and visa transactions, so I can easily export the account data in a machine-readable format.

the database is a directory

As the database for my payments I use a plain old directory on my file system. The directory contains a subdirectory for every transaction I made, no matter whether it was on one of my bank accounts or in cash. For the transaction directory names I use the simple syntax: <date> - <description>

Examples for the transaction directory names would be:

  • 2010-08-12 - ice cream in park
  • 2010-08-14 - VISA PAYMENT at

Each of the transaction directories is tagged with the transaction details. The taggings are stored in a text file within the transaction directory. The tagging file 2010-08-14 - GIRO PAYMENT at MY GROCERIES SHOP/.tag, for example, has the following content:

date: 2010-08-14
account: visa
amount EUR: 120.00
description: old table
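
A tag file like this is easy to read back from a script. The following is only a minimal sketch, assuming every line has the form context: value. The function name parse_tag_file is invented for this example and is not part of tagfs, and a context that appears twice simply keeps its last value here.

```python
def parse_tag_file(text):
    """Read 'context: value' lines into a dict (hypothetical helper)."""
    taggings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # Split on the first colon only, so values may contain colons.
        context, _, value = line.partition(":")
        taggings[context.strip()] = value.strip()
    return taggings


EXAMPLE = """date: 2010-08-14
account: visa
amount EUR: 120.00
description: old table
"""

taggings = parse_tag_file(EXAMPLE)
print(taggings["account"], taggings["amount EUR"])
```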

The tag files for the giro and visa transactions are created by a python script I wrote. The script creates a transaction directory for every row in my bank's CSV exports and writes the columns from the CSV files as taggings into the tag file. The script does a little more magic, like merging already existing entries, but that's another story.
I create the transaction directories for cash payments manually. Their taggings contain basically the same data; the only difference is that the account field is tagged with cash.
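
The core of such an import script could look roughly like the sketch below. It is not the original script: the column names date, description and amount EUR are assumptions about the bank's CSV export, the function name import_transactions is made up, and merging of existing entries is left out. Descriptions containing a slash would need escaping before they can be used as directory names.

```python
import csv
import io
import os
import tempfile


def import_transactions(csv_file, account, target_dir):
    """Create one '<date> - <description>' directory with a .tag file per CSV row."""
    created = []
    for row in csv.DictReader(csv_file):
        name = "{} - {}".format(row["date"], row["description"])
        path = os.path.join(target_dir, name)
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, ".tag"), "w") as tag_file:
            # The account is not a CSV column; it depends on which export is read.
            tag_file.write("account: {}\n".format(account))
            for column, value in row.items():
                tag_file.write("{}: {}\n".format(column, value))
        created.append(path)
    return created


# Hypothetical export with assumed column names.
SAMPLE = "date,amount EUR,description\n2010-08-14,120.00,old table\n"

with tempfile.TemporaryDirectory() as target:
    paths = import_transactions(io.StringIO(SAMPLE), "visa", target)
    with open(os.path.join(paths[0], ".tag")) as tag_file:
        tag_text = tag_file.read()

print(os.path.basename(paths[0]))
print(tag_text)
```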

I introduce a new tagging to separate my own payments from the payments I share with my girlfriend. My tagging for shared payments is:

share: true

export filtered CSV with tagfs

Now that I've collected my transactions and added metadata like the 'share' tagging, I need to filter the transactions. I use tagfs to filter the various transactions, contexts (like 'date', 'account', ...) and taggings. tagfs is mounted as a virtual file system beside the transactions directory. Mounting the above example with tagfs will show me a directory like this:

The tagfs root directory contains various subdirectories. The subdirectories represent filters for the tagfs items, that is, my transaction directories. You filter transactions by entering directories. Enter the share/true/ directory to see all transactions with a share flag. share/true/account/giro will show you all shared transactions which occurred via your giro account.
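
The filtering idea behind these paths can be sketched in a few lines. This is only an illustration of the concept, not how tagfs is implemented: the function filter_items and the in-memory ITEMS dict are invented for this example, and the third item is a made-up unshared transaction.

```python
def filter_items(items, path):
    """Return the names of all items matching every context/value pair in path."""
    parts = [part for part in path.split("/") if part]
    # 'share/true/account/giro' becomes [('share', 'true'), ('account', 'giro')].
    # A trailing context without a value is ignored by zip.
    pairs = list(zip(parts[0::2], parts[1::2]))
    return sorted(
        name
        for name, taggings in items.items()
        if all(taggings.get(context) == value for context, value in pairs)
    )


ITEMS = {
    "2010-08-12 - ice cream in park": {"account": "cash", "share": "true"},
    "2010-08-14 - GIRO PAYMENT at MY GROCERIES SHOP": {"account": "giro", "share": "true"},
    "2010-08-14 - old table": {"account": "visa"},  # hypothetical unshared item
}

print(filter_items(ITEMS, "share/true"))
print(filter_items(ITEMS, "share/true/account/giro"))
```

Every additional context/value pair in the path narrows the result further, which is what entering a deeper directory does.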

Now I create a CSV export which contains all shared transactions. To do so I open the CSV file share/true/.export/export.csv. The CSV file contains all matching transactions as rows. The columns represent the different taggings:

name                                            date        account  amount EUR  description  share
2010-08-12 - ice cream in park                  2010-08-12  cash     1.00                     true
2010-08-14 - GIRO PAYMENT at MY GROCERIES SHOP  2010-08-14  giro     32.53       just food    true

I open this CSV table in a spreadsheet application and calculate the sum of the 'amount EUR' column. That's real magic... isn't it?!?
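
The same sum can also be calculated with a short script. The sketch below assumes the export is comma-separated with a header row; the actual delimiter and quoting of tagfs's export.csv may differ. The function name sum_column is made up for this example, and the sample rows are the two shared transactions from the table above.

```python
import csv
import io
from decimal import Decimal


def sum_column(csv_file, column="amount EUR"):
    """Add up one column of a CSV file, using Decimal to avoid float rounding."""
    return sum(
        (Decimal(row[column]) for row in csv.DictReader(csv_file)),
        Decimal("0"),
    )


# Assumed layout of share/true/.export/export.csv.
EXPORT = (
    "name,date,account,amount EUR,description,share\n"
    "2010-08-12 - ice cream in park,2010-08-12,cash,1.00,,true\n"
    "2010-08-14 - GIRO PAYMENT at MY GROCERIES SHOP,2010-08-14,giro,32.53,just food,true\n"
)

total = sum_column(io.StringIO(EXPORT))
print(total)  # 33.53
```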

adding multi dimensional spice

OK... I admit... the magic hasn't happened yet. All I did was apply a filter and calculate a sum. I could have used a simple excel sheet for that. Excel is a fine tool as long as you use structured data. Our transactions are structured data: every transaction consists of a limited number of fields with well-defined values. To leave this limited view of the world you have to think of subjects instead of transactions in your bank account. A subject can be anything! A transaction can be a subject just as well as a directory with my holiday pictures can (I borrowed this very abstract view of subjects from the resource description framework and the triplestore concept). Now I can tag my transactions and my holiday pictures with holiday: india 2009. This allows me various filters:

  • holiday/india 2009 shows everything related to my India vacation in 2009, no matter whether it's a transaction on my visa account or my holiday pictures.
  • holiday/india 2009/account/visa/.export/export.csv lets me calculate all my visa expenses during the holiday.

(my) conclusion

In my view, an excel sheet or a relational database system gives you one view of your data. Viewing your data in a table-like structure is good for analyzing items, but these items need to be comparable in a specific way. Storing data is different from viewing data. When storing data you need the flexibility to adjust your storage to new kinds of entries in your database. Relational database systems do this via tables, and storing data in tables increases the risk of many conversions and complex table joins.

Wednesday, 21 July 2010

android API source JAR


A few days ago I installed the android SDK and the ADT plugin for eclipse. When I started playing with the android API I was a little disappointed: the source attachment for the android.jar, which contains the android API, was missing. I searched the web for an android sources jar and found a guide on how to create one [1].

I've adapted the sources howto and created my own android sources jar for android platform 7 aka android 2.1. You can download it here:

I'm pretty sure that the android-src.jar is missing some sources, because the source jar is smaller than the binary jar. Leave a comment if you find the missing sources.



Friday, 9 July 2010

implementing a new eclipse remote control command


eclipse remote control is an eclipse plugin which lets you execute remote commands within eclipse. Right now it's limited to a small number of commands: you can open a file and launch a build command. Commands are launched via the java client application.

Implement a command

To create a new command you have to implement a few classes within eclipse remote control. So you have to check out the eclipse remote control source from the github repository: git://

Implementing a new command requires the following steps:

  1. Implement a communication class. This class contains the data which is sent from the eclipse remote control client to the eclipse remote control plugin in the eclipse IDE.
  2. Extend the eclipse remote control client. You need to parse the client's command line arguments and create an instance of your communication class.
  3. Implement a command runner. The command runner contains the actual work which is performed when the command is executed within eclipse.

Implement communication class

The communication classes are located in the com.github.marook.eclipse_remote_control.command project. Add a new java class to the com.github.marook.eclipse_remote_control.command.command package. Your new command class must extend the abstract Command class from the same package.

The communication class must set a unique ID. This ID is used to identify commands in the eclipse remote control plugin. The unique ID is passed to the Command constructor.

The communication class contains all the information which is sent from the eclipse remote control client to the eclipse remote control plugin. So the communication class needs fields for all transferred information. You also have to add getter and setter methods for all the fields.

All communication classes implement the Serializable interface. Make sure your command class and the types of its fields meet the requirements of Serializable.

Extend eclipse remote control client

The client is implemented in the com.github.marook.eclipse_remote_control.client project. The client creates command objects from command line arguments and sends them to the eclipse remote control plugin. To create and send your command you have to add the parse and send code to the main method of the com.github.marook.eclipse_remote_control.client.Client class. The following listing is an example of the parse and send code for the open file command:

 if(args.length < 2){
 final OpenFileCommand cmd = new OpenFileCommand();

Implement command runner

Here comes the actual work. You have to implement a command runner which executes the command within the eclipse instance. All command runners are implemented within the project. Create a new command runner class in the package. All command runners must implement the ICommandRunner interface from the same project. For your convenience you should use the AbstractAtomCommandRunner superclass for your command runner.

The command's work is implemented in the command runner's internalExecute(...) method. This method is specified by the ICommandRunner interface.

At last you must register the command runner in the SimpleCommandRunner class. Add a putAtomRunner method call to the static block in the SimpleCommandRunner class. Right now this static block contains only two registrations:

static {
 putAtomRunner(new OpenFileCommandRunner());
 putAtomRunner(new ExternalToolsCommandRunner());
}

Basically that's all you need to do for a new command. If you need more information, check out the eclipse remote control source code and read the source of the existing commands. I think that is the best way to get started.