Bioinfo Cyborg's WorkLog: May 2008

Monday, May 26, 2008

Voice From China Logo

I made this logo for www.voicefromchina.com

Some voices from the other side of the world: you may want to hear another side of the story.

Cytoscape Plugin Dev Environment

After struggling for a couple of hours, I finally got NetBeans set up to do the work!

Materials:
NetBeans IDE, Cytoscape source

Setup:
1. Install the NetBeans IDE.
2. Install Cytoscape.
3. Create a custom plugin folder (for example: c:/cytoscape/plugins/dev/).
4. Create a NetBeans Java project.
5. Add the Cytoscape libraries to the project: specifically, cytoscape.jar and everything in the /libs/ folder.
6. Edit the project properties by right-clicking the project name. Under the 'Run' item, type in (do not browse for) cytoscape.CyMain as the main class; in the arguments, add -p 'your plugin folder'.
7. Go to the project folder in the file system and, in the nbproject folder, edit project.properties: change the value of dist.jar to the full file path of your jar, including the jar name.
For example: dist.jar=C:/Program Files (x86)/Cytoscape_v2.6.0/plugins/dev/HelloWorld.jar. Note that you should not use any quotes!

Now you are done. You can debug your Cytoscape plugin: set breakpoints, add watches...
it's just nice and easy!
------------------------------------------------------------------------------------
Update: you can load a .sif file directly by adding these command-line arguments:
-N – Load a .sif file
-n – Load node attributes
-e – Load edge attributes

Thanks to Maital Ashkenazi for the help.

Super Annoying Eclipse Bug

All Eclipse-based IDEs have a bug on Windows XP x64 systems: menu and dialog items go missing or malfunction. It seems the bug was fixed at one point, and then a Java update, an Eclipse update, or something else brought it back.

I spent the whole weekend trying to figure this out, with no luck.
-------------------------------------------------------------
Update: it turned out that this bug has nothing to do with Java or Eclipse. It is caused by the video card driver and UltraMon, a program that provides finer control of multi-monitor displays.

The funny thing is that, judging from Google results, many developers who use Windows XP x64 also use multiple displays, and also use UltraMon. So we all ran into the same issue, and naturally most of us thought it was an Eclipse problem. For me it was more complicated. First I thought it was a Flex Builder problem, because it appeared when I tried to import the Flare libraries into the IDE. Then I thought it was an Eclipse problem: while upgrading from 3.3 to 3.3.1 I happened to unplug a monitor (that was the coincidence), which fixed the issue, so I credited the upgrade. When Eclipse went from 3.3.1 to 3.3.2 and the bug returned, it was only natural to think the new release had brought it back. Then I went back to 3.3.1 and the issue was still there; meanwhile I noticed there had been a JRE upgrade, so I thought it was caused by leftovers from either the Java or the Eclipse upgrade...

In the end, it was just a glitch between UltraMon and the video driver. Uninstalling UltraMon solved the problem, nice and easy.

This reminds me of biological experiments: it's not easy to establish causal relationships. As computers and software become more and more sophisticated, an incompatibility issue can arise from anywhere.

Thursday, May 22, 2008

Birthday song in 4 languages

French
Joyeux anniversaire
Joyeux anniversaire
Joyeux anniversaire XXX
Joyeux anniversaire

German
Zum Geburtstag viel Glück,
zum Geburtstag viel Glück,
zum Geburtstag, liebe XXX,
zum Geburtstag viel Glück.

Italian
Tanti auguri a te,
tanti auguri a te,
tanti auguri (and the name of the person),
tanti auguri a te

Spanish
Cumpleaños feliz,
cumpleaños feliz,
cumpleaños.....,
cumpleaños feliz!!

I sing for Karen's Bday, adding English and Chinese.

Tuesday, May 20, 2008

Interaction Network Analysis

The preliminary data show high consistency between the detected clusters and annotated pathways, so we can likely use this information to complement our current knowledge.

The challenges are:
1. Reproducibility. The cluster detection algorithm is not very robust: removing 1% of the edges can push the misclassification error up to around 20%. The consistency between different methods is also very low, with an ARI (Adjusted Rand Index) of less than 0.6.
A related question is how to generate a random network that preserves the degree distribution of a real-world network while still following a given cluster-size distribution and community structure.

2. Resolution. Community detection algorithms based on modularity have been shown to have resolution issues, which is very relevant to community detection in large networks. In practice, I noticed that some communities do contain smaller functional clusters.
[Ref]

3. Systematic ways to classify generated clusters. Some clusters tend to be pathways while others tend to be protein complexes. Automated methods are required to 'qualify' these ad hoc clusters.
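
To make the ARI figure in item 1 concrete, here is a minimal Python sketch of the Adjusted Rand Index between two clusterings of the same nodes. The function name and the membership-list input are my own choices for illustration; this is not taken from any particular clustering package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same nodes.

    labels_a[i] and labels_b[i] are the cluster ids of node i under the
    two methods being compared (hypothetical input format).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same nodes")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: nothing to disagree on

    # Number of node pairs placed together by both, by A, and by B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both put every node in one cluster
    return (together_both - expected) / (maximum - expected)
```

A value of 1 means the two methods agree exactly (up to renaming the clusters), and values near 0 mean the agreement is about what chance would give.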

Monday, May 19, 2008

GSoC Preliminary Plans

The coding is expected to begin on May 28th, 2008 and last until Aug 18th, approximately three months. This is the preliminary timeline.

May 19 to 26:
API reading and tryouts. Set up the working environment for coding. Get familiar with the debugging environment.

May 28 to July 7:
Working session one, approximately five weeks to mid-term evaluation. Expected goal is a minimal working model of the project.

July 12 to Aug 11:
Approximately one month, to improve performance, documentation, etc.

This project can be divided into two parts:

1) Clustering/Grouping part.
The goal is to determine how to reduce the whole graph into clusters of subgraphs. Depending on the sophistication of the algorithm, this can be very easy to implement (say, just randomly cut the entire graph into 10 small clusters of equal size) or complicated (implement the methods developed by Mark Newman et al.). The output from part 1 is a clustering of the original dataset, possibly represented as an integer array whose indexes are node indexes and whose values are cluster memberships.
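
As a rough illustration of the "very easy" end of that range, the following Python sketch produces the integer membership array described above by dealing nodes into equal-size random clusters. The function name, the seed parameter, and the round-robin trick are assumptions of mine, not part of Cytoscape or of the final plugin.

```python
import random


def random_equal_clusters(num_nodes, num_clusters=10, seed=None):
    """Return a list where index = node index and value = cluster id.

    Cluster sizes differ by at most one node.
    """
    if num_clusters < 1:
        raise ValueError("need at least one cluster")
    rng = random.Random(seed)
    # Deal cluster ids round-robin so the sizes are as equal as possible,
    # then shuffle so that the assignment of nodes to clusters is random.
    membership = [i % num_clusters for i in range(num_nodes)]
    rng.shuffle(membership)
    return membership
```

For example, `random_equal_clusters(25, 10)` gives five clusters of three nodes and five clusters of two.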

Potential methods required to manipulate the network data:
1. Find the nearest neighbour of a certain node.
2. Extract a subgraph from an array of node indexes: build a subgraph with the nodes in the array and all edges connecting these nodes; or, given an array of node indexes, find all the edges connecting them.
3. Find number of edges connecting from one cluster to another cluster (inter-cluster edge count).
4. Find number of edges connecting members in the same cluster (intra-cluster edge count).
I wonder whether these methods are available in the Cytoscape package?
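
Independently of what Cytoscape itself offers, the four operations above can be sketched in plain Python. This assumes an undirected graph stored as a list of (u, v) node-index pairs and a membership list as described in part 1; "nearest neighbour" is read here as "directly connected node". All function names are hypothetical.

```python
def neighbours(edges, node):
    """Nodes directly connected to `node` (the node itself is excluded)."""
    found = set()
    for u, v in edges:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    found.discard(node)  # ignore self-loops
    return found


def induced_edges(edges, nodes):
    """Edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]


def intra_cluster_edge_count(edges, membership, cluster):
    """Number of edges with both endpoints inside `cluster`."""
    return sum(1 for u, v in edges
               if membership[u] == cluster and membership[v] == cluster)


def inter_cluster_edge_count(edges, membership, a, b):
    """Number of edges running between clusters `a` and `b` (a != b)."""
    return sum(1 for u, v in edges
               if {membership[u], membership[v]} == {a, b})
```

Each function scans the whole edge list, which is fine for a first working version; an adjacency map would be the obvious next step for large networks.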

2) Visualization part.
The goal is to visualize the entire graph on the basis of the clustering information. The model is to simplify large networks by organizing nodes hierarchically. For each cluster, the layout (for example, a force-directed layout) is calculated independently; the members of a cluster can be collapsed into one single node; clusters are treated as 'meta nodes', and their layout is calculated independently as well. If possible, I will try animated transitions for expanding/collapsing nodes (I don't know right now whether that is possible or how difficult it is).
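
The collapsing step can be sketched without any layout code. The Python function below, whose name and return format are my own assumptions, turns an edge list plus a membership list into a 'meta node' graph: one node per cluster, with edge weights counting the original edges between clusters. The layout calculation itself is not shown.

```python
from collections import Counter


def collapse_to_meta_graph(edges, membership):
    """Collapse each cluster into a single meta node.

    Returns (sizes, meta_edges):
      sizes      maps cluster id -> number of member nodes
      meta_edges maps (a, b) with a < b -> number of original edges
                 between clusters a and b
    Edges inside a cluster disappear when the cluster is collapsed.
    """
    sizes = Counter(membership)
    meta_edges = Counter()
    for u, v in edges:
        a, b = membership[u], membership[v]
        if a != b:
            meta_edges[(min(a, b), max(a, b))] += 1
    return dict(sizes), dict(meta_edges)
```

The sizes could drive the drawn size of each meta node, and the edge weights could be passed to a force-directed layout of the meta graph.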

So in phase one, the expected outcome is a minimal working example. As the knowledge base for visualization on the Cytoscape platform is far more complete than the one for clustering algorithms, if I run into difficulty with clustering, we can put more emphasis on the visualization and write a simple method for part 1 (such as assigning the graph to 10 equal-size random clusters). Then for part 2, we can start with only a force-directed layout, and implement the expand/collapse mechanism and the independent calculation of cluster member layouts.

In phase two, I will work on tuning, improving the visualization, making various modifications, and writing documentation.

The remaining dates are for various write-ups, feedback, etc.
The code will be submitted starting from Sep 3rd.

Some additional information can be accessed here.

1001 Nights Video

I made this video two years ago to record our performance in a Cosplay drama competition, in which we won first prize.

The story is quite old: the reaper fell in love with the princess and decided to spare her life. However, by doing so he violated the laws of the reapers. As a consequence, he was persecuted by the grand dark lord (that was me).

Video to commemorate victims in recent earthquake

I made this video for those who lost their lives in the recent earthquake in Sichuan, China.
Rest In Peace...
to those children, who lost their future forever.

Establish Connection from Perl to Microsoft SQL Server

It's very easy to use Perl DBI to connect to a MySQL server. It's a bit of a headache for Microsoft SQL Server, as DBD::MSSQL is far less mature than the PHP equivalent. However, as I need to use Perl Imager to batch-create image tiles, I did some googling to figure out how to do the job.

There are two ways, and both work only on Microsoft Windows:

1. Use the Win32::SqlServer module. This module is not included in the ppm repositories, so you need to download it from its homepage:
http://www.sommarskog.se/mssqlperl/index.html
To establish a connection and run a SQL statement:

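use Win32::SqlServer;
# Log in and get a connection object
my $sqlsrv = sql_init($server, $user, $pass, $database);
# Run a SQL statement (placeholder text) and collect the result
my $result = $sqlsrv->sql('blah blah blah');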

2. Use an ODBC data source. Go to Control Panel -> Administrative Tools -> Data Sources.
Under the User DSN tab, hit Add.
Choose the data driver and configure the ODBC source; don't forget to name it. When it's all done, we use Perl DBI:

use DBI;
# Replace your_odbc_name_here with the name you gave the data source (no quotes inside the string)
my $dbh = DBI->connect("dbi:ODBC:your_odbc_name_here", $user, $pass);
# The user name and password are not needed if you use Windows authentication.

Once the database handle is established, the rest is the same as with any other DBI driver. The advantage of ODBC is that the same data source can also be used from other languages, such as R or Java.

P.S. Found another package, Data::Dumper, which makes it pretty easy to dump a whole data structure such as an array or hash, similar to PHP's print_r.
