Monday, May 19, 2008

GSoC Preliminary Plans

The coding is expected to begin on May 28th, 2008 and last until Aug 18th, approximately three months. This is the preliminary timeline.

May 19 to 26:
API reading, tryouts. Set up working environment for coding. Familiarize with the debugging environment.

May 28 to July 7:
Working session one, approximately five weeks to mid-term evaluation. Expected goal is a minimal working model of the project.

July 12 to Aug 11:
Approximately one month, to improve performance, documentation, etc.



This project can be divided into two parts:

1) Clustering/Grouping part.
The goal is to determine how to reduce the whole graph into clusters of subgraphs. Depend on the sophistication of the algorithms, it can be very easy to implement(say, let's just randomly cut the entire graph into 10 small clusters with equal sizes), or complicated(implement methods developed by Mark Newman et. al). The output from part 1 is a clustering of the original dataset, possibly represented in an integer array, indexes pointing to node indexes and values refer to cluster memberships.

Potential methods required to manipulate the network data:
1. Find the nearest neighbour of a certain node.
2. Subgraph a graph with an array of node indexes. Build a subgraph with the nodes in the array and all edges connecting these nodes; or give an array of node indexes, find all the edges connecting them.
3. Find number of edges connecting from one cluster to another cluster (inter-cluster edge count).
4. Find number of edges connecting members in the same cluster (intra-cluster edge count).
I wonder whether these methods are available in the Cytoscape package?

2) Visualization part.
The goal is to visualize the entire graph on the basis of clustering information. The model is to simplify large networks by hierarchical organization of nodes. For each cluster, the layout(for example, force directed layout) is calculated independently; the members of a cluster can be collapsed into one single node; Clusters are accounted as 'meta nodes' and their layout is calculated independently as well. If possible, I will try transitioning animations of open/collapsing nodes(I don't know whether it's possible or how difficult it is right now).

So in phase one, the expected outcome is a minimal working example. As the knowledge base about the visualization is far more completed than the knowledge base of clustering algorithm on the Cytoscape platform, if I run difficulty into clustering, we can put more emphasis on the visualization, and write a simple method for part 1(such as assign the graph into 10 equal-size random clusters). Then for phase two, we can first try out by only using force-directed layout, implement the expand/collapsing mechanism and independent calculation of cluster member layouts.

In phase two, I will work to tune up, improve visualization, and make various modifications and documentations.

The remaining dates are for various write ups, feed backs, etc.
The code will be submitted starting from Sep 3rd.

Some additional information can be accessed here.

No comments: