Tuesday, May 20, 2008

Interaction Network Analysis

The preliminary data show high consistency between the detected clusters and annotated pathways, so it is very likely that we can use this information to complement our current knowledge.

The challenges are:
1. Reproducibility. The cluster detection algorithm is not very robust: removing 1% of the edges can push the misclassification error up to around 20%. The consistency between different methods is also very low, with an ARI (Adjusted Rand Index) below 0.6.
A related challenge is how to generate a random network that preserves the degree distribution of a real-world network while still following a given cluster size distribution and community structure.

2. Resolution. Community detection algorithms based on modularity have been shown to have resolution issues, which is very relevant to community detection in large networks. In practice, I noticed that some detected communities do contain smaller functional clusters. Some research has pointed out this issue.
[Ref]

3. Systematic ways to classify generated clusters. Some clusters tend to be pathways while others tend to be protein complexes. Automated methods are required to 'qualify' these ad hoc clusters.
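
For the reproducibility point, here is a minimal sketch of the three building blocks one would need for such a test: dropping a fraction of the edges, rewiring a network while keeping every node's degree, and comparing two clusterings with the ARI. It uses only the Python standard library. The function names (remove_random_edges, degree_preserving_rewire, adjusted_rand_index) are hypothetical, chosen for illustration, and the clustering step itself is left out because it depends on the method being tested. Note that the double-edge-swap rewiring shown here preserves only the degree sequence; it does not address the open question of also keeping a given cluster size distribution or community structure.

```python
import random
from collections import Counter
from math import comb


def remove_random_edges(edges, fraction, seed=None):
    """Return a copy of the edge list with a random fraction of edges dropped."""
    rng = random.Random(seed)
    edge_list = list(edges)
    n_remove = round(fraction * len(edge_list))
    removed = set(rng.sample(range(len(edge_list)), n_remove))
    return [e for i, e in enumerate(edge_list) if i not in removed]


def degree_preserving_rewire(edges, n_swaps, seed=None):
    """Randomize an undirected simple graph with double-edge swaps.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), so every node keeps
    its degree. Assumes the input has no self-loops or duplicate edges.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up eventually on very dense graphs
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible pairings
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a duplicate edge
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings given as equal-length label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of nodes placed together in both, in the first, in the second.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / total_pairs
    max_index = (in_a + in_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (both - expected) / (max_index - expected)
```

A robustness check would then cluster the original network and the perturbed one with the same method, and pass the two label lists (in the same node order) to adjusted_rand_index. Identical partitions give 1, and values near 0 mean agreement no better than chance.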
