What is BiCluE?
BiCluE is a bi-clustering software for solving the weighted/unweighted bi-cluster editing problem. It provides exact as well as heurstic algorithms.
What is Bi-clustering?
Clustering is a classical task in bioinformatics and computational biology. It partitions a data set into different clusters such that elements within a cluster are more similar to each other than to those objects belonging to different clusters, according to a certain criterion. However, some specific types of clustering were designed for different scenarios. For instance, the clustering of gene expression data under different conditions, which can be modeled as a bipartite graph, is hardly suitable for standard clustering methods. Instead, one would like to cluster genes and conditions simultaneously such that we see a consistent "behavior", i.e. so called bi-clustering.
What is Cluster Editing & Bi-cluster Editing?
One of the common approaches of solving the clustering and bi-clustering problems is to compute a pair-wise similarity matrix and to choose a similarity threshold for constructing the corresponding similarity graphs. Such graphs are built according to the following steps:
(1) the vertices of the graph refer to the objects (for instance, genes or conditions)
(2) an edge between two objects is drawn if the similarity score between two vertices is above a certain threshold.
We call two arbitrary vertices u and v "similar" when the score is above a certain value, written as u~v. However, the resulting graph is not necessarily transitive, meaning for arbitrary three vertices u,v,w, u~v and u~w does not necessarily imply v~w. As a result, we aim to convert the preliminarily constructed graph into a transitive graph, which is a disjoint union of cliques, with minimal costs (minimal number of edge deletions/insertions, for instance). Such a problem is named "cluster editing".
Bi-cluster editing is similar to its counterpart, cluster editing: we transform a given bipartite graph into a union of disjoint bi-cliques by edge insertions and deletions with minimal costs for these modifications.
Bi-Force - Large-scale bicluster editing and its application to gene expression data biclustering
In this section, we introduce a novel way of modeling the problem as combinatorial optimization problem on graphs: Weighted Bi-Cluster Editing. It is a very flexible model that can handle arbitrary kinds of multi-condition data sets (not limited to gene expression).
What is the input format?
Graph inputs
The input format for BiCluE looks as follows:
Vertex_1 | Level_1 | Vertex_2 | Level_2 | Edge_Weight |
---|---|---|---|---|
V1_1 | 1 | V2_1 | 2 | -9.2324 |
V1_2 | 1 | V2_2 | 2 | 10.832 |
V1_1 | 1 | V2_3 | 2 | 15.92 |
The input is a table-delimited file consisting of five columns:
column 1: the id of vertex 1
column 2: the vertex set of vertex 1 (the level of vertex 1)
column 3: the id of vertex 2
column 4: the vertex set of vertex 2 (the level of vertex 2)
column 5: the weight of the edge between them
Matrix inputs
A normal table input should be like this:
1.2 | 2.11 | 3.22 | 1.38 | 2.44 |
---|---|---|---|---|
5.69 | 2.42 | 4.52 | 5.11 | -9.2324 |
2.301 | 5.23 | 6.487 | 2.783 | 10.832 |
3.423 | 1.902 | 3.4092 | 4.5872 | 1.92 |
Example files
Download this test input file to have a test with BiCluE. The test graph is generated by random graph generator.
Example user input
This matrix file is the test fun for BiForce Example user input
How to Run
- Run the script To run the program, use "java -jar BiCluE.jar"
- Arguments
- Example Java Command Line
- To run Bi-Force:
-a : it's a compulsory paramter, it gives the programme what algorithm the user wishes to run.
1: fixed-parameter algorithm
2: edge deletion heuristics
-i : the path of the input file
-o : the output path of the editing results
-g : the output path of the resulting graph
-l : (fixed-parameter algorithm only) the lower bound of the parameter k
-s : (fixed-parameter algorithm only) the step/increase of the parameter k if no solution is found
Fixed-parameter Algorithm:
java -Xms512m -Xmx5G -XX:-UseGCOverheadLimit -jar BiCluE.jar -a 1 -i (Input File Path) -o (Result File Output Path) -g (Resulting Graph Output Path) -l 100 -s 20
Edge Deletion Heuristics:
java -Xms512m -Xmx5G -XX:-UseGCOverheadLimit -jar BiCluE.jar -a 2 -i (Input File Path) -o (Result File Output Path) -g (Resulting Graph Output Path)
For matrices, you need to run the mbiforce.jar file, specifying
(1) the input path
(2) the output path
(3) the "clustering mode"
(4) whether the matrix file has header or not
(5) threshold
Sample:
java -jar mbiforce.jar -i=inputmatrix.txt -o=output.txt -m=o -h=false -t=0.0
-i stands for "input". Note that there is no space between "-i" and "=",
and the input path.
-o stands for "output",
-m stands for the "clustering mode". There are four mode available:
"o" stands for "overexpressed", which is to cluster the region in the
matrix with significantly higher value than the given "threshold".
"u" stands for "underexpressed", which is the opposite to "o".
"l" stands for "lowerdeviated", which is to cluster the elements in the
matrix not deviated from the given "threshold".
"h" stands for "highdeviated", which is to cluster the matrix elements
far from the given threshold.
-h stands for "hasHeader", please use "false" or "true"
-t stands for "threshold"
There are two sample inputs in the "./test/" folder.
Download
Here is the link to download biclue.jar:
BiCluE Download
Here is the linkd to download biforce.jar:
BiForce Download
Citation
Sun, P., Speicher, N. K., Roettger, R., Guo, J., & Baumbach, J. (2014). Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering. Nucleic acids research, gku201.