What is BiCluE?


BiCluE is a bi-clustering software for solving the weighted/unweighted bi-cluster editing problem. It provides exact as well as heurstic algorithms.

What is Bi-clustering?


Clustering is a classical task in bioinformatics and computational biology. It partitions a data set into different clusters such that elements within a cluster are more similar to each other than to those objects belonging to different clusters, according to a certain criterion. However, some specific types of clustering were designed for different scenarios. For instance, the clustering of gene expression data under different conditions, which can be modeled as a bipartite graph, is hardly suitable for standard clustering methods. Instead, one would like to cluster genes and conditions simultaneously such that we see a consistent "behavior", i.e. so called bi-clustering.

What is Cluster Editing & Bi-cluster Editing?


One of the common approaches of solving the clustering and bi-clustering problems is to compute a pair-wise similarity matrix and to choose a similarity threshold for constructing the corresponding similarity graphs. Such graphs are built according to the following steps:

(1) the vertices of the graph refer to the objects (for instance, genes or conditions)
(2) an edge between two objects is drawn if the similarity score between two vertices is above a certain threshold.

We call two arbitrary vertices u and v "similar" when the score is above a certain value, written as u~v. However, the resulting graph is not necessarily transitive, meaning for arbitrary three vertices u,v,w, u~v and u~w does not necessarily imply v~w. As a result, we aim to convert the preliminarily constructed graph into a transitive graph, which is a disjoint union of cliques, with minimal costs (minimal number of edge deletions/insertions, for instance). Such a problem is named "cluster editing".

Bi-cluster editing is similar to its counterpart, cluster editing: we transform a given bipartite graph into a union of disjoint bi-cliques by edge insertions and deletions with minimal costs for these modifications.



Bi-Force - Large-scale bicluster editing and its application to gene expression data biclustering


In this section, we introduce a novel way of modeling the problem as combinatorial optimization problem on graphs: Weighted Bi-Cluster Editing. It is a very flexible model that can handle arbitrary kinds of multi-condition data sets (not limited to gene expression).

What is the input format?


Graph inputs


The input format for BiCluE looks as follows:

Vertex_1 Level_1 Vertex_2 Level_2 Edge_Weight
V1_1 1 V2_1 2 -9.2324
V1_2 1 V2_2 2 10.832
V1_1 1 V2_3 2 15.92

The input is a table-delimited file consisting of five columns:
column 1: the id of vertex 1
column 2: the vertex set of vertex 1 (the level of vertex 1)
column 3: the id of vertex 2
column 4: the vertex set of vertex 2 (the level of vertex 2)
column 5: the weight of the edge between them


Matrix inputs


A normal table input should be like this:
1.2 2.11 3.22 1.38 2.44
5.69 2.42 4.52 5.11 -9.2324
2.301 5.23 6.487 2.783 10.832
3.423 1.902 3.4092 4.5872 1.92

Example files


Download this test input file to have a test with BiCluE. The test graph is generated by random graph generator.

Example user input

This matrix file is the test fun for BiForce Example user input

How to Run

  1. Run the script
  2. To run the program, use "java -jar BiCluE.jar"

  3. Arguments

  4. -a : it's a compulsory paramter, it gives the programme what algorithm the user wishes to run.
     1: fixed-parameter algorithm
     2: edge deletion heuristics
    -i : the path of the input file
    -o : the output path of the editing results
    -g : the output path of the resulting graph
    -l : (fixed-parameter algorithm only) the lower bound of the parameter k
    -s : (fixed-parameter algorithm only) the step/increase of the parameter k if no solution is found

  5. Example Java Command Line

  6. Fixed-parameter Algorithm:
    java -Xms512m -Xmx5G -XX:-UseGCOverheadLimit -jar BiCluE.jar -a 1 -i (Input File Path) -o (Result File Output Path) -g (Resulting Graph Output Path) -l 100 -s 20

    Edge Deletion Heuristics:
    java -Xms512m -Xmx5G -XX:-UseGCOverheadLimit -jar BiCluE.jar -a 2 -i (Input File Path) -o (Result File Output Path) -g (Resulting Graph Output Path)


  7. To run Bi-Force:

  8. For matrices, you need to run the mbiforce.jar file, specifying
    (1) the input path
    (2) the output path
    (3) the "clustering mode"
    (4) whether the matrix file has header or not
    (5) threshold


    Sample:

    java -jar mbiforce.jar -i=inputmatrix.txt -o=output.txt -m=o -h=false -t=0.0

    -i stands for "input". Note that there is no space between "-i" and "=",
    and the input path.

    -o stands for "output",

    -m stands for the "clustering mode". There are four mode available:
    "o" stands for "overexpressed", which is to cluster the region in the
    matrix with significantly higher value than the given "threshold".
    "u" stands for "underexpressed", which is the opposite to "o".
    "l" stands for "lowerdeviated", which is to cluster the elements in the
    matrix not deviated from the given "threshold".
    "h" stands for "highdeviated", which is to cluster the matrix elements
    far from the given threshold.

    -h stands for "hasHeader", please use "false" or "true"

    -t stands for "threshold"


    There are two sample inputs in the "./test/" folder.

Download


Here is the link to download biclue.jar:

BiCluE Download


Here is the linkd to download biforce.jar:

BiForce Download

Citation


Sun, P., Speicher, N. K., Roettger, R., Guo, J., & Baumbach, J. (2014). Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering. Nucleic acids research, gku201.