Skip to content

Louvain/Leiden Algorithm for Community Detection #447

@Becheler

Description

@Becheler

Hi Graphies,

Current state

If I understand correctly there is currently no method for community detection in BGL (but see clustering methods). Community detection algorithms like Louvain and Leiden automatically infer hidden groupings in very large networks, identifying gene modules, customer segments, fraud rings, or system components without requiring labeled data.
Common methods include:

  • K-core Decomposition
  • Label Propagation
  • the Louvain algorithm (popular, ~30000 citations, some statistical issues)
  • the Leiden algorithm (extends Louvain, solves some issues, more complex, less standard ~ 6000 citations)

Scope

I plan to submit a PR to implement the simplest case of Louvain modularity optimization for undirected graphs only:

  • minimal BGL-style API
  • regression + correctness tests to reproduce validation methods/results in Blondel et al 2008
  • no metric generalization yet

The benefit of starting with Louvain is that it builds requried components for Leiden, that is also more complex.

Image Algorithm figure from Blondel at al. 2008

Litterature and context

Related papers and ressource:

Formalism

Modularity defined as:

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

Where:

  • $m$ is the sum of all edges weights
  • $A$ is the adjacency matrix
  • $\gamma$ is the resolution parameter
  • the $\delta$ function is 1 if $i$ and $j$ are in the same community, else 0

It is defined to be equivalent to the community-level formula used for computation:

$$Q = \sum_{c=1}^{n} \left[ \frac{L_c}{m} - \gamma \left( \frac{k_c}{2m} \right) ^2 \right]$$

Where:

  • $n$ is the number of communities$
  • $L_c$ is the number of intra-community edges for community $c$
  • $k_c$ is the sum of degrees of the vertices in community $c$
  • $\gamma$ is the resolution parameter, often equal to $1$

The optimization efficiency comes from the possibility to compute the gain in modularity when moving a node $i$ from its community $c_i$ to the community of one of its neighbors $c_j$ without recomputing the full graph:

$$\Delta Q = \left[ \frac{\sum_{in} + k_{i,in}}{2m} - \left( \frac{ \sum_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\sum_{in}}{2m} - \left( \frac{ \sum_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right]$$

Where:

  • $\sum_{in}$ is the sum of the weights of the links inside $C$,
  • $\sum_{tot}$ is the sum of the weights of the links incident to nodes in $C,
  • $k_i$ is the sum of the weights of the links incident to node $i$,
  • $k_{i,in}$ is the sum of the weights of the links from $i$ to nodes in $C$
  • $m$ is the sum of the weights of all the links in the network

Possible implementation

I will just throw a quick idea of the interface just to be sure it fits well wit BGL spirit

template <typename Graph, typename CommunityMap, typename WeightMap>
double compute_modularity(const Graph& g, const CommunityMap& communities, const WeightMap& weights)
{
    // probably needs vertices(), out_edges(), target(), source()
    BOOST_CONCEPT_ASSERT((boost::IncidenceGraphConcept<Graph>));
    BOOST_CONCEPT_ASSERT((boost::VertexListGraphConcept<Graph>));
    
    // directed louvain seems to require modifsm, see https://github.com/nicolasdugue/DirectedLouvain
    typedef typename boost::graph_traits<Graph>::directed_category directed_category;
    BOOST_STATIC_ASSERT_MSG(
        (boost::is_same<directed_category, boost::undirected_tag>::value),
        "compute_modularity requires an undirected graph"
    );
    
    // impl from Blondel 2008 works with weighted graphs
}

template <typename Graph, typename CommunityMap, typename WeightMap>
void louvain_clustering(const Graph& g, CommunityMap& communities, const WeightMap&) // note the write param
{
    BOOST_CONCEPT_ASSERT((boost::IncidenceGraphConcept<Graph>));
    BOOST_CONCEPT_ASSERT((boost::VertexListGraphConcept<Graph>));
    
    typedef typename boost::graph_traits<Graph>::directed_category directed_category;
    BOOST_STATIC_ASSERT_MSG(
        (boost::is_same<directed_category, boost::undirected_tag>::value),
        "louvain_clustering requires an undirected graph."
    );
    
    // hide state from user ?
    std::vector<double> internal_edges_per_community(num_vertices(g));
    std::vector<double> total_degree_per_community(num_vertices(g));
    std::vector<double> neigh_community_weight(num_vertices(g));
    
    // impl

    // return void like djikstra ? or an enriched structured with internal results from the algo for debug ?
}

// unweighted overload
template<typename Graph, typename CommunityMap>
void louvain_clustering(const Graph& g, CommunityMap& communities)
{
   louvain_clustering(g, communities, boost::make_constant_property_map(1.0));
}

// usage:
louvain_clustering(g, comm_map);
double quality = compute_modularity(g, comm_map);

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions