-
Notifications
You must be signed in to change notification settings - Fork 229
Description
Hi Graphies,
Current state
If I understand correctly there is currently no method for community detection in BGL (but see clustering methods). Community detection algorithms like Louvain and Leiden automatically infer hidden groupings in very large networks, identifying gene modules, customer segments, fraud rings, or system components without requiring labeled data.
Common methods include:
- K-core Decomposition
- Label Propagation
- the Louvain algorithm (popular, ~30000 citations, some statistical issues)
- the Leiden algorithm (extends Louvain, solves some issues, more complex, less standard ~ 6000 citations)
Scope
I plan to submit a PR to implement the simplest case of Louvain modularity optimization for undirected graphs only:
- minimal BGL-style API
- regression + correctness tests to reproduce validation methods/results in Blondel et al 2008
- no metric generalization yet
The benefit of starting with Louvain is that it builds requried components for Leiden, that is also more complex.
Algorithm figure from Blondel at al. 2008
Litterature and context
Related papers and ressource:
- Newman and Girvan, 2004: paper for modularity metrics, with per-community version too
- Blondel et al, 2008: Louvain foundational paper using modularity metrics
- Campigotto et al, 2014: Generalizing Louvain to multiple quality metrics + C++ implementation using Strategy pattern (see reference 4)
- Traag et al, 2019 : Leiden paper + Louvain background
- Louvain Method CS class video
- Comparative evaluation of community detection methods : methods, performances, generated community sizes distributions
Formalism
Modularity defined as:
Where:
-
$m$ is the sum of all edges weights -
$A$ is the adjacency matrix -
$\gamma$ is the resolution parameter - the
$\delta$ function is 1 if$i$ and$j$ are in the same community, else 0
It is defined to be equivalent to the community-level formula used for computation:
Where:
-
$n$ is the number of communities$ -
$L_c$ is the number of intra-community edges for community$c$ -
$k_c$ is the sum of degrees of the vertices in community$c$ -
$\gamma$ is the resolution parameter, often equal to$1$
The optimization efficiency comes from the possibility to compute the gain in modularity when moving a node
Where:
-
$\sum_{in}$ is the sum of the weights of the links inside$C$ , -
$\sum_{tot}$ is the sum of the weights of the links incident to nodes in $C, -
$k_i$ is the sum of the weights of the links incident to node$i$ , -
$k_{i,in}$ is the sum of the weights of the links from$i$ to nodes in$C$ -
$m$ is the sum of the weights of all the links in the network
Possible implementation
I will just throw a quick idea of the interface just to be sure it fits well wit BGL spirit
template <typename Graph, typename CommunityMap, typename WeightMap>
double compute_modularity(const Graph& g, const CommunityMap& communities, const WeightMap& weights)
{
// probably needs vertices(), out_edges(), target(), source()
BOOST_CONCEPT_ASSERT((boost::IncidenceGraphConcept<Graph>));
BOOST_CONCEPT_ASSERT((boost::VertexListGraphConcept<Graph>));
// directed louvain seems to require modifsm, see https://github.com/nicolasdugue/DirectedLouvain
typedef typename boost::graph_traits<Graph>::directed_category directed_category;
BOOST_STATIC_ASSERT_MSG(
(boost::is_same<directed_category, boost::undirected_tag>::value),
"compute_modularity requires an undirected graph"
);
// impl from Blondel 2008 works with weighted graphs
}
template <typename Graph, typename CommunityMap, typename WeightMap>
void louvain_clustering(const Graph& g, CommunityMap& communities, const WeightMap&) // note the write param
{
BOOST_CONCEPT_ASSERT((boost::IncidenceGraphConcept<Graph>));
BOOST_CONCEPT_ASSERT((boost::VertexListGraphConcept<Graph>));
typedef typename boost::graph_traits<Graph>::directed_category directed_category;
BOOST_STATIC_ASSERT_MSG(
(boost::is_same<directed_category, boost::undirected_tag>::value),
"louvain_clustering requires an undirected graph."
);
// hide state from user ?
std::vector<double> internal_edges_per_community(num_vertices(g));
std::vector<double> total_degree_per_community(num_vertices(g));
std::vector<double> neigh_community_weight(num_vertices(g));
// impl
// return void like djikstra ? or an enriched structured with internal results from the algo for debug ?
}
// unweighted overload
template<typename Graph, typename CommunityMap>
void louvain_clustering(const Graph& g, CommunityMap& communities)
{
louvain_clustering(g, communities, boost::make_constant_property_map(1.0));
}
// usage:
louvain_clustering(g, comm_map);
double quality = compute_modularity(g, comm_map);