Measuring Network Dependencies from Node Activations

Graduate Thesis Defense
dissertation.rtbs.dev

Rachael T.B. Sexton

NIST

Background

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Imagine…

You work at an institution where lots of people author publications…

…sometimes together…

Co-authorship \(\rightarrow\) relationships of colleagues?

Build a network!

How good of a job did you do?

  • There’s no “ground truth” network…
  • What does “related” mean?
  • How will you trust your answer?

Problem & Scope

Assume
We want networks of dependence (influence, causation, etc.)
Research Question
Can we recover dependency relationships from co-occurrences?



Conditions

  • Without forcing the (social, etc.) network to have a certain shape?
  • While retaining easily-interpretable metrological properties?
  • Fast and flexible enough for practitioners: rapid iteration and exploratory analysis?

This work:

  1. Organizes current approaches to identify metrological and usability gaps.
  2. Presents Forest Pursuit: a scalable and accurate method for recovering dependencies.
  3. Provides MENDR, a community testbench and dataset for comparing network dependency reconstruction.
  4. Applies these to common case studies.

A Need for Network Metrology

[…] the practice of ignoring measurement error is still mainstream, and robust methods to take it into account are underdeveloped.

Tiago Peixoto (2018) [1]

Surprisingly, the development of theory and domain-specific applications often occur in isolation, risking an effective disconnect between theoretical and methodological advances and the way network science is employed in practice.

Peel et al. (2022) [2]

Network “Metrology” – What is it?

Metrology is more than just “units”. In our context, we want to:

  • Quantify a network
  • Consider the trueness of that quantification
  • Consider the precision of that quantification.

ISO 5725-1 Accuracy (trueness & precision) [3]

SV1XV, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

Example: Measurement Error for Zachary’s Karate Club

Trueness
  • Actual “edges” are provided as a list [4]
  • What they mean (quantitatively) is ambiguous, so it’s underspecified.
Precision
  • edge 78 simultaneously exists and doesn’t, depending on which section you believe
  • Any false negatives? No way to know…
Figure 1: Zachary’s Karate Club, with ambiguously extant edge 78 highlighted.

It’s effectively a “calibration artefact” \(\rightarrow\) defined as true, with some precision uncertainty.

Indirect Network Measurement

Edges are often measured indirectly, i.e. as lists of co-occurrences (bipartite papers+authors)

Figure 2: Observations as activation sets

What if…

  • co-occurrence \(\rightarrow\) “acquaintance”?
  • more papers \(\rightarrow\) more acquainted?

So let’s do a bipartite projection…

…done?

Figure 3: Network based solely on co-authorship observations

Dependencies Wanted

Have you ever had someone added to a paper by someone else?

Almost all paper authors were “asked” by one current author (not cornered by a mob).

A social scientist “on the ground” would likely measure things differently.

  • Author (g) asked (e, h, and c) to co-author a paper, all of whom agreed
  • Author (i) asked (f and j), but (j) wanted to add (b)’s expertise before writing one of the sections.
  • etc.
Figure 4: graph of mutual collaboration relationships i.e. the “ground truth” social network

This is a network of who depends on whom!

The Goal

Figure 5: Recovering underlying dependency networks from node-cooccurrences.

Roads to Network Recovery

Organizing current approaches

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Our Data, the Design Matrix

  • Encode “papers” (observations) as rows, “authors” (features) as columns

\[ \small X^{m\times n}= \begin{array}{c c} & \begin{array}{cccccccccc} a & b & c & d & e & f & g & h & i & j\\ \end{array} \\ \begin{array}{c c } x_1\\x_2\\x_3\\x_4\\ \vdots \\ x_m\end{array} & \left[ \begin{array}{c c c c c c c c c c} . & . & 1 & . & 1 & . & 1 & 1 & . & . \\ 1 & . & . & . & 1 & 1 & . & . & . & . \\ . & 1 & . & . & . & 1 & . & . & 1 & 1 \\ . & . & . & 1 & 1 & . & . & 1 & . & . \\ &&&& \vdots &&&&& \end{array} \right] \end{array} \]

(a) : \(X\) as a (bi-adjacency) matrix.
(b) Bipartite representation of node “activation” data
Figure 6
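As a concrete sketch (plain numpy; the author sets below reproduce the first four rows of Figure 6 and are purely illustrative), \(X\) can be built directly from per-paper author sets:

```python
# A minimal sketch of building the binary design matrix X from author sets.
import numpy as np

authors = list("abcdefghij")
papers = [{"c", "e", "g", "h"},   # x_1
          {"a", "e", "f"},        # x_2
          {"b", "f", "i", "j"},   # x_3
          {"d", "e", "h"}]        # x_4

col = {a: j for j, a in enumerate(authors)}
X = np.zeros((len(papers), len(authors)), dtype=int)
for i, paper in enumerate(papers):
    X[i, [col[a] for a in paper]] = 1   # activate this paper's authors
```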

Graphs as Incidence Matrices

\[ \small B^{\omega \times n} = \begin{array}{c c} & \begin{array}{cccccccccc} a & b & c & d & e & f & g & h & i & j\\ \end{array} \\ \begin{array}{c c } e_1\\e_2\\e_3\\e_4\\e_5\\e_6\\e_7\\e_8\\e_9\\e_{10}\\e_{11}\\ \vdots \\e_\omega\\\end{array} & \left[ \begin{array}{cccccccccc} 1 & . & . & . & . & 1 & . & . & . & . \\ 1 & . & . & . & 1 & . & . & . & . & . \\ . & . & . & . & 1 & 1 & . & . & . & . \\ . & . & . & . & . & 1 & . & . & 1 & . \\ . & . & . & . & 1 & . & 1 & . & . & . \\ . & . & . & . & 1 & . & . & 1 & . & . \\ . & 1 & . & . & . & . & . & . & . & 1 \\ . & . & . & . & . & . & . & . & 1 & 1 \\ . & . & 1 & . & . & . & 1 & . & . & . \\ . & . & . & . & . & . & 1 & 1 & . & . \\ . & . & . & 1 & . & . & . & 1 & . & . \\ &&&& \vdots &&&&& \\ &&&& \vdots &&&&& \end{array} \right] \end{array} \]

(a) : Network \(G\) as an incidence matrix \(B\).
(b) graph of mutual collaboration relationships i.e. the “ground truth” social network
Figure 7

The Goal

\[ \hat{B} \leftarrow f(X)\]

Figure 8: Recovering underlying dependency networks from co-occurrence data.

Basics – incidence as (conditional) dependence

A start: what does “incident” mean, statistically? \(\rightarrow\) Markov Random Fields

  • random variables \(A,B\in J\)
  • remaining columns \(C=X(\cdot,J\setminus \{A,B\})\)

\[ P(A\cap B |C ) = P(A|C)P(B|C) \implies (A\perp B|C) \implies (A,B)\notin G \]

Figure 9: Spring system as a network of conditional dependencies

Taxonomy of Modeling Assumptions – I

Structural Constraint Assumptions

Local Structure/additivity

  • (Linear) combinations of observations
  • Generally called association measures (Cosine Sim./Hyperbolic Projection/Mutual Info.)
  • Built on (marginal) counts and reweighting schemes

Global Structure Constraints

  • Whole network belongs to a class of graphs (or PGMs).
  • Tree/block (MRF/Chow-Liu/GLASSO), Stochastic Block Model, etc.;

Resource and Information Flow

  • Limit total network activation to a shared pool of “stuff”
  • “Stuff” gets distributed somehow.
  • E.g. Optimal Transport (Doubly-Stochastic/Sinkhorn eOT), Resource Allocation, Geodesics (HSS)

Related to partial pooling in hierarchical Bayes, sitting between unpooled and fully pooled priors.

Taxonomy of Modeling Assumptions – II

Data Sampling Assumption: model vs. data

Data is noisy, warped, etc. …what makes it that way?

Forward map (model \(\rightarrow\) data)
\(\mathbf{x} = F(\mathbf{p})\qquad F:\mathbb{R}^{l}\rightarrow \mathbb{R}^n\)

Given data-space observations, we would need the inverse map \(F^{-1}(\mathbf{x})\);

uncertainty then involves both precision and trueness.

If we instead have model-space observations (data assumed to match the theoretical process),

uncertainty is in precision only.

  • Are we already in model-space, by assumption?
  • Are we solving an inverse problem?

Taxonomy of Modeling Assumptions – III

Data Availability: observations vs. projection

Does our method start with the full bipartite dataset?
Or is it post-processing from marginal counts alone?

(a) Bipartite representation of node “activation” data
(b) Network based solely on co-authorship observations
Figure 10

Organizing Recovery Methods

Table 1: Organizing recovery methods by representation space and level

A “Path” Forward

We would like a method with:

  • Additive/local assumptions (ease of computation, interpretation)
  • Data-space assumptions (inverse problem)
  • Full usage of bipartite observations (uncertainty quantification)

Latent Graphs with Desire Paths

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Gambit of the Group

Can we use co-occurrence counts for association measures? Are we:

[…] interested primarily in whether people know one another, and collaboration on any topic is a reasonable indicator of acquaintance.

Newman & Girvan (2004) [37]

…i.e. assuming model space observations?

If we’re not so explicit, are we:

“So, consciously or unconsciously, many ethnologists studying social organization make what might be called the ‘gambit of the group’: they assume that animals which are clustered […] are interacting with one another and then use membership of the same cluster […] to define association.”

Whitehead & Dufault (1999) [5]

“Group-based methods” later cited as possible reason for bias in assortativity in social networks [6].

Gambit of the Inner Product

The Gram matrix of co-occurrence underlies so many methods, but what is it saying?

Matrix Multiplication as sum of outer products
\[ G = X^T X = \sum_{i=1}^m \mathbf{x}_i\mathbf{x}_i^T \qquad G(j,j') = \sum_{i=1}^m X(i,j)X(i,j') \]
  • Each outer product \(\mathbf{x}_i\mathbf{x}_i^T\) is an \(n\times n\) matrix
  • Has a 1 in every \(j,j'\) entry where nodes \(j,j'\) both occurred
  • Implicitly asserts that each observation is a complete graph

a sum of cliques \(\rightarrow\) clique bias

Figure 11: Gram matrix as sum of observation outer products
(a) Observations as activation sets
(b) Edge Measurements with Group Gambit (BoW) assumption
Figure 12: Inner-product projections as sums of cliques illustrating “clique bias”.
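A small numpy sketch (toy random data, illustrative only) makes the quadratic clique count explicit:

```python
# Sketch of the "gambit of the inner product": G = X^T X counts
# co-occurrences, implicitly asserting each observation is a clique.
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((20, 10)) < 0.3).astype(int)  # toy m x n design matrix

G = X.T @ X                         # n x n co-occurrence (Gram) matrix
np.fill_diagonal(G, 0)              # drop self-co-occurrence

k = X.sum(axis=1)                   # authors per paper
clique_edges = k * (k - 1) // 2     # edges each paper asserts: quadratic in k
tree_edges = np.maximum(k - 1, 0)   # what a tree assumption would assert
```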

Does this make sense?

This framing lets practitioners do a “sanity-check”

Do clique observations make sense for my community?

From a scaling standpoint:

  • a “planar” community (office layout?) adds ~3 relations per author, per paper;
  • a tree community (organizational structure?) adds ~1 relation
  • cliques add relationships quadratically w.r.t. authors!

So…
Bigger papers have quadratically more information about connectivity in our social network?

Don’t bigger papers tell us less about who “knows” whom?

Networks as Unions of Subgraphs

IDEA: why limit ourselves to cliques?

  • Map an operation onto each observation
  • Reduce to an aggregate edge guess over all observations

Graph Observations as Vectors

Recall

If we were a social scientist “on the ground”:

  • Author (g) asked (e), (h), and (c) to co-author a paper, all of whom agreed
  • (i) asked (f) and (j), but (j) wanted to add (b)’s expertise before writing one of the sections
  • …etc.
(a) (hypothetical) edge-based observations

\[ \small R^{m\times\omega}= \begin{array}{c c}\small & \begin{array}{ccccccccccc} e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & e_7 & e_8 & e_9 & e_{10} & e_{11}\\ \end{array} \\ \begin{array}{c c } x_1\\x_2\\ \vdots \end{array} & \left[ \begin{array}{ccccccccccc} \ .\; & \;. \; & \;. \; & \;. \; & \;1 \; & \;. \; & \;. \; & \;. \; & \;1 \; & \;1 \; & . \ \\ \ .\; & \;. \; & \;. \; & \;1 \; & \;. \; & \;. \; & \;1 \; & \;1 \; & \;. \; & \;. \; & . \ \\ &&&& \vdots &&&&& \end{array} \right] \end{array} \]

(b) : Observations embedded in “edge space”
Figure 13
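A sketch of the edge-space embedding (the `edge_index` helper is a hypothetical convenience for upper-triangle indexing, not thesis notation):

```python
# Sketch: one observation as a row of R, indexing the C(n,2) possible edges
# by their position in the upper triangle of an n x n adjacency matrix.
import numpy as np

def edge_index(u, v, n):
    """Linear index of edge (u, v) among the n*(n-1)/2 possible edges."""
    u, v = min(u, v), max(u, v)
    return u * n - u * (u + 1) // 2 + (v - u - 1)

n = 10                                   # authors a..j as nodes 0..9
r = np.zeros(n * (n - 1) // 2, dtype=int)
for u, v in [(4, 6), (6, 7), (2, 6)]:    # e.g. tree edges (e,g), (g,h), (c,g)
    r[edge_index(u, v, n)] = 1           # this observation "treads" these
```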

Unions of Constrained Subgraphs

Idea:

  • Use domain knowledge to estimate \(\mathbf{r}_i\)
  • Dependencies: constrain each paper to its own distribution of dependencies.
  • Additive: use marginal counts of \(R\) rather than \(X\)

Principled way to “sum” over \(\mathbf{x}_i\)?

dankeck, CC0, via Wikimedia Commons

Latent graphs as Desire Path Density Estimates

Complex Networks as latent models. [2], [7]
\(\rightarrow\) We can’t know if a link is “real”, because “real” is hard to define!

  • Is a “friendship” true/false?
  • Is a colleague’s influence boolean?

Instead: “would some expert reasonably answer YES vs NO?”

Alternate framing: would paving a desire path make more sense than not paving?

  • pick a class of graphs to constrain dependencies
  • each observation treads on certain edges.
  • estimate probability that paving (\(r=1\)) is more likely than not paving (\(r=0\))

DPDE graph reduction: probability of “paving”

  • Assume: edge “paving” given “treading” history is iid.
  • Use Beta-Bernoulli on marginal counts \(\sum_i \mathbf{r}_i\)

What prior?

Presented with two equal-length desire paths:

  • People nearly always choose the path trodden more often before (the “more beaten” path).
  • Even for small differences \(\rightarrow\) a bathtub-shaped distribution over edge use.
  • coin-tossing game: “fraction of time ahead” \(\rightarrow\) arcsine law [8], [9]

\[ \begin{gathered} \pi_e \sim \text{Beta}(\alpha, 1-\alpha), \quad e\in E\\ z_e \sim \text{Bernoulli}(\pi_e), \quad e \in E \end{gathered} \]
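As a sketch, the resulting edge estimate is just the Beta posterior mean over “tread” counts; because the prior satisfies \(\alpha+\beta=1\), the closed form below is algebraically the same as the DesirePathBeta update \(\hat{\alpha}_m = \alpha + \tfrac{\mathbf{s}-\alpha \mathbf{k}}{\mathbf{k}+1}\) appearing in the Forest Pursuit algorithm later:

```python
# A minimal sketch of the Beta-Bernoulli "paving" estimate: s_e = times edge e
# was trodden (used by a recovered subgraph), k_e = times it could have been
# (co-occurrence count). The Beta(alpha, 1 - alpha) prior has total mass 1.
import numpy as np

def paving_probability(s, k, alpha=0.5):
    """Posterior mean of Beta(alpha, 1 - alpha) after s successes in k trials.

    Equals alpha + (s - alpha * k) / (k + 1), the DesirePathBeta update.
    """
    s, k = np.asarray(s, float), np.asarray(k, float)
    return (alpha + s) / (1.0 + k)
```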

what class of subgraphs do we choose for \(\mathbf{r}_i\)?

Trees

Forest Pursuit

Recovery from random walks in near-linear time

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Random walk activations

Random walk model of node activation:

Visiting a node leads to its activation

  • Spreading and diffusive processes
  • Infectious disease, cognitive search, information cascades, etc.

In our case two pieces of information would have been censored:

  • Individual jumps hidden (only visited nodes)
  • order of visits hidden (only binary activation)

What do we know?

Random walk dependencies are trees

Assume:

  • Single-cause (only one colleague’s request preceded your joining)
  • Single-root (only one colleague originated the “asking”)

The global network might not be a tree, but …

The activation dependencies for each random walk must be.

  • Each new activation comes from a single activated node
  • Dependencies are connected with no loops
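A simulation sketch (hypothetical helper, plain-Python adjacency dicts) shows why: recording, for each first visit, the node the walker came from always yields a tree:

```python
# Sketch: one random-walk cascade; each newly activated node records the
# single already-active node it was reached from, so dependencies form a tree.
import random

def activation_tree(adj, root, n_jumps, seed=0):
    """adj: dict node -> list of neighbors. Returns (activated, tree_edges)."""
    rng = random.Random(seed)
    current, activated, tree = root, {root}, []
    for _ in range(n_jumps):
        nxt = rng.choice(adj[current])
        if nxt not in activated:         # first visit: single-cause edge
            activated.add(nxt)
            tree.append((current, nxt))
        current = nxt                    # individual jumps stay hidden
    return activated, tree

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
nodes, deps = activation_tree(adj, root=0, n_jumps=10)
```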

Spanning Forest Distribution - Regularized Laplacian

Constrain papers \(\mathbf{r}_i\) to be dependency trees

Spanning forest
a collection of disjoint trees covering every vertex

In our model, being on the same paper is being on the same tree in a forest.

Figure 14: (hypothetical) edge-based observations
Figure 15: Edge Measurements with true (tree) dependencies known

Sparse Approximation

\[\mathbf{r}_i \gets f(\mathbf{x}_i)\]

What are we doing here? … Sparse Approximation?

\[ \mathbf{\hat{r}} = \operatorname*{argmin}_{\mathbf{r}}{\|\mathbf{x}-B^T\mathbf{r} \|_2^2} \quad \text{s.t.} \quad \|\mathbf{r}\|_0 = \|\mathbf{x}\|_0 - 1 \qquad(1)\]

Usually solved with matching pursuit [10].

Each iteration selects the atom with the largest inner product \(\langle \mathbf{b}_{i'},\mathbf{x}\rangle\).

But this doesn’t work (in the standard plus-times ring; see tropical factorization [11]):

\(RB\) has the shape and sparsity of \(X\), but each entry is the node’s degree in its tree, not 1.

Forest Pursuit – Sums of Steiner Trees

Instead, we will take an Empirical Bayes approach:

\[ \mathbf{\hat{r}} = \operatorname*{argmax}_{\mathbf{r}}{\mathcal{L}(\mathbf{r}|\mathbf{x})} \quad \text{s.t.}\quad \mathbf{r}\sim \mathcal{T}(G^*[\mathbf{x}]) \]

  • Want: point estimate for most likely tree (per paper)
  • Use an empirical prior as shrinkage for a Max. Likelihood estimate.
  • “Number of times authors were on the same tree”

Our point estimate is the mode of the spanning tree distribution on activated nodes
…i.e. the Maximum Spanning Tree (MST) in edge weight (a minimum spanning tree in distance)

Prim’s: each iteration selects the edge with the smallest distance to the current tree

  • Essentially the Chow-Liu tree for each observation (technically a “Steiner tree” [12])
  • \(\rightarrow\) use Matrix Forest Theorem [13], [14], [15] to link \(X^TX\) to a metric closure of the dependency graph.
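A minimal sketch of the per-observation step (assuming a precomputed, strictly positive distance matrix from the metric closure; `pursue_tree` is an illustrative name, not the released library’s API):

```python
# Sketch of Forest Pursuit's inner loop: MST over the activated nodes of one
# observation, i.e. an approximate Steiner tree on the metric closure.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pursue_tree(x, d):
    """x: (n,) binary activation row; d: (n, n) positive pairwise distances."""
    nodes = np.flatnonzero(x)                  # activated nodes only
    sub = d[np.ix_(nodes, nodes)]              # induced distance submatrix
    tree = minimum_spanning_tree(sub).tocoo()  # min total distance (Prim-like)
    return [(nodes[u], nodes[v]) for u, v in zip(tree.row, tree.col)]
```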

does it work?

MENDR

Measurement Error in Network Dependency Reconstruction

Simulation Study & Test-bench

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Synthetic Dataset

Community tooling for community problem-solving

  • Open dataset, testbench, and python library
  • Facilitates consistent referencing of challenge problems, and execution of methods
  • Reproducible via Data Version Control (import as a dependency anywhere!)
  • Easily extensible: pull-request on Github can add new methods or graphs/datasets.
Table 2: Experiment Settings (MENDR Dataset)

  parameter           values
  random graph kind   Tree, Block, BA \((m\in\{1,2\})\)
  network \(n\)-nodes     10, 30, 100, 300
  random walks        1 sample: \(m\sim\text{NegBinomial}(2,\tfrac{1}{n})+10\)
  random walk jumps   1 sample: \(j\sim\text{Geometric}(\tfrac{1}{n})+5\)
  random walk root    1 sample: \(n_0 \sim \text{Multinomial}(\textbf{n},1)\)
  random seed         1, 2, …, 30
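A sketch of this sampling scheme (numpy’s parameterization of the negative binomial and geometric distributions is assumed to match the table’s):

```python
# Sketch of one MENDR experiment's sampled settings, per Table 2.
import numpy as np

def sample_settings(n, seed):
    rng = np.random.default_rng(seed)               # seeds 1..30 in MENDR
    n_walks = rng.negative_binomial(2, 1 / n) + 10  # number of random walks
    n_jumps = rng.geometric(1 / n) + 5              # jumps per walk
    root = rng.integers(n)                          # uniform root node
    return n_walks, n_jumps, root
```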

Compared Methods

Table 3: Summary of algorithms compared

  Algorithm                abbrev.   \(\alpha\)?   class      source
  Forest Pursuit           FP        Yes       local      –
  GLASSO                   GL        No        global     [16], [17]
  Ochiai Coef.             CS        Yes       local      [18]
  Hyperbolic Projection    HYP       No        local      [19]
  Doubly-Stochastic        eOT       Yes       resource   [20], [21]
  High-Salience Skeleton   HSS       Yes       resource   [22]
  Resource Allocation      RP        Yes       resource   [23]
Metrics:
  • Expectation over thresholds \(E[\text{score}]\)
    • Matthews Correlation Coefficient (MCC)
    • Fowlkes-Mallows (F-M)
  • Average Precision Score (APS)

Results – overall

Figure 16: Comparison of MENDR recovery scores
  • FP is the only approach with greater than 0.5 expected score over all thresholds!
  • By a large margin…
  • GLASSO has a good APS (but so does plain Cosine Similarity?)
  • By graph type, FP beats GLASSO even where GLASSO is known to be accurate (trees, blocks)

Results – partial residuals

How does “more data” impact the network estimate quality?

Because of how MENDR samples random walks, we need to de-correlate those trends

Figure 19: Partial Residuals (regression on E[MCC])
  • CS struggles with longer walks (more activations); FP is unaffected.
  • GLASSO (and FP, less so) struggles as network size grows (more edges).
  • FP benefits the most from more observations.

Runtime Performance

How fast is it?

  • FP consistently at the lower-bound of GLASSO runtime
  • 1 - 3 orders of magnitude faster
  • Ill-conditioned matrices lead to many convergence failures for GLASSO
Figure 20: Runtime Scaling (Forest-Pursuit vs GLASSO)

Runtime Performance – partial residuals

Theoretically, Forest Pursuit is:

  • Linear (and parallel!) in observation count \(m\)
  • Linear in expected walk-length \(\|\textbf{x}_i\|_0\) (row-density) via Prim’s
  • Constant in network size (given walk-length, i.e. diffusion-rate is known)

Empirically:

Figure 21: Partial Residuals (regression on computation time)

Modifications to Forest Pursuit

Re-weight by Interaction-rate (FPi)

  • Scale edge probability by co-occurrence rate
  • APS differential w/GLASSO is gone
  • Sacrifices stability and \(E[\text{score}]\)

Expected Forest Maximization (EFM)

  • Generative/Bayesian Formulation
  • Root node spanning trees
  • E-M Iteration: (very) slight improvement, high compute cost

See full document for more detail :)

Case Study

Les Misérables

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Les Misérables Character Network

Network of character co-occurrence by chapter.

  • What are we trying to measure? What characters influence the appearance of others?
  • How is network topology (and derived metrics) affected?
  • Does correcting for clique bias imply correction for other biases?

Les Misérables – Co-Occurrence

Figure 22: Les Misérables character co-occurrence network

Les Misérables – Co-Forest Pursuit

Figure 23: Les Misérables character dependency network (Forest Pursuit)

Les Misérables – Centrality Ranks

Figure 24: Changes in character centrality ranking for FP vs co-occurrence

Les Misérables – Summary

  • Centrality for Fantine, Eponine, and Cosette is much higher with FP

  • Why were they so low in the first place? See:

    • “Bechdel Test”[24]
    • Systemic agency reduction in fiction[25].

Estimating character importance despite systematic interaction-reduction bias from authors.

It is still easier to explain the occurrence of many characters through the key female characters, despite their lower overall co-occurrence.

Case Study

Animal Concept Map

Verbal Fluency Networks

Study participants list out members of some category as fast as possible in a time limit:

Animals
S.1 — dog, cat, sheep, horse, fox, lion, giraffe, rhino…
S.2 — cat, dog, wolf, coyote, elk, deer, ox, bison, cow, sheep…
S.3 — dog, cat, bird, hawk, owl, pigeon, rat, crocodile, lizard…

Based only on these lists, and the proximity between the words in them, let’s try to make a “map” of animals according to people’s associations in memory.

  • Using windowed co-occurrences to reflect working memory limitations [26]

  • How efficiently can we represent these networks? (range between tree and complete-graph!)

  • How diverse are edge types when investigating the ways humans connect animals?

For each of the following, assume filtering of edges until the graph would otherwise be disconnected (unless specified otherwise).

Figure 25

Fluency Animals – Doubly-Stochastic

Not very efficient; main connectivity/community is context/location-based.

Figure 26: Verbal Fluency (animals) Network Backbone (Doubly-Stochastic)

Fluency Animals – Chow Liu Tree

Figure 27: Verbal Fluency (animals) Dependency Network (Chow-Liu Tree)

Fluency Animals – GLASSO

Figure 28: Verbal Fluency (animals) Dependency Network (GLASSO)

Fluency Animals – Forest Pursuit

Figure 29: Verbal Fluency (animals) Dependency Network (Forest Pursuit)

Fluency Animals – Summary

FP is by far the most efficient in edge usage while maintaining connectivity.

An analyst would also have an easier job of creating “edge-type inventories”, making the FP backbone an excellent exploratory assistant: animals can be related because they are:

  • Co-located
  • Taxonomically similar (cheetah+leopard)
  • Famous predator/prey pairs (cat+mouse)
  • Pop-culture references (lion\(\rightarrow\)tiger\(\rightarrow\)bear)
  • Similar in ecological niche/role (koala+sloth)
  • Related through conservation or public awareness (panda+gorilla)
  • etc.

Fluency Animals – Importances

Figure 30: Changes in animal centrality ranking for FP vs co-occurrence and GLASSO

Fluency Animals – Structure Preservation

(a) co-occurrence retains local communities at the cost of global structure
(b) clique-bias correction preserves central structure by disconnecting rare nodes
Figure 31: Differences in structural preservation with over-thresholding.

Conclusion

  1. Background
  2. Organizing Current Approaches
  3. Desire Path Density Networks
  4. Forest Pursuit
  5. Experimental Comparison
  6. Case Studies
  7. Conclusion

Thesis Contributions

Taxonomy
Novel organization of modeling assumptions for network inference to assess theoretical gaps.
Sparse Approximation
Recontextualize network recovery in edge space using the Desire Path Density framework.
Forest Pursuit
Scalable and flexible algorithm for dependency recovery from random walks.
MENDR
Dataset and benchmarking toolkit to compare recovery capability.
Generative Model
Generalization of Forest Pursuit inspired by probabilistic factorization & Dictionary Learning.

Future Work

  • Multiple Sources & Hidden Nodes

    • Allow multi-source random walks by explicitly including the “root” R
    • Allow adding unseen nodes, e.g. with the addition of Steiner nodes à la [27]
  • Generalizing Inner Products

    • Non-euclidean metric spaces for Desire Path Densities
    • Causality-preserving Lorentz embedding [28] applied to Voronoi normalizing flows [29], [30]
  • New applications & case studies

    • Bigger inventories of effects on assortativity and centrality
    • Re-assess application of semantic fluency e.g. for early-onset Alzheimer’s and Schizophrenia detection
    • Human-in-the-loop interfaces for grounded coding of relationship types between concepts

Thank You

Literature Review

Basics – Sample Observations in Feature Space

For generality, say a practitioner records their measurements as scalar values, i.e., \(x\in\mathbb{S}\in\{\mathbb{R,Z,N},\cdots\}\). Data can be represented as a “design matrix” \(X:\mathbb{S}^{m\times n}\)

  • \(n\) independent/input variable features
  • over the course of \(m\) observations

\[\begin{gathered} x=X(i,j)\qquad X : I\times J \rightarrow \mathbb{S} \\ i\in I=\{1,\cdots,m\}, \quad j\in J=\{1,\cdots,n\},\qquad I,J:\mathbb{N} \end{gathered}\]

observations
\(\mathbf{x}_i=X(i,\cdot),\quad \mathbf{x}:J\rightarrow\mathbb{S}\)
features
\(\mathbf{x}_j'=X(\cdot,j),\quad \mathbf{x}':I\rightarrow\mathbb{S}\)

Marginal sums give occurrence counts, matrix products give co-occurrences, etc.

We are interested in the feature relationships (author-author), i.e. feature/representation learning when \(\mathbb{S} \equiv \mathbb{B}\)

Basics – Incidence Structures

Unifying graphs and design matrices: edge-centric parameterizations!

Graphs via complete incidence matrix \(B\):
\[ \begin{gathered} G = (V,E,B) \quad s.t. \quad B : E\times V \rightarrow S\\ v \in V = \{1,\cdots, n\}\quad e \in E = \left\{1,\cdots, {n\choose 2} \right\} \end{gathered} \]
Observations of graphs (as edges)
\[ \begin{gathered} R(i,e) = \min(\{B_i(e, u_n(e)),B_i(e,v_n(e))\}) \\ \quad \textrm{s.t.} \quad R:I\times E \rightarrow \mathbb{S}\\ i\in I = \{1,\cdots,m\} \qquad e \in E=\left\{1,\cdots,\omega\right\}\\ n=\tfrac{1}{2}(1+\sqrt{8\omega+1}) \end{gathered} \]

Taxonomy of Modeling Assumptions – I

Local Structure/additivity
We could rely on linear combinations of observations having assumed edge representations.
Generally called association measures.
Built on (marginal) counts \(\mathbf{s}\) via inner product (Gram matrix \(X^TX\)) e.g. co-occurrences.

Note 1: Ochiai Coefficient (Cosine)

Effectively an uncentered correlation, but for binary observations the “cosine similarity” is also called the Ochiai Coefficient between two sets \(A,B\), where binary “1” stands for an element belonging to the set.[18] \[ \frac{|A \cap B |}{\sqrt{|A||B|}}=\frac{p_{11}}{\sqrt{p_{1\bullet}p_{\bullet 1}}} \rightarrow \frac{X^TX}{\sqrt{\mathbf{s}_i\mathbf{s}_i^T}}\quad \mathbf{s}_i = \sum_i \mathbf{x}_i \]

Note 2: Hyperbolic Projection

Attempts to account for the overcounting of co-occurrences on frequently occurring nodes, vs. rarer ones.[19] \[ X^T\text{diag}(1+\mathbf{s}_j)^{-1}X \quad \mathbf{s}_j = \sum_j{\mathbf{x}_j'}\] This re-weights each observation by its degree in the original bipartite graph.

See also: Yule’s Q/Y, Mutual Information, odds ratios, etc.
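A sketch of both local measures from a binary design matrix \(X\) (assuming every node occurs at least once, to avoid division by zero):

```python
# Sketch: Ochiai (binary cosine) and hyperbolic-projection association
# matrices from a binary design matrix X (rows: observations, cols: nodes).
import numpy as np

def ochiai(X):
    G = X.T @ X                          # co-occurrence counts
    s = X.sum(axis=0)                    # per-node occurrence counts
    return G / np.sqrt(np.outer(s, s))

def hyperbolic(X):
    w = 1.0 / (1.0 + X.sum(axis=1))      # down-weight large observations
    return X.T @ (w[:, None] * X)        # X^T diag(1 + s)^{-1} X
```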

Taxonomy of Modeling Assumptions – II

Global Structure Constraints
We could assume the whole network belongs to a class of graphs.
e.g. Tree, block, SBM, etc.; common for MRF inference.

Note 3: Chow-Liu Spanning Tree

Enforces tree structure globally. Approximates a joint probability \[ P\left(\bigcap_{i=1}^n v_i\right) \approx P' = \prod_{e\in T} P(u_n(e)|v_n(e)) \quad T\in \mathcal{T} \] where \(\mathcal{T}\) is the set of spanning trees for the nodes. The Chow-Liu tree minimizes the Kullback-Leibler (KL) Divergence \(\text{KL}(P \| P')\) by finding the maximum-weight spanning tree over pairwise mutual information weights.[32]

Note 4: GLASSO

Semidefinite program to find (regularized) maximum likelihood precision of graph-structured multivariate Normal distribution using \(\ell_1\) (“LASSO”) penalty [16]

\[ \operatorname*{argmax}_{\Theta} \mathcal{L}(\Theta|\hat{\Sigma}) = \operatorname*{argmin}_{\Theta \succ 0}\ \text{tr}(\hat{\Sigma} \Theta) - \log |\Theta|+ \rho\|\Theta\|_1 \qquad(2)\]

In the binary case Equation 2 is still guaranteed to find the structure of the generating MRF, but only for graphs with singleton separators, as shown in [17].

Taxonomy of Modeling Assumptions – III

Resource and Information Flow
We limit total network activation to a pool of shared “stuff” that gets distributed somehow.
Related to partial pooling in hierarchical Bayes, sitting between unpooled and fully pooled priors.

Note 5: Resource Allocation

Goes one step further than hyperbolic projection, by viewing each node as having some “amount” of a resource to spend, which gets re-allocated by observational unit. [23] \[ \text{diag}(\mathbf{s}_i)^{-1}X^T\text{diag}(\mathbf{s}_j)^{-1}X \] i.e. two-step random-walk normalization of the bipartite adjacency matrix.
First \(X\) is row-normalized, then column-normalized.

Note 6: Doubly Stochastic

If \(A\in \mathbb{S}^{n\times n}\) is positive, then there exists \(d_1,d_2\) such that \[W=\text{diag}(d_1)A\text{diag}(d_2)\] is doubly-stochastic, and \(W(u,v)\) is the optimal transport plan between \(u\) and \(v\) with regularized Euclidean distance between them on a graph.[21], [33]

The doubly-stochastic filter [20] removes edges from \(W\) until just before the graph would become disconnected.
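A sketch of the Sinkhorn scaling itself (alternating row/column normalization of a positive matrix; the iteration count and tolerance are illustrative):

```python
# Sketch: Sinkhorn iteration toward the doubly-stochastic W = diag(d1) A diag(d2).
import numpy as np

def sinkhorn(A, n_iter=1000, tol=1e-9):
    W = np.array(A, dtype=float)             # requires strictly positive A
    for _ in range(n_iter):
        W /= W.sum(axis=1, keepdims=True)    # row-normalize
        W /= W.sum(axis=0, keepdims=True)    # column-normalize
        if np.abs(W.sum(axis=1) - 1).max() < tol:
            break
    return W
```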

Taxonomy of Modeling Assumptions – III

Resource and Information Flow
We limit total network activation to a pool of shared “stuff” that gets distributed somehow.
Related to partial pooling in hierarchical Bayes, sitting between unpooled and fully pooled priors.

Note 7: High-Salience Skeleton

Count the number of shortest-path trees an edge participates in, out of all the shortest-path trees (one rooted at every node). \[ \frac{1}{n}\sum_{i=1}^n \mathcal{T}_{\text{sp}}(i) \] where \(\mathcal{T}_{\text{sp}}(i)\) is the shortest-path tree rooted at node \(i\) [22]

Desire Path Math

So, each \(\mathbf{x}_i\) will induce a subgraph \(g_i = G^*[V_i]\), where \(V_i = \{v\in \mathcal{V} | X(i,\mathcal{V})=1\}\).

  • domain knowledge takes the form of a constraint on edges within that subgraph, which will induce a family of subgraphs on \(g_i\).
  • This family \(\mathcal{C}_i\) belongs to a relevant class of graphs \(\mathcal{C}\), limited to nodes \(V_i\), i.e.,

\[ \begin{gathered} \mathcal{C}_i = \{(V,E,B_i) \in\mathcal{C}|B_i(e,v)=B^*(e,v)\mathbf{1}_{V_i}(v)\mathbf{1}_{E_i}(e)\}\\ E_i\in\{\mathcal{E}\in\mathcal{P}(E)| g_i[\mathcal{E}]\in\mathcal{C}\} \qquad V_i = \{v\in \mathcal{V} | X(i,\mathcal{V})=1\} \end{gathered} \]

In other words, the edges that “caused” the node activations in a given observation

  1. must together belong to a graph that, in turn,
  2. belongs to our desired class, which must occur on the nodes that were activated.

Matrix-Forest Theorem

The regularized Laplacian \(Q=(I+\beta L)^{-1}\)

  • Is always PD (provably on Birkhoff Polytope of doubly-stochastic matrices)
  • for various \(\beta\):
    • \(\beta=1\), \(Q(j,j')\) encodes the fraction of spanning forests in which \(j,j'\) share a tree (i.e. visited by the same walk i.e. co-occurred)
    • \(\beta\rightarrow 0\) encodes shortest-path kernel
    • \(\beta\rightarrow \infty\) encodes commute-time kernel (i.e. \(L^+\))
  • \(Q(j,j)\) encodes an “isolation” measure (probability of not sharing a cascade)

IDEA: our data is measuring \(\lim_{m\rightarrow \infty} X^TX \propto Q\), s.t. each \(\mathbf{x}\) comes from a tree
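A sketch of the kernel itself; the \(-\log Q\) distance at the end is one plausible choice of metric for the MST step, an assumption here rather than the thesis’ prescription:

```python
# Sketch: forest (regularized Laplacian) kernel of a graph. At beta = 1,
# Q(j, j') is the fraction of spanning forests where j and j' share a tree.
import numpy as np

def forest_kernel(A, beta=1.0):
    L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian
    return np.linalg.inv(np.eye(len(A)) + beta * L)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = forest_kernel(A)
d = -np.log(Q)        # an (assumed) distance for the Steiner/MST step
```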

Forest Pursuit – Algorithm

\begin{algorithm} \caption{Forest Pursuit} \begin{algorithmic} \Require $X\in \mathbb{B}^{m\times n}, d_K\in \mathbb{R}_{\geq 0}^{n\times n}, 0<\alpha<1$ \Ensure $R \in \mathbb{B}^{m \times {n\choose 2}}$ \Procedure{ForestPursuitEdgeProb}{$X, d_K, \alpha$} \State $R \gets $\Call{ForestPursuit}{$X, d_K$} \State $\hat{\alpha}_m\gets$\Call{DesirePathBeta}{$X,R, \alpha$} \State \textbf{return} $\hat{\alpha}_m$ \EndProcedure \Procedure{ForestPursuit}{$X, d_K$} \State $R(\cdot,\cdot) \gets 0$ \For{$i\gets 1, m$}\Comment{\textit{each observation}} \State $\mathbf{x}_i \gets X(i,\cdot)$ \State $R(i,\cdot) \gets $\Call{PursueTree}{$\mathbf{x}_i, d_K$} \EndFor \State \textbf{return} $R$ \EndProcedure \Procedure{PursueTree}{$\mathbf{x}, d$} \Comment{\textit{Approximate Steiner Tree}} \State $V \gets \{v\in \mathcal{V} | \mathbf{x}(\mathcal{V})=1\}$ \Comment{\textit{activated nodes}} \State $T \gets$\Call{MST}{$d[V,V]$} \Comment{\textit{e.g., Prim's Algorithm}} \State $\mathbf{u,v}\gets \{(j,j')\in J\times J | T(j,j')\neq 0\}$ \State $\mathbf{r} \gets e_n(\mathbf{u,v})$ \Comment{\textit{unroll tree adjacency}} \State \textbf{return} $\mathbf{r}$ \EndProcedure \Procedure{DesirePathBeta}{$X,R, \alpha$} \State $\mathbf{s} \gets \sum_{i=1}^m R(i,e)$ \State $\sigma \gets \sum_{i=1}^m X(i,j)$ \State $\mathbf{k} \gets e_n(\sigma\sigma^T)$ \Comment{\textit{co-occurrence counts}} \State $\hat{\alpha}_m \gets \alpha + \frac{\mathbf{s}-\alpha \mathbf{k}}{\mathbf{k}+1}$ \State \textbf{return} $\hat{\alpha}_m$ \EndProcedure \end{algorithmic} \end{algorithm}

Forest Pursuit Modifications & Extensions

FPi (Interaction Probability)

Let’s fix our APS performance!

  • GLASSO is estimating edge strength, not “existence”
  • We can too: probability of edge traversal (i.e. rescale by co-occurrence probability)
  • This gives “FPi”, an estimate for the rate of direct interaction

Completely closes the APS gap, while sacrificing MCC/F-M!

Table 4: Comparing median(IQR) scores against FPi
Figure 32: FPi shows improved APS, optimal MCC, and MCC (min-connect)

Figure 33: P-R curves for two experiments

Generative Model

Random spanning trees on an augmented graph are precisely the random spanning forests on the original.

Marking a single node can “root” a tree in the forest.

Figure 34: Dissemination plan as rooted RST on augmented graph

Leads to generative model for multi-output binary activations on a graph:

Generative Model – Specification

  • Generate random spanning trees on \(G_{+R}\), given the edge weights \(z_E\).
  • Uniformly sample a marked node \(\phi\).
  • Activate all nodes connected to \(\phi\) in the induced spanning forest.

\[ \begin{aligned} \pi_{e\in E} &\sim \operatorname{Beta}_E(\alpha, 1-\alpha) \\ z_{e\in E} &\sim \operatorname{Bernoulli}_E(\pi_e) \\ \phi_{i\in I, j\in J} &\sim \operatorname{Categorical}_I(\theta) \\ x_{i\in I,j\in J} &\sim \operatorname{RSFm}_K(\beta, z_e, \phi_n) \\ \end{aligned} \]

Generative model – Forests

\[ P(x_{ij}|z_E, \phi_{ij}) = \sum_{T\in\mathcal{T}_{+R}}P(x_{ij}| T,\phi_{ij}) P(T|z_E) \]

Is the same as:

\[ P(x_{ij}|z_E, \phi_{ij}) = \sum_{F\in\mathcal{F}}P(x_{ij}| F,\phi_{ij}) P(F|z_E) \]

\[ \begin{gathered} P(F|z_E) = \frac{1}{Z_\mathcal{F}} \mu(F)\\ Z_\mathcal{F} = \det{(I+\beta L)} \end{gathered} \]

Likelihood: \(\sum_\mathcal{F} P(x_{ij}|F,\phi_{ij})\) is the probability a node occurs on the same subtree as a “marked” one

Expected Forest Maximization

Alternate between estimating recovered network \(B\) and representation of data in edge-space \(R\):

  1. Forest Pursuit estimate \(\hat{B}\)
  2. Estimate \(d_Q\) via Forest Kernel \(\hat{Q} \gets (I+\beta B^TB)^{-1}\)
  3. Use \(d_Q\) for Steiner tree approximation on the metric closure
  4. …etc.

\begin{algorithm} \caption{Expected Forest Maximization (EFM)} \begin{algorithmic} \Require $X\in \mathbb{B}^{m\times n}, d_K\in \mathbb{R}_{\geq 0}^{n\times n}, 0<\alpha<1$ \Ensure $R \in \mathbb{B}^{m \times {n\choose 2}}$ \Procedure{EFM}{$X, d_K, \alpha, \beta, \epsilon$} \State $R \gets $\Call{ForestPursuit}{$X, d_K$} \State $\hat{\alpha}_m\gets$\Call{DesirePathBeta}{$X,R, \alpha$} \While {$\|\hat{\alpha}-\alpha\|_{\infty}>\epsilon$} \State $\alpha_m \gets \hat{\alpha}_m$ \State $Q \gets (I+\beta L_{\text{sym}}(\alpha_m))^{-1}$ \State $d_K \gets d_Q$ \State $R \gets $\Call{ForestPursuit}{$X, d_K$} \State $\hat{\alpha}_m\gets$\Call{DesirePathBeta}{$X,R, \alpha_m$} \EndWhile \State \textbf{return} $\hat{\alpha}_m$ \EndProcedure \end{algorithmic} \end{algorithm}

EFM Results

  • Small but consistent improvement in reconstruction performance across MENDR dataset.

  • On a per-edge basis, a (heavily CV-regularized) logistic regression study shows

    • Slightly positive odds ratio for a correct prediction.
    • Improvement is agnostic to graph type.
(a) Change in Expected MCC (EFM vs FP)
(b) Logistic Regression Coef. (EFM - FP) vs. (Ground Truth)
Figure 35

EFM Runtime

  • EFM still competitive with GLASSO, outperforming it in most cases
  • Trend appears to have a fairly consistent multiplicative factor (\(\approx 10\)) at all scales
Figure 36: Runtime Scaling (Forest-Pursuit vs GLASSO)

Figure 37: Partial Residuals (regression on computation time)

Most of the runtime penalty appears to come from network size (perhaps the \(\|\cdot\|_{\infty}\) norm?)

Metrics

Metrics – Precision and Recall

Walber, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Various goals when balancing true/false-positives and true/false-negatives:

Note 8: Precision (P)

Fraction of positive predictions that are true: \[P= \frac{TP}{TP+FP}\]

also called “positive predictive value” (PPV)

Note 9: Recall (R)

Fraction of true values that were returned as positive. \[R=\frac{TP}{TP+FN} \]

Inherent trade-off with precision.

Also called the TP-rate (TPR)

Metrics – Aggregation

Each metric compares a (binary) prediction to the (binary) ground truth.

  • Class probabilities \(\rightarrow\) families of metric results
  • How can we pick one threshold in an unsupervised setting?

Let’s find the “expected value” over all thresholds!

  • Normalize all outputs to \([\epsilon,1-\epsilon]\)
  • “average” metric is what a practitioner expects if thresholding at random
  • APS is more traditional, but is high if at least one threshold is good…
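A sketch of this aggregation for the MCC (the threshold grid is illustrative; sklearn’s `matthews_corrcoef` is assumed available):

```python
# Sketch: "expected score over thresholds" for MCC, averaging the score of
# the thresholded edge estimates against the ground-truth edge labels.
import numpy as np
from sklearn.metrics import matthews_corrcoef

def expected_mcc(y_true, scores, n_thresholds=99):
    span = np.ptp(scores)
    s = (scores - scores.min()) / (span if span > 0 else 1.0)  # to [0, 1]
    thresholds = np.linspace(0.01, 0.99, n_thresholds)
    return np.mean([matthews_corrcoef(y_true, s >= t) for t in thresholds])
```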

Note 10: Matthews Correlation Coefficient (MCC)

Balances all of TP,TN,FP,FN.

\[\frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]

Preferred for class-imbalanced problems (like sparse recovery) [35]

Note 11: Fowlkes-Mallows (F-M)

Geometric mean of Precision and Recall

\[\sqrt{P\cdot R}\]

Limit of the MCC as TN approaches infinity [36]

Note 12: Average Precision Score (APS)

Expected precision over the possible recall values \[\text{APS} = \sum_{e=1}^{\omega} P(e)(R(e)-R(e-1))\] approximates the integral under the parametric P-R curve

NetSci Community

NetSci Dataset

Recreating dataset from Newman & Girvan (2004) [37]

  • Web of Science queries for all authors/papers from literature reviews on Network Science
    • Newman (2003) [38]
    • Boccaletti et al (2006) [39]
    • Additional paper by Sneppen & Newman (1997) [40] to retain connected component.
  • Initial coloring of detected communities from co-occurrence network
    • Modularity-based community detection (greedy algorithm)
    • Inset shows community (quotient) graph

How do the degree distribution and assortativity (node-neighbor-degree correlation) change?

NetSci Community – Co-Occurrence

Figure 38: 134 Network scientists connected by co-authorship

NetSci Community – Chow-Liu

Figure 39: Chow-Liu tree of NetSci collaborator dependency relationships

NetSci Community – Forest Pursuit

Figure 40: Forest Pursuit estimate of NetSci collaborator dependency relationships

NetSci Community – Summary

  • Forest Pursuit preserves the original community structure
  • Balances co-occurrence and tree properties (less dense, no long chains)
  • Tends to reduce assortativity and avg. degree (makes sense here?)
Figure 41: Degree distributions of FP vs co-occurrence social networks

Working Memory & Partial Orders

Partially Ordered Set

Directed Acyclic Graphs

Even our original data, if observed by the social scientist, is a “poset”

Figure 42: Observations as partially-ordered sets

In that case we did not ultimately get orderings from the papers, but other domains do!

Technical Language Processing

Word order in sentences, tagging, etc.

TLP with initial-visit emitting random-walks

We want to organize tags/concepts using the way they occur in, say, Maintenance Work Orders (MWOs)

Other than Bag-of-Words (co-occurrence), the common tool is an order-\(n\) Markov Language Model

  • assume the order-of-observation is the order-of-activation
  • use up to \(n\) previous activations to help predict the next.

\[ P_{MC}(t_i|T) \approx P(t_i | t_{i-1}, \cdots,t_{i-n}) \]

However, evidence points to memory recall being more like a random walk that reports initial visits only.

Figure 43: Partial-order edge measurements with Markov assumption
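A sketch of such an initial-visit-emitting walk (hypothetical helper; contrast with the Markov assumption above, which would treat consecutive emissions as actual jumps):

```python
# Sketch: a random walk whose observed sequence is only its first visits,
# censoring both the revisits and the individual jumps between emissions.
import random

def first_visit_sequence(adj, start, n_jumps, seed=0):
    rng = random.Random(seed)
    seq, seen, cur = [start], {start}, start
    for _ in range(n_jumps):
        cur = rng.choice(adj[cur])   # hidden jump
        if cur not in seen:          # emit only on first visit
            seen.add(cur)
            seq.append(cur)
    return seq
```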


In Sexton & Fuge (2019) [41] and (2020) [42] we demonstrate the usefulness of accounting for this.
Compare label propagation on each network vs expert subsystem annotation in Excavator MWOs [43]

Figure 44: Semisupervised MWO Tag Classification with Network Label Propagation

Still, nested stochastic gradient descent is slow. Can we use Forest Pursuit in a similar case?

References

[1]
T. P. Peixoto, “Reconstructing networks with unknown and heterogeneous errors,” Phys. Rev. X, vol. 8, no. 4, p. 041011, Oct. 2018, doi: 10.1103/PhysRevX.8.041011.
[2]
L. Peel, T. P. Peixoto, and M. De Domenico, “Statistical inference links data and theory in network science,” Nature Communications, vol. 13, no. 1, Nov. 2022, doi: 10.1038/s41467-022-34267-9.
[3]
ISO5725, “Accuracy (trueness and precision) of measurement methods and results.” International Organization for Standardization Geneva, 1994.
[4]
W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, Dec. 1977, doi: 10.1086/jar.33.4.3629752.
[5]
H. Whitehead and S. Dufault, “Techniques for analyzing vertebrate social structure using identified individuals: Review and recommendations,” Advances in the Study of Behavior, vol. 28, no. 28, pp. 33–74, 1999.
[6]
D. N. Fisher, M. J. Silk, and D. W. Franks, “The perceived assortativity of social networks: Methodological problems and solutions,” in Trends in social network analysis, Springer International Publishing, 2017, pp. 1–19. doi: 10.1007/978-3-319-53420-6_1.
[7]
D. J. Wang, X. Shi, D. A. McFarland, and J. Leskovec, “Measurement error in network data: A re-classification,” Social Networks, vol. 34, no. 4, pp. 396–409, Oct. 2012, doi: 10.1016/j.socnet.2012.01.003.
[8]
E. Ackelsberg, “What is the arcsine law?” 2018, Available: https://math.osu.edu/sites/math.osu.edu/files/What_is_2018_Arcsine_Law.pdf
[9]
W. Feller, An introduction to probability theory and its applications, volume 1. J. Wiley & Sons: New York, 1968.
[10]
S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993, doi: 10.1109/78.258082.
[11]
A. Omanović, H. Kazan, P. Oblak, and T. Curk, “Sparse data embedding and prediction by tropical matrix factorization,” BMC Bioinformatics, vol. 22, no. 1, Feb. 2021, doi: 10.1186/s12859-021-04023-9.
[12]
L. Kou, G. Markowsky, and L. Berman, “A fast algorithm for steiner trees,” Acta Informatica, vol. 15, no. 2, pp. 141–145, 1981, doi: 10.1007/bf00288961.
[13]
P. Chebotarev and E. Shamis, “The matrix-forest theorem and measuring relations in small social groups,” arXiv, arXiv:math/0602070, Feb. 2006. doi: 10.48550/arXiv.math/0602070.
[14]
O. Knill, “Counting rooted forests in a network,” arXiv, arXiv:1307.3810, Jul. 2013. doi: 10.48550/arXiv.1307.3810.
[15]
K. Avrachenkov, P. Chebotarev, and A. Mishenin, “Semi-supervised learning with regularized laplacian,” Optimization Methods and Software, vol. 32, no. 2, pp. 222–236, Mar. 2017, doi: 10.1080/10556788.2016.1193176.
[16]
J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, Jul. 2008, doi: 10.1093/biostatistics/kxm045.
[17]
P. Loh and M. J. Wainwright, “Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses,” in Advances in neural information processing systems, Curran Associates, Inc., 2012. doi: 10.1214/13-aos1162.
[18]
S. Janson and J. Vegelius, “Measures of ecological association,” Oecologia, vol. 49, no. 3, pp. 371–376, Jul. 1981, doi: 10.1007/bf00347601.
[19]
M. E. J. Newman, “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Physical Review E, vol. 64, no. 1, p. 016132, Jun. 2001, doi: 10.1103/physreve.64.016132.
[20]
P. B. Slater, “A two-stage algorithm for extracting the multiscale backbone of complex weighted networks,” Proceedings of the National Academy of Sciences, vol. 106, no. 26, pp. E66–E66, Jun. 2009, doi: 10.1073/pnas.0904725106.
[21]
M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Proceedings of the 27th international conference on neural information processing systems - volume 2, in NIPS’13. Red Hook, NY, USA: Curran Associates Inc., 2013, pp. 2292–2300.
[22]
D. Grady, C. Thiemann, and D. Brockmann, “Robust classification of salient links in complex networks,” Nature Communications, vol. 3, no. 1, May 2012, doi: 10.1038/ncomms1847.
[23]
T. Zhou, J. Ren, M. s. Medo, and Y.-C. Zhang, “Bipartite network projection and personal recommendation,” Phys. Rev. E, vol. 76, no. 4, 4, p. 046115, Oct. 2007, doi: 10.1103/PhysRevE.76.046115.
[24]
C. van Raalte, “1. No small-talk in paradise: Why elysium fails the bechdel test, and why we should care,” in Media, margins and popular culture, 1st ed., H. Savigny, E. Thorsen, D. Jackson, and J. Alexander, Eds., London: Imprint: Palgrave Macmillan, 2015.
[25]
O. Stuhler, “The gender agency gap in fiction writing (1850 to 2010),” Proceedings of the National Academy of Sciences, vol. 121, no. 29, Jul. 2024, doi: 10.1073/pnas.2319514121.
[26]
G. A. Miller, “The magical number seven, plus or minus two: Some limits on our capacity for processing information.” Psychological Review, vol. 63, no. 2, pp. 81–97, Mar. 1956, doi: 10.1037/h0043158.
[27]
R. Sonthalia and A. C. Gilbert, “Tree! I am no tree! I am a low dimensional hyperbolic embedding,” arXiv; arXiv, arXiv:2005.03847, Oct. 2020. doi: 10.48550/arXiv.2005.03847.
[28]
J. R. Clough and T. S. Evans, “Embedding graphs in lorentzian spacetime,” PLOS ONE, vol. 12, no. 11, p. e0187301, Nov. 2017, doi: 10.1371/journal.pone.0187301.
[29]
F. Nielsen and R. Nock, “Hyperbolic voronoi diagrams made easy,” in 2010 international conference on computational science and its applications, Mar. 2010, pp. 74–80. doi: 10.1109/ICCSA.2010.37.
[30]
R. T. Q. Chen, B. Amos, and M. Nickel, “Semi-discrete normalizing flows through differentiable tessellation,” in Advances in neural information processing systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., Curran Associates, Inc., 2022, pp. 14878–14889. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/5f61939af1699c82dab00ed36c887968-Paper-Conference.pdf
[31]
M. Angeletti, J.-M. Bonny, and J. Koko, “Parallel Euclidean distance matrix computation on big datasets,” 2019. Available: https://hal.science/hal-02047514
[32]
C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees,” IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 462–467, May 1968, doi: 10.1109/tit.1968.1054142.
[33]
B. Landa and X. Cheng, “Robust inference of manifold density and geometry by doubly stochastic scaling,” SIAM Journal on Mathematics of Data Science, vol. 5, no. 3, pp. 589–614, Sep. 2023, doi: 10.1137/22M1516968.
[34]
L. Torres, A. S. Blevins, D. Bassett, and T. Eliassi-Rad, “The why, how, and when of representations for complex systems,” SIAM Rev., vol. 63, no. 3, pp. 435–485, Jan. 2021, doi: 10.1137/20M1355896.
[35]
D. Chicco and G. Jurman, “A statistical comparison between matthews correlation coefficient ( MCC), prevalence threshold, and fowlkes–mallows index,” Journal of Biomedical Informatics, vol. 144, p. 104426, Aug. 2023, doi: 10.1016/j.jbi.2023.104426.
[36]
J. Crall, “The MCC approaches the geometric mean of precision and recall as true negatives approach infinity,” Apr. 2023, doi: 10.48550/ARXIV.2305.00594.
[37]
M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, Feb. 2004, doi: 10.1103/physreve.69.026113.
[38]
M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, Jan. 2003, doi: 10.1137/s003614450342480.
[39]
S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. Hwang, “Complex networks: Structure and dynamics,” Physics Reports, vol. 424, no. 4–5, pp. 175–308, Feb. 2006, doi: 10.1016/j.physrep.2005.10.009.
[40]
K. Sneppen and M. E. J. Newman, “Coherent noise, scale invariance and intermittency in large systems,” Physica D: Nonlinear Phenomena, vol. 110, no. 3–4, pp. 209–222, Dec. 1997, doi: 10.1016/s0167-2789(97)00128-0.
[41]
R. Sexton and M. Fuge, “Using semantic fluency models improves network reconstruction accuracy of tacit engineering knowledge,” in Volume 2A: 45th design automation conference, American Society of Mechanical Engineers, Aug. 2019. doi: 10.1115/detc2019-98429.
[42]
R. Sexton and M. Fuge, “Organizing tagged knowledge: Similarity measures and semantic fluency in structure mining,” Journal of Mechanical Design, vol. 142, no. 3, Jan. 2020, doi: 10.1115/1.4045686.
[43]
M. Hodkiewicz and M. T.-W. Ho, “Cleaning historical maintenance work order data for reliability analysis,” Journal of Quality in Maintenance Engineering, vol. 22, no. 2, pp. 146–163, 2016.