Vignette taken directly from Christensen & Kenett (under review)

With the semantic networks estimated for the low and high openness to experience group, they can now be analyzed and statistically compared. The third R package in our SemNA pipeline is the SemNeT package. SemNeT handles the computation of network measures and statistical tests. The network measures are used to quantify the structure and properties of the semantic networks, while the statistical tests are used to compare the differences between these measures.

While methods and tools to measure different networks is an active and thriving area (Epskamp, Borsboom, & Fried, 2018; Moreno & Neville, 2013; van Borkulo et al., 2017), a standard network analysis includes visualizing the network, measuring its properties, and statistically examining the properties. These analyses are conducted in the third stage of our SemNA pipeline.

Visualization of Semantic Networks

Visualizations of networks are qualitative; however, they provide intuitive depictions of each network’s structure. There are several R packages (e.g., igraph, qgraph, and networkD3; Allaire, Gandrud, Russell, & Yetman, 2017) and other software (e.g., Cytoscape; Shannon et al., 2003) that are capable of visualizing networks. For our example, we’ll show how to use the compare_nets function in SemNeT to visually compare our two networks.

compare_nets allows the user to visually compare any number of networks. The networks are plotted side-by-side using the R packages qgraph and networktools (Jones, 2019). An important consideration when plotting the networks is the layout. The layout affects how the nodes are positioned with respect to one another on a 2D layout. Some layouts are useful for visualizing the networks only, while other layouts may have more meaningful representations of the distances between nodes (Jones, Mair, & McNally, 2018).

In general, nodes in the network are depicted with circles and their labels denote what each node represents. In our example, each label denotes a unique animal response given by the sample (see Figure 6). The edges in the network are depicted by lines, which may vary in size, representing the magnitude of association, and color (e.g., green and red), representing the sign of the association (e.g., positive or negative, respectively). Many visualization algorithms will use the magnitude of association to depict the distance between nodes in the network.

A common visualization algorithm in the psychological literature is the Fruchterman-Reingold algorithm (as known as a force-directed algorithm; Fruchterman & Reingold, 1991). This algorithm uses the magnitude of associations to map the distances between nodes. To visualize the networks in our example using this algorithm, the following code should be run:

# Visually compare networks
compare_nets(net.low, net.high,
             title = list("Low Openness", "High Openness"),
             config = "spring", weighted = FALSE)
Comparison of low (left) and high (right) openness to experience semantic networks based on the Fruchterman-Reingold algorithm

Comparison of low (left) and high (right) openness to experience semantic networks based on the Fruchterman-Reingold algorithm

The compare_nets visualization function depicts the networks side-by-side (Figure 6). If there are more than two networks, then compare_nets will automatically optimize the plot layout for comparison. There is an option to add titles using the title argument, which must be input as a list object. If no titles are input, then the name(s) of the object(s) will be used (e.g., net.low and net.high). The config argument can be used to select the layout. This defaults to "spring", which is the Fruchterman-Reingold layout. Other layouts that are applied by the qgraph package can also be used (see ?qgraph). Finally, the weighted argument can be used to visualize the networks as weighted (TRUE) or unweighted (FALSE). In our example, we specified the networks to be unweighted (weighted = FALSE). Because the networks are unweighted, the lines are black (rather than green and red based on sign) and the thickness of the edges are all the same (rather than size differences based on magnitude of association).

Based on Figure 6, it appears that the high openness to experience group has a more tightly clustered and interconnected network; however, as mentioned before, these visualizations are generally qualitative and not meant for interpretation. Therefore, regardless of whether these depictions appear to have meaningful differences, quantitative analysis is necessary to determine whether there are truly differences.

Global Network Measures

Common SemNA metrics are global (or macroscopic) network measures. These measures focus on the structure of the entire network, emphasizing how the nodes are connected as a cohesive whole (Siew et al., 2019). In contrast, local (or microscopic) network measures focus on the role of individual nodes in the network (Bringmann et al., 2019). Below we briefly define and describe a few global network measures—clustering coefficient, average shortest path length, and modularity—that are commonly used in SemNA. A more extensive review of these and other measures can be found in Siew et al. (2019).

A common network structure that tends to emerge across many different systems (including semantic memory) is a small world structure (Watts & Strogatz, 1998). Small world networks are characterized by having a high clustering coefficient (CC) and moderate average shortest path length (ASPL). The CC refers to the extent that two neighbors of a node will be neighbors themselves, on average, in the network. A network with a higher CC, for example, suggests that nodes that are near-neighbors to each other tend to co-occur and be connected. Wulff, Hills, & Mata (2018) examined younger and older adults semantic networks that were estimated using verbal fluency data and found that the older adults had a smaller CC compared to the younger adults. This structure was interpreted as having a role in the cognitive slowing observed in older adults.

The ASPL refers to the average shortest number of steps (i.e., edges) that is needed to traverse between any pair of nodes in the network. That is, it’s the average of the minimum number of steps from one node to all other nodes. In cognitive models, ASPL may affect the activation of associations between concepts (known as spreading activation; Anderson, 1983; Siew, 2019) such that a lower ASPL would increase the likelihood of reaching a greater number associations. Several studies of creative ability and semantic networks have linked lower ASPL to greater creative ability (Benedek et al., 2017; Kenett et al., 2016a; Kenett & Faust, 2019). These studies have argued that the lower ASPL in the semantic network of people with higher creative ability (compared to less creative people) may have allowed them to reach more remote associations, which in turn could be combined into novel and useful associations (Kenett & Faust, 2019).

Finally, modularity measures how a network compartmentalizes (or partitions) into sub-networks (i.e., smaller networks within the overall network; Fortunato, 2010; Newman, 2006). The modularity statistic (Q) measures the extent to which the network has dense connections between nodes within a sub-network and sparse (or few) connections between nodes in different sub-networks. Larger Q values suggest that these sub-networks are more well-defined than lower Q values, suggesting that the network can more readily be segmented into different parts. A few studies have demonstrated the significance of modularity in cognitive networks (Kenett et al., 2016b; Siew, 2013). For example, Kenett et al. (2016b) found that people with high functioning autism (Asperger Syndrome) had a more modular semantic network relative to matched controls. They suggest that this “hyper” modularity might be related to rigidity of thought that often characterizes people with Asperger Syndrome.

These metrics represent the basic measures of semantic networks. To compute these measures, the following code can be used:

# Compute network measures
semnetmeas(net.low, meas = c("ASPL", "CC", "Q"), weighted = FALSE)
semnetmeas(net.high, meas = c("ASPL", "CC", "Q"), weighted = FALSE)
Table 7. Group Network Measures
Group ASPL CC Q
Low 3.25 0.74 0.64
High 2.78 0.76 0.59

semnetmeas only computes global network measures. Other measures, such as centrality, can be computed using the NetworkToolbox package. To obtain only one measure, the argument meas can be set to the desired measure (e.g., meas = "ASPL" for only ASPL). Moreover, to obtain weighted measures, the argument weighted can be set to TRUE (defaults to FALSE). Based on the global network measures, in our example, it appears that the high openness to experience group has a higher CC and lower ASPL and Q than the low openness to experience group (Table 7).

Statistical Tests

In order to statistically test for differences in these measures across networks, other approaches must be used. Here, we present two approaches in the SemNeT package that statistically test for differences between networks: tests against random networks and bootstrapped partial networks (e.g., Kenett et al., 2013, 2016a).

Tests against random networks

Tests against random networks can be performed to determine whether the network measures observed in the groups are different from what would be expected from a random network with the same number of nodes and edges (Beckage, Smith, & Hills, 2011; Steyvers & Tenenbaum, 2005). This approach iteratively simulates Erds-Rényi random networks with the same number of nodes and edges with a fixed edge probability (Boccaletti, Latora, Moreno, Chavez, & Hwang, 2006). For each simulated random network, global network measures (i.e., CC, ASPL, and Q) are computed, resulting in a sampling distribution of these measures for the random network.

The Erds-Rényi random network model does not make any assumptions regarding the structure of the network, which makes it a useful null model to test against (Erdős & Rényi, 1960). That is, whether the network structure for a specific network measure could be generated from a random network model. The following code implements the tests against random networks:

# Compute tests against random networks
rand.test <- randnet.test(net.low, net.high, iter = 1000, cores = 4)

This function accepts any number of networks as inputs, allowing for multiple networks to be tested. For our example, we only input two networks: net.low and net.high. The other arguments, iter and cores, specify the number of simulated random networks and processing cores that should be used in the computation, respectively. The output from this function is a table that reports the p-values for each network compared to the random network values and the values below “Random” are the mean (M) and standard deviation (SD) of the global network measures for the random network distribution.

Table 8. p-values of Low and High Openness to Experience Networks Against Random Networks
Group Measures p-values Random (M) Random (SD)
ASPL < .001 3.04 0.03
Low CC < .001 0.04 0.01
Q < .001 0.38 0.01
ASPL < .001 3.03 0.03
High CC < .001 0.04 0.01
Q < .001 0.38 0.01

randnet.test will compute tests against random networks for each network input into the function. In our example, there were two networks, so the result computed random networks based on both network structures. As shown in Table 8, all global network measures were significantly different from random for both openness to experience groups. This suggests that both networks have significantly different structures than a random network with the same number of nodes and edges.

Bootstrapped partial networks

The second test to statistically compare semantic networks applies a bootstrap method (Efron, 1979). This approach selects a subset of nodes in the network (e.g., 50%), estimates all compared networks for this subset of nodes, and then computes the network measures (Kenett, Anaki, & Faust, 2014). This method is known as without replacement because each node can only be selected once (Bertail, 1997; Politis & Romano, 1994; Shao, 2003). With these nodes removed from the data, partial networks are estimated from each group’s data. The network measures are then computed for these partial networks. This process repeats iteratively, usually 1,000 times.

Thus, this process estimates and compares partial networks from the full networks. The rationale for this approach is twofold: (1) if the full networks differ from each other, then any partial network consisting of the same nodes should also be different, and thus (2) the generation of many partial networks allows for a direct statistical comparison between the full networks (Kenett et al., 2016b).

Similar to the test against random networks, these iterated partial networks form a sampling distribution of the global network measures for both groups, but solely based on the empirical data. These sampling distributions can then be statistically compared using a t-test (or ANOVA) to determine whether the global network measures are different between the compared networks. In a recent implementation, we applied this approach by retaining a gradation of nodes (50%, 60%, 70%, 80%, and 90%; Christensen, Kenett, Cotter, Beaty, & Silvia, 2018). This implementation allows for trends in the distributions to be revealed. To perform this analysis, the function partboot can be used.

#Arguments for 'partboot' function
bootSemNeT(..., method = c("CN", "NRW", "PF", "TMFG"),
         type = c("case", "node"), prop, sim,
         weighted = FALSE, iter = 1000, cores)

The first argument, ..., is for the input of the equated group binary response matrices (e.g., equate.low and equate.high). The ... argument is used across all bootSemNeT associated functions to allow for the plotting and statistical testing of any number of groups involved in the analysis. percent refers to the number of nodes that should remain in the network (e.g., prop = .50). The next argument, sim, selects the similarity measure (defaults to "cosine"). Next, the weighted argument decides whether weighted or unweighted network measures should be computed (defaults to FALSE or unweighted measures). iter is the number of bootstrap iterations to be completed (defaults to 1000). Finally, cores refers to the number of processor cores to use in the parallelization of the analysis (defaults to the computer’s maximum number of cores minus one). Below is code to run the analysis for retaining 50%, 60%, 70%, 80%, and 90% of nodes from the groups, respectively:

# Compute partial bootstrap network analysis
## Set seed for reproducibility
set.seed(42)

## 50% of nodes remaining in network
boot.fifty <- partboot(equate.low, equate.high,
                       method = "TMFG", type = "node",
                       prop = .50, iter = 1000,
                       sim = "cosine", cores = 4)
## 60% of nodes remaining in network
boot.sixty <- partboot(equate.low, equate.high,
                       method = "TMFG", type = "node",
                       prop = .60, iter = 1000,
                       sim = "cosine", cores = 4)
## 70% of nodes remaining in network
boot.seventy <- partboot(equate.low, equate.high,
                         method = "TMFG", type = "node",
                         prop = .70,, iter = 1000,
                         sim = "cosine", cores = 4)
## 80% of nodes remaining in network
boot.eighty <- partboot(equate.low, equate.high,
                        method = "TMFG", type = "node",
                        prop = .80, iter = 1000,
                        sim = "cosine", cores = 4)
## 90% of nodes remaining in network
boot.ninety <- partboot(equate.low, equate.high,
                        method = "TMFG", type = "node",
                        prop = .90, iter = 1000,
                        sim = "cosine", cores = 4)

For more than two groups, the additional equated binary responses matrices can be added as input. partboot will perform the same analyses—selecting the same subset of nodes—for all binary response matrices that are input. The computation time for each implementation increases with the number of nodes that remain in the network. Overall, the computation is relatively fast, with computing times for each implementation ranging from 5 to 10 minutes (around 40 minutes in total; defined for a computer with the specifications: Intel Core i5-4300M 2.60GHz and 16GB DDR3 RAM). Once the partial bootstrap network measures are computed, they can then be visualized using the plot function:

# Plot bootstrap results
plots <- plot(boot.fifty, boot.sixty, boot.seventy,
              boot.eighty, boot.ninety, groups = c("Low","High"),
              measures = c("ASPL", "CC", "Q"))

The function accepts as many bootstrap samples as the user inputs. There are two arguments that the user can define: groups and measures. The groups argument specifies the names of the groups for the plot (must be a character vector). If no group names are entered, then the default is to name the groups after the name of the objects used for the input in partboot (i.e., equate.low and equate.high). The measures argument allows the user to choose which global network measures are plotted (defaults to all measures). Below are the results of our example from the data collected by Christensen et al. (2018):