The `ReadMethylFile` function reads DNA methylation files so that they can be used as new data for prediction by any of the models. The input file must be in CSV or TSV format. To try it, uncomment the following lines and run them.

```
# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# write.csv(NewData, "NewData.csv", quote = FALSE, row.names = FALSE)
# methyl <- ReadMethylFile(File = "NewData.csv")
```

This function has only one argument, `File`. The first column of the file must contain the CpG methylation probe IDs, each starting with cg followed by a number; the remaining columns are samples holding methylation values. All columns should be named.
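The expected layout can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the probe IDs, sample names, and values are made up, and the file is written to a temporary path.

```r
# Hypothetical example data: three CpG probes (rows) and two samples (columns).
example <- data.frame(
  ID = c("cg00000029", "cg00000108", "cg00000109"),
  Sample1 = c(0.12, 0.85, 0.47),
  Sample2 = c(0.10, 0.91, 0.52)
)

# Write the table as CSV to a temporary file.
path <- tempfile(fileext = ".csv")
write.csv(example, path, quote = FALSE, row.names = FALSE)

# Read it back and check the layout described in the text.
input <- read.csv(path, stringsAsFactors = FALSE)
ids_ok <- all(grepl("^cg[0-9]+$", input[[1]]))   # first column holds probe IDs
named_ok <- all(nzchar(colnames(input)))         # every column has a name
values_ok <- all(sapply(input[-1], is.numeric))  # remaining columns are numeric
```

A file that passes these three checks has the shape described above; whether `ReadMethylFile` performs the same checks internally is not stated here.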

The `BoxPlot` function draws a box plot from the DNA methylation dataset or another data frame.

```
data <- Data2[1:20,]
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
BoxPlot(File = data, Projname = NULL)
```

This function has two arguments, as follows:

`File`

A data frame with the first column as ID.

`Projname`

A string to name the plot.

The `TSNEPlot` function draws a 3D t-SNE plot of a DNA methylation dataset using the K-means clustering technique. It has two arguments: `File` (any matrix) and `NCluster` (the number of clusters for K-means clustering).

```
data <- data.frame(t(Data2[1:100,]))
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
TSNEPlot(File = data, NCluster = 4)
```

An R window will appear with a 3D projection of the t-SNE result. The plot object can be saved with the next line of code (uncomment).

```
# rgl.snapshot('tsne3d.png', fmt = 'png')
```
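For readers unfamiliar with the clustering step, here is a minimal base R sketch of K-means with a fixed number of clusters. It covers only the clustering, on made-up data; how `TSNEPlot` combines it with the t-SNE embedding is not shown, and the variable names are hypothetical.

```r
set.seed(1234)

# Hypothetical data: 40 samples with 5 features, drawn around two centers.
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 5),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 5)
)

# K-means with a fixed number of clusters, analogous to NCluster.
fit <- kmeans(x, centers = 2, nstart = 10)
clusters <- fit$cluster
table(clusters)
```

The cluster assignments in `clusters` are the kind of labels that can be used to color the points of a low-dimensional projection.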

Using the `ReadSNFData` function, one can read files (any matrix in CSV or TSV format) and feed them into the similarity network fusion (SNF) function from the SNFtool package. To try it, uncomment the following lines and run them.

```
# data(Data2) # Methylation
# Data2 <- cbind(rownames(Data2), Data2)
# colnames(Data2)[1] <- "ID"
# write.csv(Data2, "Data2.csv", row.names = FALSE)
# Data2 <- ReadSNFData(File = "Data2.csv")
```

The `SimilarityNetworkFusion` function performs similarity network fusion (SNF, from the SNFtool package) and outputs the resulting clusters.

```
data(RLabels) # Real labels
data(Data2) # Methylation
data(Data3) # Gene expression
snf <- SimilarityNetworkFusion(Files = list(Data2, Data3),
NNeighbors = 13,
Sigma = 0.75,
NClusters = 4,
CLabels = c("Group4", "SHH", "WNT", "Group3"),
RLabels = RLabels,
Niterations = 60)
```

```
snf
#> [1] SHH Group3 Group4 Group4 Group4 SHH SHH Group3 Group4 SHH
#> [11] WNT SHH SHH WNT SHH WNT Group3 Group3 Group3 Group4
#> [21] Group4 Group3 Group3 Group3 Group4 Group4 Group4 Group3 Group3 SHH
#> [31] SHH SHH SHH SHH Group4 Group3 SHH Group4 Group4 Group3
#> [41] Group4 Group4 WNT Group3 Group4 Group4 Group4 Group4 SHH Group4
#> Levels: Group4 SHH WNT Group3
```

This function has several arguments, as follows:

`Files`

A list of data frames created using the `ReadSNFData` function, or a list of matrices.

`NNeighbors`

The number of nearest neighbors.

`Sigma`

The variance for the local model.

`NClusters`

The number of clusters.

`CLabels`

A string vector to name the clusters. Optional.

`RLabels`

The actual labels of the samples, used to calculate the Normalized Mutual Information (NMI) score. Optional.

`Niterations`

The number of iterations for the diffusion process.
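To make the NMI score mentioned for `RLabels` concrete, the following base R sketch computes it for two label vectors. It assumes the normalization NMI = I(X; Y) / sqrt(H(X) H(Y)); other variants divide by the mean or the maximum of the two entropies, and the exact formula used inside the package is not stated here. The function name `nmi` and the example labels are hypothetical.

```r
# Normalized Mutual Information between two label vectors of equal length.
nmi <- function(a, b) {
  joint <- table(a, b) / length(a)   # joint distribution of the two labelings
  pa <- rowSums(joint)               # marginal distribution of a
  pb <- colSums(joint)               # marginal distribution of b
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  expected <- outer(pa, pb)          # joint distribution under independence
  nz <- joint > 0
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))
  mi / sqrt(entropy(pa) * entropy(pb))
}

truth <- c("SHH", "SHH", "WNT", "WNT", "Group3", "Group3")
same  <- c("a", "a", "b", "b", "c", "c")  # same partition, different names
mixed <- c("a", "b", "a", "b", "a", "b")  # carries no information about truth

nmi_same  <- nmi(truth, same)
nmi_mixed <- nmi(truth, mixed)
```

A clustering that reproduces the true partition scores 1 regardless of how the clusters are named, while a clustering independent of the true labels scores 0.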

The `SupportVectorMachineModel` function trains a support vector machine model to classify medulloblastoma subgroups using the DNA methylation dataset (Illumina Infinium HumanMethylation450). If new data is provided, training is followed by prediction on that data.

Model metrics, including accuracy, precision, sensitivity, F1-score, specificity, and AUC_average, can be calculated for the test dataset using the `ModelMetrics` function, which averages these metrics over the results of the `ConfusionMatrix` function.
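As an illustration of how such per-class metrics follow from a confusion matrix, here is a base R sketch using one-vs-rest counts. The matrix is made up, only a single confusion matrix is shown rather than an average over folds, AUC is left out, and this is not the code of `ModelMetrics` or `ConfusionMatrix`.

```r
# Hypothetical confusion matrix: rows are predictions, columns are true labels.
classes <- c("Group3", "SHH", "WNT", "Group4")
cm <- matrix(c(8,  0, 0, 1,
               0, 10, 0, 0,
               0,  0, 9, 0,
               2,  0, 1, 9),
             nrow = 4, byrow = TRUE,
             dimnames = list(Predicted = classes, Actual = classes))

# One-vs-rest counts for each class.
tp <- diag(cm)              # predicted as the class and truly the class
fp <- rowSums(cm) - tp      # predicted as the class but truly another
fn <- colSums(cm) - tp      # truly the class but predicted as another
tn <- sum(cm) - tp - fp - fn

metrics <- data.frame(
  Accuracy    = (tp + tn) / sum(cm),
  Precision   = tp / (tp + fp),
  Sensitivity = tp / (tp + fn),
  Specificity = tn / (tn + fp),
  row.names   = classes
)
metrics$F1_Score <- 2 * metrics$Precision * metrics$Sensitivity /
  (metrics$Precision + metrics$Sensitivity)
round(metrics, 3)
```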

The prediction result on new data can be accessed through the `NewDataPredictionResult` function, which returns the mode of each sample's predictions across the cross-validation folds.
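Taking the mode across folds can be sketched in base R as follows. The per-fold predictions, sample names, and the helper `row_mode` are hypothetical, and ties are broken here by alphabetical order, which may differ from the package's behavior.

```r
# Hypothetical predictions: one row per new sample, one column per CV fold.
fold_predictions <- data.frame(
  Fold1 = c("Group3", "WNT", "Group4"),
  Fold2 = c("Group3", "WNT", "SHH"),
  Fold3 = c("Group4", "WNT", "Group4"),
  row.names = c("SampleA", "SampleB", "SampleC"),
  stringsAsFactors = FALSE
)

# The mode is the most frequent label; ties go to the label that sorts first.
row_mode <- function(labels) {
  counts <- table(labels)
  names(counts)[which.max(counts)]
}

consensus <- data.frame(
  Subgroup = apply(fold_predictions, 1, row_mode)
)
consensus
```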

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
svm <- SupportVectorMachineModel(SplitRatio = 0.8,
CV = 10,
NCores = 1,
NewData = NewData)
ModelMetrics(Model = svm)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.988 0.932 0.996 0.963 0.986 0.985
#> SHH 1.000 1.000 1.000 1.000 1.000 0.985
#> WNT 0.995 1.000 0.976 0.988 1.000 0.985
#> Group4 0.993 0.998 0.983 0.990 0.999 0.985
NewDataPredictionResult(Model = svm)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.
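One common way to read `SplitRatio` and `CV` together is a single train/test split followed by fold assignment within the training part. The base R sketch below shows that reading on made-up sample indices; it is an assumption about the general technique, not a description of how the package partitions its data.

```r
set.seed(1234)
n <- 100             # hypothetical number of samples
split_ratio <- 0.8   # plays the role of SplitRatio
cv <- 10             # plays the role of CV

# Keep split_ratio of the samples for training and hold out the rest for testing.
train_idx <- sample(n, size = floor(split_ratio * n))
test_idx <- setdiff(seq_len(n), train_idx)

# Assign each training sample to one of cv folds of near-equal size.
folds <- sample(rep(seq_len(cv), length.out = length(train_idx)))
fold_sizes <- table(folds)
```

With 100 samples, a ratio of 0.8, and 10 folds, this gives 80 training samples in folds of 8 and 20 test samples.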

The `KNearestNeighborModel` function trains a K-nearest neighbor model to classify medulloblastoma subgroups using the DNA methylation dataset.

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
knn <- KNearestNeighborModel(SplitRatio = 0.8,
CV = 10,
K = 3,
NCores = 1,
NewData = NewData)
ModelMetrics(Model = knn)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.993 0.981 0.973 0.977 0.996 0.985
#> SHH 0.999 1.000 0.997 0.999 1.000 0.985
#> WNT 0.999 0.997 1.000 0.999 0.999 0.985
#> Group4 0.993 0.988 0.991 0.990 0.993 0.985
NewDataPredictionResult(Model = knn)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`K`

The number of nearest neighbors.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.

The `RandomForestModel` function trains a random forest model to classify medulloblastoma subgroups using the DNA methylation dataset.

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
rf <- RandomForestModel(SplitRatio = 0.8,
CV = 10,
NTree = 100,
NCores = 1,
NewData = NewData)
ModelMetrics(Model = rf)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.999 1.000 0.996 0.998 1.000 0.998
#> SHH 1.000 1.000 1.000 1.000 1.000 0.998
#> WNT 1.000 1.000 1.000 1.000 1.000 0.998
#> Group4 0.999 0.998 1.000 0.999 0.999 0.998
NewDataPredictionResult(Model = rf)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`NTree`

The number of trees to be grown.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.

The `XGBoostModel` function trains an XGBoost model to classify medulloblastoma subgroups using the DNA methylation dataset.

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
xgboost <- XGBoostModel(SplitRatio = 0.8,
CV = 10,
NCores = 1,
NewData = NewData)
#> [1] train-mlogloss:0.390594
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177861
#> [3] train-mlogloss:0.087035
#> [4] train-mlogloss:0.043112
#> [5] train-mlogloss:0.022536
#> [6] train-mlogloss:0.012486
#> [7] train-mlogloss:0.007278
#> [8] train-mlogloss:0.004395
#> [9] train-mlogloss:0.002879
#> [10] train-mlogloss:0.002457
#> [1] train-mlogloss:0.388419
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177664
#> [3] train-mlogloss:0.085746
#> [4] train-mlogloss:0.043333
#> [5] train-mlogloss:0.022637
#> [6] train-mlogloss:0.012444
#> [7] train-mlogloss:0.007140
#> [8] train-mlogloss:0.004413
#> [9] train-mlogloss:0.002823
#> [10] train-mlogloss:0.002431
#> [1] train-mlogloss:0.388072
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176992
#> [3] train-mlogloss:0.085394
#> [4] train-mlogloss:0.043119
#> [5] train-mlogloss:0.022323
#> [6] train-mlogloss:0.012245
#> [7] train-mlogloss:0.006953
#> [8] train-mlogloss:0.004304
#> [9] train-mlogloss:0.002808
#> [10] train-mlogloss:0.002544
#> [1] train-mlogloss:0.386945
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.175823
#> [3] train-mlogloss:0.085418
#> [4] train-mlogloss:0.042969
#> [5] train-mlogloss:0.022146
#> [6] train-mlogloss:0.012049
#> [7] train-mlogloss:0.006975
#> [8] train-mlogloss:0.004246
#> [9] train-mlogloss:0.002766
#> [10] train-mlogloss:0.002319
#> [1] train-mlogloss:0.387957
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177210
#> [3] train-mlogloss:0.085601
#> [4] train-mlogloss:0.043317
#> [5] train-mlogloss:0.022903
#> [6] train-mlogloss:0.012530
#> [7] train-mlogloss:0.007282
#> [8] train-mlogloss:0.004478
#> [9] train-mlogloss:0.002934
#> [10] train-mlogloss:0.002514
#> [1] train-mlogloss:0.390082
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177320
#> [3] train-mlogloss:0.085780
#> [4] train-mlogloss:0.043300
#> [5] train-mlogloss:0.022592
#> [6] train-mlogloss:0.012513
#> [7] train-mlogloss:0.007264
#> [8] train-mlogloss:0.004434
#> [9] train-mlogloss:0.002923
#> [10] train-mlogloss:0.002552
#> [1] train-mlogloss:0.391327
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.178571
#> [3] train-mlogloss:0.086584
#> [4] train-mlogloss:0.043455
#> [5] train-mlogloss:0.022545
#> [6] train-mlogloss:0.012333
#> [7] train-mlogloss:0.007139
#> [8] train-mlogloss:0.004347
#> [9] train-mlogloss:0.002808
#> [10] train-mlogloss:0.002392
#> [1] train-mlogloss:0.387270
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176343
#> [3] train-mlogloss:0.085785
#> [4] train-mlogloss:0.042867
#> [5] train-mlogloss:0.022099
#> [6] train-mlogloss:0.011727
#> [7] train-mlogloss:0.006694
#> [8] train-mlogloss:0.003969
#> [9] train-mlogloss:0.002619
#> [10] train-mlogloss:0.002393
#> [1] train-mlogloss:0.385785
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.175053
#> [3] train-mlogloss:0.084670
#> [4] train-mlogloss:0.042764
#> [5] train-mlogloss:0.022229
#> [6] train-mlogloss:0.011936
#> [7] train-mlogloss:0.006921
#> [8] train-mlogloss:0.004281
#> [9] train-mlogloss:0.002846
#> [10] train-mlogloss:0.002425
#> [1] train-mlogloss:0.388743
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176686
#> [3] train-mlogloss:0.086097
#> [4] train-mlogloss:0.043017
#> [5] train-mlogloss:0.022624
#> [6] train-mlogloss:0.012143
#> [7] train-mlogloss:0.007023
#> [8] train-mlogloss:0.004200
#> [9] train-mlogloss:0.002729
#> [10] train-mlogloss:0.002506
ModelMetrics(Model = xgboost)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.979 0.896 0.981 0.936 0.978 0.968
#> Group4 0.993 0.990 0.991 0.991 0.994 0.968
#> SHH 0.998 1.000 0.990 0.995 1.000 0.968
#> WNT 0.983 0.992 0.934 0.962 0.998 0.968
NewDataPredictionResult(Model = xgboost)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.

The `LinearDiscriminantAnalysisModel` function trains a linear discriminant analysis model to classify medulloblastoma subgroups using the DNA methylation dataset.

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
lda <- LinearDiscriminantAnalysisModel(SplitRatio = 0.8,
CV = 10,
NCores = 1,
NewData = NewData)
ModelMetrics(Model = lda)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.941 0.778 0.889 0.828 0.951 0.91
#> SHH 0.994 0.991 0.985 0.988 0.997 0.91
#> WNT 0.993 0.986 0.981 0.984 0.996 0.91
#> Group4 0.945 0.949 0.893 0.920 0.973 0.91
NewDataPredictionResult(Model = lda)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.

The `NaiveBayesModel` function trains a Naive Bayes model to classify medulloblastoma subgroups using the DNA methylation dataset.

```
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
nb <- NaiveBayesModel(SplitRatio = 0.8,
CV = 10,
Threshold = 0.8,
NCores = 1,
NewData = NewData)
ModelMetrics(Model = nb)
#> Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3 0.974 0.859 1.000 0.924 0.969 0.971
#> SHH 1.000 1.000 1.000 1.000 1.000 0.971
#> WNT 0.984 1.000 0.928 0.963 1.000 0.971
#> Group4 0.990 1.000 0.972 0.986 1.000 0.971
NewDataPredictionResult(Model = nb)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
```

This function has the following arguments:

`SplitRatio`

Train and test split ratio. A value greater than or equal to zero and less than one.

`CV`

The number of folds for cross-validation. It should be greater than one.

`Threshold`

The threshold for deciding class probability. A value greater than or equal to zero and less than one.

`NCores`

The number of cores for parallel computing.

`NewData`

Methylation data for prediction.

The `NeuralNetworkModel` function trains an artificial neural network model to classify medulloblastoma subgroups using the DNA methylation dataset. To try it, uncomment the following lines and run them. The first time you run this function, set the `InstallTensorFlow` parameter to TRUE; it will automatically install Python and the TensorFlow library (version 2.10-cpu) in a virtual environment. On later runs, set the parameter to FALSE.

```
# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# ann <- NeuralNetworkModel(Epochs = 100,
# NewData = NewData,
# InstallTensorFlow = TRUE)
# ModelMetrics(Model = ann)
# NewDataPredictionResult(Model = ann)
```

This function has the following arguments:

`Epochs`

The number of epochs.

`NewData`

Methylation data from the `ReadMethylFile` function.

`InstallTensorFlow`

Logical. When running this function for the first time, you need to install the TensorFlow library (version 2.10-cpu). Default is TRUE.