22.2 Classification methods

Classification methods learn a model that predicts the class labels of new samples from their omic profiles. In supervised analysis of omic data, they are commonly used to assign samples to categories (for example, disease versus control) based on the measured levels of genes, metabolites, or other features. A classifier that predicts accurately can also help identify biomarkers that are predictive of disease or treatment response.

22.2.1 Random Forests (RF)

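A random forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the data, and each split within a tree considers only a random subset of the features. Aggregating the votes of many such decorrelated trees reduces overfitting and typically improves accuracy relative to a single tree.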

library(randomForest)
library(caret)  # provides confusionMatrix(), used for evaluation

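# Load the dataset (one row per sample, with the labels in a column named "class")
data <- read.csv("mydata.csv")

# The response must be a factor for randomForest to fit a classifier
data$class <- as.factor(data$class)

# Split the data into training (70%) and test (30%) sets
set.seed(123)  # make the random split reproducible
trainIndex <- sample(seq_len(nrow(data)), size = floor(0.7 * nrow(data)))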
trainData <- data[trainIndex, ]
testData <- data[-trainIndex, ]

# Train the random forest classifier
rf <- randomForest(class ~ ., data=trainData, ntree=500, importance=TRUE)

# Make predictions on the test data
predictions <- predict(rf, testData)

# Evaluate the performance of the classifier
confusionMatrix(predictions, testData$class)
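
Because the model above is fitted with importance=TRUE, the forest can also be used to rank features, which is one way to look for candidate biomarkers. The following is a minimal sketch rather than a definitive workflow: it uses the built-in iris data as a stand-in for an omic matrix, and the object names (rf_demo, oob_error, ranked) are chosen here for illustration. It also prints the out-of-bag (OOB) error, which comes from evaluating each tree on the samples left out of its bootstrap sample.

```r
library(randomForest)

set.seed(42)  # make the bootstrap sampling reproducible

# Built-in iris data stands in for an omic matrix (samples x features);
# Species is already a factor, so randomForest fits a classifier
data(iris)
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

# OOB error after the final tree: an internal estimate of the
# misclassification rate that needs no separate test set
oob_error <- rf_demo$err.rate[rf_demo$ntree, "OOB"]
print(oob_error)

# Permutation importance (type = 1: mean decrease in accuracy),
# sorted so the most informative features come first
imp <- importance(rf_demo, type = 1)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)
```

With real omic data, which has many more features than iris, the top of such a ranking is usually treated as a list of candidates to validate rather than as confirmed biomarkers.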

22.2.2 Support Vector Machines (SVM)

Support vector machines (SVMs) find the hyperplane that separates the classes with the largest margin in the feature space. With a linear kernel the decision boundary is linear in the original features; with a nonlinear kernel, such as the radial basis function (RBF) kernel, an SVM can also capture nonlinear relationships between the features and the class label.

library(e1071)
library(caret)  # provides confusionMatrix(), used for evaluation

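# Load the dataset (one row per sample, with the labels in a column named "class")
data <- read.csv("mydata.csv")

# The response must be a factor for svm() to perform classification
data$class <- as.factor(data$class)

# Split the data into training (70%) and test (30%) sets
set.seed(123)  # make the random split reproducible
trainIndex <- sample(seq_len(nrow(data)), size = floor(0.7 * nrow(data)))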
trainData <- data[trainIndex, ]
testData <- data[-trainIndex, ]

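# Train the SVM classifier with a radial (RBF) kernel
# (the model is stored as svmModel to avoid masking the svm() function)
svmModel <- svm(class ~ ., data=trainData, kernel="radial", cost=1)

# Make predictions on the test data
predictions <- predict(svmModel, testData)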

# Evaluate the performance of the classifier
confusionMatrix(predictions, testData$class)
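
To illustrate the difference between linear and nonlinear kernels mentioned above, the following sketch fits both to simulated data in which the class depends on the distance from the origin, so no straight line separates the classes. The data and all names in it (toy, fit_linear, fit_radial, acc_linear, acc_radial) are assumptions made for this illustration and are not part of the analysis above.

```r
library(e1071)

set.seed(42)  # make the simulation and the split reproducible

# Simulated two-feature data: samples inside a circle of radius 0.8
# are labelled "inner", the rest "outer"
n <- 400
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
toy <- data.frame(
  x1 = x1,
  x2 = x2,
  class = factor(ifelse(x1^2 + x2^2 < 0.64, "inner", "outer"))
)

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# Same data and cost, different kernels
fit_linear <- svm(class ~ ., data = toy_train, kernel = "linear", cost = 1)
fit_radial <- svm(class ~ ., data = toy_train, kernel = "radial", cost = 1)

# Test-set accuracy of each model
acc_linear <- mean(predict(fit_linear, toy_test) == toy_test$class)
acc_radial <- mean(predict(fit_radial, toy_test) == toy_test$class)
print(c(linear = acc_linear, radial = acc_radial))
```

On data like these, the radial kernel should classify the test samples far more accurately than the linear kernel. On real omic data the better choice is not known in advance, so the kernel and the cost parameter are usually selected by cross-validation.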