Objects and Variables in R

Introduction

In R, we work with objects to store and manipulate data. Understanding how to create and work with objects is fundamental to using R effectively. This lesson will cover the basics of working with objects and variables in R, using examples relevant to medical and health sciences, including RNA sequencing analysis.

Creating Objects

# Store a gene expression value as a numeric object
# A normalized RNA-seq expression value (raw read counts are whole numbers)
expression_level <- 1543.7           # Numeric value representing expression

# Store a gene name as a character (text) object
gene_name <- "BRCA1"                # String value for breast cancer gene 1

# Store a boolean flag for differential expression
is_differentially_expressed <- TRUE  # Logical value (TRUE/FALSE)

# Display all our created objects with labels
print("Expression level:")           # Print label

## [1] "Expression level:"

print(expression_level)             # Print numeric value

## [1] 1543.7

print("Gene name:")                 # Print label

## [1] "Gene name:"

print(gene_name)                    # Print string value

## [1] "BRCA1"

print("Is differentially expressed?") # Print label

## [1] "Is differentially expressed?"

print(is_differentially_expressed)   # Print logical value

## [1] TRUE
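To confirm which type R assigned to each object, you can ask R directly with the base function class(). Here is a minimal sketch that re-creates the three objects above and checks each one:

```r
# Re-create the objects from above
expression_level <- 1543.7
gene_name <- "BRCA1"
is_differentially_expressed <- TRUE

# class() reports the type of each object
class(expression_level)             # "numeric"
class(gene_name)                    # "character"
class(is_differentially_expressed)  # "logical"
```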

Pro Tip

In RStudio, press Alt + - (Windows/Linux) or Option + - (Mac) to insert the assignment operator <- with a single shortcut!

Naming Conventions

When naming objects in R, follow these guidelines:

✅ Good names:
- Use descriptive names: read_count, expression_level
- Start with a letter: gene_1, sample_id
- Use underscores instead of spaces: fold_change

❌ Avoid:
- Starting with a number: 2nd_replicate (invalid)
- Using spaces: gene expression (invalid)
- Using special characters other than . and _: gene@1 (invalid)
- Using R reserved words: if, else, function
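If you are unsure whether a name is valid, base R's make.names() converts any string into a syntactically valid name, so a name that comes back unchanged is already valid. A quick sketch using the example names from the lists above:

```r
# Valid names come back unchanged; invalid ones are repaired:
# a leading number gets an "X" prefix, spaces and @ become ".",
# and reserved words get a "." appended
make.names(c("read_count", "2nd_replicate", "gene expression", "gene@1", "if"))
# "read_count" "X2nd_replicate" "gene.expression" "gene.1" "if."
```

Note that make.names() repairs names with dots rather than underscores, so when naming your own objects it is usually cleaner to fix the name by hand (for example replicate_2) to stay consistent with the underscore style.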

Working with Objects

# RNA-seq example: Calculate fold change between treated and control samples
control_expression <- 100           # Control sample expression level
treated_expression <- 200          # Treated sample expression level
fold_change <- treated_expression / control_expression  # Calculate ratio
print("Fold change:")              # Print label

## [1] "Fold change:"

print(fold_change)                 # Will show 2x increase

## [1] 2
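RNA-seq differential expression results are commonly reported as log2 fold changes, which make increases and decreases symmetric around 0. A short sketch using the same two values:

```r
control_expression <- 100
treated_expression <- 200

# log2 of the ratio: doubling gives 1, halving gives -1, no change gives 0
log2(treated_expression / control_expression)  # 1
log2(control_expression / treated_expression)  # -1
```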

# Medical example: Calculate Body Mass Index (BMI)
weight_kg <- 70.5                  # Patient weight in kilograms
height_m <- 1.75                   # Patient height in meters
bmi <- weight_kg / (height_m^2)    # BMI formula: weight/(height^2)
print("BMI calculation:")          # Print label

## [1] "BMI calculation:"

print(bmi)                         # Will show BMI value

## [1] 23.02041
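BMI is often reported to one decimal place rather than with all the digits R prints. A minimal sketch using round():

```r
weight_kg <- 70.5
height_m <- 1.75

# Round the BMI result to one decimal place for reporting
round(weight_kg / height_m^2, digits = 1)  # prints 23 (R drops the trailing .0)
```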

Practice Challenges

Challenge 1: Basic Calculations

# Initialize read count from sequencing data
read_count <- 1000                 # Initial number of reads
scaling_factor <- 1.5              # Factor to adjust for library size
read_count <- read_count * scaling_factor  # Apply scaling
normalized_count <- read_count / 10  # Further normalization step

# Display results of calculations
print("Final read count:")         # Show scaled count

## [1] "Final read count:"

print(read_count)

## [1] 1500

print("Normalized count:")         # Show normalized value

## [1] "Normalized count:"

print(normalized_count)

## [1] 150
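Because read_count was overwritten with its scaled value, the original 1000 is no longer stored anywhere. In this simple case it can be recovered by undoing the scaling, as the sketch below shows; in general, keeping intermediate results under their own names is the safer habit.

```r
read_count <- 1000
scaling_factor <- 1.5
read_count <- read_count * scaling_factor  # read_count now holds 1500

# The original value is gone, but dividing by the same factor recovers it
read_count / scaling_factor                # 1000
```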

Challenge 2: Temperature Conversion

# Convert patient temperature from Celsius to Fahrenheit
temp_celsius <- 37.5              # Patient temperature in Celsius
# Formula: (°C × 9/5) + 32
temp_fahrenheit <- (temp_celsius * 9/5) + 32  # Convert to Fahrenheit

# Display the conversion result with units
print("Temperature conversion:")

## [1] "Temperature conversion:"

print(paste(temp_celsius, "°C =", temp_fahrenheit, "°F"))

## [1] "37.5 °C = 99.5 °F"
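Reversing the formula converts Fahrenheit back to Celsius, °C = (°F - 32) × 5/9, which is a handy check on the result above. A minimal sketch:

```r
temp_fahrenheit <- 99.5

# Inverse formula: (°F - 32) * 5/9
(temp_fahrenheit - 32) * 5/9  # 37.5, the original Celsius value
```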

Challenge 3: Sample Information

# Create variables for sample metadata
sample_id <- "RNA_01"             # Unique sample identifier
gene_name <- "TP53"              # Target gene (tumor protein p53)
expression_value <- 2456         # Expression count for this gene

# Combine information into a readable message using paste()
message <- paste("Sample", sample_id, "shows", expression_value, 
                "counts for gene", gene_name)
print("Combined message:")        # Print label

## [1] "Combined message:"

print(message)

## [1] "Sample RNA_01 shows 2456 counts for gene TP53"
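paste() is one way to build messages. Base R's sprintf() is an alternative that keeps the whole sentence template in one string, with %s as a placeholder for text and %d for a whole number. A sketch that produces the same message:

```r
sample_id <- "RNA_01"
gene_name <- "TP53"
expression_value <- 2456

# %s inserts text; %d inserts a whole number
sprintf("Sample %s shows %d counts for gene %s",
        sample_id, expression_value, gene_name)
# "Sample RNA_01 shows 2456 counts for gene TP53"
```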

Common Mistakes and How to Avoid Them

Case Sensitivity Demonstration

# R is case-sensitive - these are three different variables
expression <- 5000               # lowercase
Expression <- 6000               # Title case
EXPRESSION <- 7000              # uppercase

# Show how all three variables are distinct
print("Three different variables:")

## [1] "Three different variables:"

print(expression)               # Shows 5000

## [1] 5000

print(Expression)              # Shows 6000

## [1] 6000

print(EXPRESSION)              # Shows 7000

## [1] 7000
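A mismatched capital letter is one of the most common causes of an "object not found" error. The base function exists() checks whether a name is defined, which makes the difference visible:

```r
sample_id <- "RNA_01"

exists("sample_id")  # TRUE: matches the name exactly
exists("sample_ID")  # FALSE: different capitalization, so a different name
# Typing sample_ID at the console would fail with:
# Error: object 'sample_ID' not found
```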

Object Overwriting Example

# Initial value assignment
read_depth <- 1000000          # One million reads
print("Original read depth:")

## [1] "Original read depth:"

print(read_depth)

## [1] 1e+06

# Overwriting with new value
read_depth <- 1500000          # Changed to 1.5 million reads
print("New read depth:")

## [1] "New read depth:"

print(read_depth)              # Original value is lost

## [1] 1500000
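The first print shows 1e+06 because R switches to scientific notation when that form is shorter than the full number. To display the full number, for example in a report, format() can turn scientific notation off and add a thousands separator. A minimal sketch:

```r
# Force fixed (non-scientific) notation
format(1000000, scientific = FALSE)                  # "1000000"
# Also add a comma as the thousands separator
format(1000000, big.mark = ",", scientific = FALSE)  # "1,000,000"
```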

Invalid Names Example

# Examples of invalid variable names (these will cause errors)
1st_sample <- 5               # Invalid: starts with number
sample name <- "Control"      # Invalid: contains space
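If you ever must work with a non-syntactic name, for example a column imported from a spreadsheet with a space in its header, R accepts it when wrapped in backticks. Treat this as a workaround rather than a recommendation; sticking to valid names avoids the problem entirely. A short sketch:

```r
# Backticks let R treat an otherwise invalid name as a single name
`sample name` <- "Control"
`sample name`          # "Control"

# Clean up so the workaround object does not linger
rm(`sample name`)
```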

Tips for Success

Always use clear, descriptive names for your objects
Be consistent with your naming style (e.g., gene_name, sample_id)
Check that an object exists by typing its name in the console
Use the ls() function to see all objects in your environment
Use rm(object_name) to remove objects you no longer need

# Display all objects in current environment
print("Objects in environment:")

## [1] "Objects in environment:"

# In a fresh session that has run only this lesson's code, ls() returns:
ls()                          # List all variables

##  [1] "bmi"                         "control_expression"
##  [3] "expression"                  "Expression"
##  [5] "EXPRESSION"                  "expression_level"
##  [7] "expression_value"            "fold_change"
##  [9] "gene_name"                   "height_m"
## [11] "is_differentially_expressed" "message"
## [13] "normalized_count"            "read_count"
## [15] "read_depth"                  "sample_id"
## [17] "scaling_factor"              "temp_celsius"
## [19] "temp_fahrenheit"             "treated_expression"
## [21] "weight_kg"

# Remove a specific object from environment
rm(expression)               # Delete 'expression' variable
print("Objects after removing 'expression':")

## [1] "Objects after removing 'expression':"

ls()                         # Show updated list ('expression' is gone)

##  [1] "bmi"                         "control_expression"
##  [3] "Expression"                  "EXPRESSION"
##  [5] "expression_level"            "expression_value"
##  [7] "fold_change"                 "gene_name"
##  [9] "height_m"                    "is_differentially_expressed"
## [11] "message"                     "normalized_count"
## [13] "read_count"                  "read_depth"
## [15] "sample_id"                   "scaling_factor"
## [17] "temp_celsius"                "temp_fahrenheit"
## [19] "treated_expression"          "weight_kg"
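When the environment grows, ls() and rm() can also work on groups of objects: ls(pattern = ...) keeps only names matching a regular expression, and rm(list = ...) removes every name in a character vector. A sketch using the temperature objects from Challenge 2:

```r
temp_celsius <- 37.5
temp_fahrenheit <- 99.5

# List only the names that start with "temp_"
ls(pattern = "^temp_")             # "temp_celsius" "temp_fahrenheit"

# Remove all matching objects in one call
rm(list = ls(pattern = "^temp_"))
exists("temp_celsius")             # FALSE
```

Check the output of ls(pattern = ...) before passing it to rm(), since a loose pattern can match more objects than you intended.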

Next Steps

After mastering basic objects and variables, you can move on to:
- Working with data vectors and matrices
- Understanding data types and structures
- Learning how to create and use functions
- Working with data frames