3’-end post-transcriptional modifications exploration and differential proportion analysis

Complete source of this page is available in the notebook Post-transcriptional_modifications.Rmd.

Based on the publication from Xu X, Usher B, Gutierrez C, Barriot R, Arrowsmith TJ, Han X, Redder P, Neyrolles O, Blower TR, Genevaux P (2023) MenT nucleotidyltransferase toxins extend tRNA acceptor stems and can be inhibited by asymmetrical antitoxin binding. Nature Communications, 14, 4644. https://doi.org/10.1038/s41467-023-40264-3.

Download dataset

Data accompanying the article are publicly available on European Nucleotide Archive (ENA) at EBI https://www.ebi.ac.uk/ena/browser/view/PRJEB62085

For this notebook, we will use some RNA-Seq samples multiplexed into one raw FASTQ file to reproduce some results on the action of toxin MenT1 which adds ribonucleotides at the 3’-end of tRNAs as illustrated in figure 3E of the original paper.

The dataset will be directly downloaded in the ../extdata/MenT1 directory of the project (that should be readily available if the GitLab project is cloned).

The initial ../extdata/MenT1 directory contains also the reference sequences of the tRNAs (multifasta file) to which we will align the 3’-end sequencing reads: Msm.reference.tRNAs.with.CCA.fasta.

library(rnaends)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
options(width = 250)

data_dir <- "../extdata/MenT1"

raw_fastq <- paste0(data_dir, "/Msm.total.RNA.MenT1.Library.fq.gz")

if (!file.exists(raw_fastq))
    download.file("ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR114/ERR11439143/Msm.total.RNA.MenT1.Library.fq.gz",
                  destfile = raw_fastq)

Pre-processing of reads prior mapping

In this section, we will parse the raw reads to:

  • validate their content

  • split the FASTQ file into 4 different FASTQ files, each one of them corresponding to a sample: control without toxin or tRNAs with toxin and replicate 1 or 2

For more information on the other pre-processing features available, see the vignette on pre-processing features.

The reads are 100 nt long and their structure can be schematized as follows:

read position: 1   2                      6                      25    40                 46                                   
               N · multiplexing barcode · recognition sequence · UMI · control sequence · 3'-5' tRNA-end reverse complement    
               │   │                      │                       │       │               └ 55nt long                          
               │   │                      │                       │       └ must be AGATAC                                     
               │   │                      │                       └ 15 random nucleotides (A, T, C or G) serving as            
               │   │                      │                         Unique Molecule Identifier (UMI) for PCR duplicates removal
               │   │                      └ must be CGGCACCAACCGAGGCTCA                                                        
               │   └ must be one of ACAC, ATTG, GACG or GGTA                                                                   
               │                                                                                                               
               └ one random nucleotide at the beginning                                                                        

Here, we define the authorized barcodes and their associated experimental condition (in same order) so that we can reuse them later.

barcodes   <- c("ACAC",    "ATTG",    "GACG",     "GGTA")
conditions <- c("ctrl.r1", "ctrl.r2", "MenT1.r1", "MenT1.r2")

The above schematized structure is translated to a read_features table that will be used for reads parsing and validation.

For its initialization, the user must specify which region on the reads contains the sequence destined to be aligned on the reference sequences.

read_features <- init_read_features(start = 46, width = 55) # region to map
read_features %>%
  knitr::kable()

name

start

width

pattern_type

pattern

max_mismatch

readid_prepend

readseq

46

55

nucleotides

ACTG

0

FALSE

Then, the table is updated with each structure element of the reads.

The barcode from position 2 to 5 is a feature of type “constant” which is a set of authorized constant string at a certain location:

read_features <- add_read_feature(read_features,
                                  name = "barcode",
                                  start = 2,
                                  width = 4,
                                  pattern = barcodes,
                                  pattern_type = "constant")

The UMI from position 25 to 39 are composed of A, T, C or G, thus is a feature of type “nucleotides” which is a string composed of a certain alphabet. This feature will be extracted and stored in the read identifier for future use in the quantification step (to remove PCR duplicates), thus the readid_prepend = TRUE argument).

read_features <- add_read_feature(read_features,
                                  name = "UMI",
                                  start = 25,
                                  width = 15,
                                  pattern = "ACGT",
                                  pattern_type =  "nucleotides",
                                  readid_prepend = TRUE)

Then, the recognition sequence and the control sequence correspond to constant character strings and will be used to validate the reads. These are features of type “constant”.

read_features <- add_read_feature(read_features,
                                  name = "recognition_seq",
                                  start = 6,
                                  width = 19,
                                  pattern = "CGGCACCAACCGAGGCTCA",
                                  pattern_type = "constant",
                                  max_mismatch = 1,
                                  readid_prepend = FALSE)
read_features <- add_read_feature(read_features,
                                  name = "control_seq",
                                  start = 40,
                                  width = 6,
                                  pattern = "AGATAC",
                                  pattern_type = "constant",
                                  readid_prepend = FALSE)
read_features %>%
  knitr::kable()

name

start

width

pattern_type

pattern

max_mismatch

readid_prepend

readseq

46

55

nucleotides

ACTG

0

FALSE

barcode

2

4

constant

ACAC, AT….

0

FALSE

UMI

25

15

nucleotides

ACGT

0

TRUE

recognition_seq

6

19

constant

CGGCACCA….

1

FALSE

control_seq

40

6

constant

AGATAC

0

FALSE

Once the read_featurestable is defined, parsing of read can be performed. As we need also to demultiplex the samples here, we will directly calldemultiplexwhich will call theparse_read` internally to parse and validate the reads.

Demultiplex original raw FASTQ file(s) into separate FASTQ files

Reads are validated against the read features and only valid reads are found in the demultiplexed FASTQ files.

demux_report <- demultiplex(fastq_file = raw_fastq,
                            yieldsize = 1e6, # number of reads parsed at each iteration
                            features = read_features,
                            force = FALSE, # demultiplex will not do anything if there already exists a report
                            keep_invalid_reads = FALSE) # FALSE: we do not want invalid reads to be returned in a separate FASTQ file
#> Demux report already exists set force=T to overwrite: ../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/Msm.total.RNA.MenT1.Library_parse_report.csv
demux_report %>%
  knitr::kable()

barcode_group

is_valid_readseq

is_valid_barcode

is_valid_UMI

is_valid_recognition_seq

is_valid_control_seq

count

demux_filename

ACAC

FALSE

TRUE

TRUE

TRUE

TRUE

39

NA

ACAC

TRUE

TRUE

TRUE

TRUE

TRUE

996140

../data/MenT1/Msm.total.RNA.MenT1.Library_demux/Msm.total.RNA.MenT1.Library_ACAC_valid.fastq.gz

ATTG

FALSE

TRUE

TRUE

TRUE

TRUE

35

NA

ATTG

TRUE

TRUE

TRUE

TRUE

TRUE

761196

../data/MenT1/Msm.total.RNA.MenT1.Library_demux/Msm.total.RNA.MenT1.Library_ATTG_valid.fastq.gz

GACG

FALSE

TRUE

TRUE

TRUE

TRUE

25

NA

GACG

TRUE

TRUE

TRUE

TRUE

TRUE

614477

../data/MenT1/Msm.total.RNA.MenT1.Library_demux/Msm.total.RNA.MenT1.Library_GACG_valid.fastq.gz

GGTA

FALSE

TRUE

TRUE

TRUE

TRUE

39

NA

GGTA

TRUE

TRUE

TRUE

TRUE

TRUE

1059530

../data/MenT1/Msm.total.RNA.MenT1.Library_demux/Msm.total.RNA.MenT1.Library_GGTA_valid.fastq.gz

The report provides statistics on reads for each sample together with the fastq output file.

Mapping of reads and quantification of 3’-ends abundances in a count table

As, reads should align partially, we will use subread.

# input
demux_dir <- paste0(sub("(.fastq|.fq)(.gz)?$", "", raw_fastq), "_demux")
fastq_files <- demux_report$demux_filename %>% 
  na.omit()

# params for mapping
reference_genome <- "Msm.reference.tRNAs.with.CCA.fasta"
reference_fasta <- paste0(data_dir, "/", reference_genome)
reference_index <- paste0(data_dir, "/", reference_genome, ".subread")
aligner <- "subread"
max_mismatch <- 0
bam_dir <- paste0(demux_dir,
                  "/mm",
                  max_mismatch,
                  "_",
                  aligner,
                  "_",
                  reference_genome)

# perform mapping
bam_files <- lapply(
  fastq_files,
  map_fastq,
  fasta_name = reference_fasta,
  index_name = reference_index,
  max_map_mismatch = max_mismatch,
  aligner = aligner,
  force = FALSE,
  bam_dir = bam_dir,
  threads = 4
)
#> bam_file '../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_ACAC_valid.fastq.gz.bam' already exists, set force=TRUE to recompute
#> bam_file '../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_ATTG_valid.fastq.gz.bam' already exists, set force=TRUE to recompute
#> bam_file '../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_GACG_valid.fastq.gz.bam' already exists, set force=TRUE to recompute
#> bam_file '../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_GGTA_valid.fastq.gz.bam' already exists, set force=TRUE to recompute

Convert bam files to count table with quantify_3prime

The quantify_3prime is called on each BAM file. The features table is passed to take into account the UMIs that were present in the reads to remove duplicates.

count_table <- quantify_from_bam_files(bam_files,
                                       conditions,
                                       quantify_3prime,
                                       features = read_features,
                                       min_mapped_length = 20,
                                       rev_comp_reads = TRUE,
                                       keep_UMI = FALSE)
#> Quantification 3' output file exists, set force=T to overwrite: ../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_ACAC_valid.fastq.gz.bam.csv
#> Quantification 3' output file exists, set force=T to overwrite: ../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_ATTG_valid.fastq.gz.bam.csv
#> Quantification 3' output file exists, set force=T to overwrite: ../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_GACG_valid.fastq.gz.bam.csv
#> Quantification 3' output file exists, set force=T to overwrite: ../extdata/MenT1/Msm.total.RNA.MenT1.Library_demux/mm0_subread_Msm.reference.tRNAs.with.CCA.fasta/Msm.total.RNA.MenT1.Library_GGTA_valid.fastq.gz.bam.csv
count_table %>%
  head(15) %>%
  knitr::kable()

sample_id

rname

position

strand

downstream_seq

reads

count

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACAG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACAGGCAGTGTGGGGGCCAGG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACAGGCAGTGTGGGGGT

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACGGGCAGT

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACGGGCAGTGTGGGGGCCAGGGGTTA

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACGGGCAGTGTGGGGGTCAGG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACGGGCAGTGTGGGGGTTAGG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTGGGAGT

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTGGTAG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTGGTAGT

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTGGTAGTGTGGGGGTCAGGG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTGTCTCCTAGGCTCCACCA

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACACTTGCAGTGTGTGGGTCAGGG

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACAGTGCTGCACGGCGGTCTGATC

1

1

ctrl.r1

Msm.Ala-CGC.trna33

25

+

ACCACAGTGGCAGT

1

1

Downstream analysis: 3’-end post-transciptional modifications exploration

Pivot count_table to have counts for each sample for the same 3’-end. For noise reduction, we will only keep rows for which there are at least 10 reads in one sample.

count_table_w <- count_table %>%
  select(-reads) %>%
  pivot_wider(names_from = "sample_id", values_from = "count", values_fill = 0) %>%
  filter(if_any(all_of(conditions), ~ . >= 10))
count_table_w %>%
  head(15) %>%
  knitr::kable()

rname

position

strand

downstream_seq

ctrl.r1

ctrl.r2

MenT1.r1

MenT1.r2

Msm.Ala-CGC.trna33

26

+

137

2

6

145

Msm.Ala-CGC.trna33

27

+

130

12

15

151

Msm.Ala-CGC.trna33

28

+

79

5

12

86

Msm.Ala-CGC.trna33

29

+

75

12

15

93

Msm.Ala-CGC.trna33

30

+

58

27

25

78

Msm.Ala-CGC.trna33

31

+

62

5

8

78

Msm.Ala-CGC.trna33

32

+

36

5

5

41

Msm.Ala-CGC.trna33

33

+

72

31

29

91

Msm.Ala-CGC.trna33

34

+

72

99

106

99

Msm.Ala-CGC.trna33

35

+

142

341

382

166

Msm.Ala-CGC.trna33

36

+

167

44

55

150

Msm.Ala-CGC.trna33

37

+

158

107

95

115

Msm.Ala-CGC.trna33

38

+

66

13

8

59

Msm.Ala-CGC.trna33

39

+

113

18

20

106

Msm.Ala-CGC.trna33

40

+

146

8

6

132

Next, we are going to focus on full length tRNAs. Thus, we will need their length.

tRNAs with expected length from reference sequences FASTA file:

library(Biostrings)
tRNAs_DNASS <- readDNAStringSet(paste0(data_dir, "/", reference_genome))
tRNAs <- tibble(rlength = width(tRNAs_DNASS),
                seq = as.character(tRNAs_DNASS),
                name = names(tRNAs_DNASS)) %>%
  mutate(id = str_extract(name, "trna\\d+.\\S+"),
         rname = as.factor(str_extract(name, "^\\S+"))) %>%
  select(rname, rlength, seq)
tRNAs %>%
  head(15) %>%
  knitr::kable()

rname

rlength

seq

Msm.Ala-CGC.trna33

76

GGGGCTATGGCGCAGTTGGTAGCGCGACTCGTTCGCATCGAGTAGGTCAGGGGTTCGATTCCCCTTAGCTCCACCA

Msm.Ala-GGC.trna9

76

GGGGCTATGGCGCAGTTGGTAGCGCACCACACTGGCAGTGTGGGGGTCAGGGGTTCGAATCCCCTTAGCTCCACCA

Msm.Ala-TGC.trna2

76

GGGGCCTTAGCTCAGTTGGTAGAGCGCTGCCTTTGCAAGGCAGATGTCAGGAGTTCGAATCTCCTAGGCTCCACCA

Msm.Arg-ACG.trna27

76

GCGCCCGTAGCTCAACGGATAGAGCATCTGACTACGGATCAGAAGGTTAGGGGTTCGAATCCCTTCGGGCGCACCA

Msm.Arg-TCT.trna15

76

GCCTCCGTAGCTCAATGGATAGAGCATCGGCCTTCTAATCCGACGGTTGCAGGTTCGAGTCCTGCCGGGGGCGCCA

Msm.Arg-CCG.trna18

76

GCCCCCGTAGCTCAGGGGATAGAGCGTCTGCCTCCGGAGCAGAAGGCCGCAGGTTCGAATCCTGCCGGGGGCACCA

Msm.Arg-CCT.trna20

76

GCCCTCGTAGCTCAGGGGATAGAGCACGGCTCTCCTAAAGCCGGTGTCGCAGGTTCGAATCCTGCCGGGGGCACCA

Msm.Asn-GTT.trna37

76

TCCCCTGTAGCTCAATTGGCAGAGCATTCGGCTGTTAACCGAAGGGTTGCTGGTTCGAGTCCAGCCGGGGGAGCCA

Msm.Asp-GTC.trna31

77

GGCCCTGTGGCGCAGTTGGTTAGCGCGCCGCCCTGTCACGGCGGAGGTCGCGGGTTCAAGTCCCGTCAGGGTCGCCA

Msm.Cys-GCA.trna40

74

GCGGGTGTGGCTGAGTGGCTAGGCAGCGGCCTGCAAAGCCGTCTACACGGGTTCGAGTCCCGTCACTCGCTCCA

Msm.Cys-GCA.trna44

74

GGTGGAGTGGCCGAGTGGTGAGGCAACGGCCTGCAAAGCCGTGCACACGGGTTCGATTCCCGTCTCCACCTCCA

Msm.Gln-CTG.trna10

75

TGGGGTATGGTGTAATTGGCAACACAGCTGATTCTGGTTCAGCCATTCTAGGTTCGAGTCCTGGTACCCCAGCCA

Msm.Gln-TTG.trna19

75

TCCGTCGTGGTGTAATCGGCAGCACCTCTGATTTTGGTTCAGATAGTTCAGGTTCGAGTCCTGGCGACGGAGCCA

Msm.Glu-CTC.trna11

76

GCCCCCGTCGTCTAGCGGCCTAGGACGCCGCCCTCTCACGGCGGTAGCGTGGGTTCGAATCCCATCGGGGGTACCA

Msm.Glu-TTC.trna30

76

GCCCCCATCGTCTAGTGGCCTAGGACGCCGCCCTTTCACGGCGGTAGCACGGGTTCGAATCCCGTTGGGGGTACCA

Filter only full length tRNAs with or without Cs added post-transcriptionaly

count_table_w <- tRNAs %>%
  select(-seq) %>%
  inner_join(count_table_w, by = join_by(rname)) %>%
  filter(position == rlength, str_detect(downstream_seq, pattern = "^C*$"))
count_table_w %>%
  knitr::kable()

rname

rlength

position

strand

downstream_seq

ctrl.r1

ctrl.r2

MenT1.r1

MenT1.r2

Msm.Ala-CGC.trna33

76

76

+

74878

35268

14950

49270

Msm.Ala-CGC.trna33

76

76

+

C

48

6

5062

23377

Msm.Ala-CGC.trna33

76

76

+

CC

251

50

774

9423

Msm.Ala-CGC.trna33

76

76

+

CCC

132

22

49

655

Msm.Ala-CGC.trna33

76

76

+

CCCC

12

1

6

50

Msm.Ala-GGC.trna9

76

76

+

117766

46916

23538

91997

Msm.Ala-GGC.trna9

76

76

+

C

60

9

8894

38132

Msm.Ala-GGC.trna9

76

76

+

CC

273

49

992

10477

Msm.Ala-GGC.trna9

76

76

+

CCC

135

23

75

722

Msm.Ala-GGC.trna9

76

76

+

CCCC

25

2

4

60

Msm.Ala-TGC.trna2

76

76

+

3696

1322

660

2249

Msm.Ala-TGC.trna2

76

76

+

C

3

0

180

529

Msm.Ala-TGC.trna2

76

76

+

CC

4

0

11

133

Msm.Arg-CCG.trna18

76

76

+

19830

39402

24566

20289

Msm.Arg-CCG.trna18

76

76

+

C

37

3

10869

10466

Msm.Arg-CCG.trna18

76

76

+

CC

27

8

313

185

Msm.Arg-CCG.trna18

76

76

+

CCC

10

2

21

15

Msm.Arg-CCT.trna20

76

76

+

2006

1427

871

1834

Msm.Arg-CCT.trna20

76

76

+

CC

4

0

25

79

Msm.Arg-CCT.trna20

76

76

+

C

0

0

322

593

Msm.Asn-GTT.trna37

76

76

+

5045

1582

1101

3803

Msm.Asn-GTT.trna37

76

76

+

CC

6

1

22

157

Msm.Asn-GTT.trna37

76

76

+

C

0

0

188

925

Msm.Asp-GTC.trna31

77

77

+

1187

430

161

490

Msm.Asp-GTC.trna31

77

77

+

CC

1

0

4

13

Msm.Asp-GTC.trna31

77

77

+

C

0

0

51

143

Msm.Cys-GCA.trna44

74

74

+

15287

2177

1418

12162

Msm.Cys-GCA.trna44

74

74

+

C

14

5

742

7331

Msm.Cys-GCA.trna44

74

74

+

CC

19

5

52

736

Msm.Cys-GCA.trna44

74

74

+

CCC

19

2

0

35

Msm.Gln-CTG.trna10

75

75

+

2025

1696

1279

1536

Msm.Gln-CTG.trna10

75

75

+

C

1

1

208

289

Msm.Gln-CTG.trna10

75

75

+

CC

7

1

17

51

Msm.Glu-CTC.trna11

76

76

+

369

454

154

190

Msm.Glu-CTC.trna11

76

76

+

C

0

0

48

20

Msm.Glu-TTC.trna30

76

76

+

20182

48205

20106

10764

Msm.Glu-TTC.trna30

76

76

+

C

17

3

5826

1919

Msm.Glu-TTC.trna30

76

76

+

CC

19

4

671

334

Msm.Glu-TTC.trna30

76

76

+

CCC

2

0

57

12

Msm.Gly-GCC.trna43

76

76

+

323

68

27

139

Msm.Gly-GCC.trna43

76

76

+

C

0

0

5

37

Msm.Gly-TCC.trna14

74

74

+

14279

7436

3028

13502

Msm.Gly-TCC.trna14

74

74

+

C

25

2

2041

8300

Msm.Gly-TCC.trna14

74

74

+

CC

59

10

165

821

Msm.Gly-TCC.trna14

74

74

+

CCC

50

15

5

76

Msm.Gly-TCC.trna14

74

74

+

CCCC

33

2

5

34

Msm.Gly-TCC.trna14

74

74

+

CCCCC

4

1

1

12

Msm.His-GTG.trna16

76

76

+

227

35

16

84

Msm.His-GTG.trna16

76

76

+

C

0

0

2

32

Msm.Ile-GAT.trna1

77

77

+

22255

11100

7542

20170

Msm.Ile-GAT.trna1

77

77

+

C

11

1

1168

4340

Msm.Ile-GAT.trna1

77

77

+

CC

16

6

118

645

Msm.Ile-GAT.trna1

77

77

+

CCC

7

0

5

34

Msm.Leu-CAA.trna42

77

77

+

13281

3553

1655

7642

Msm.Leu-CAA.trna42

77

77

+

C

7

2

578

2357

Msm.Leu-CAA.trna42

77

77

+

CC

10

0

42

223

Msm.Leu-CAA.trna42

77

77

+

CCC

7

2

5

17

Msm.Leu-CAG.trna3

86

86

+

103054

76258

42932

70265

Msm.Leu-CAG.trna3

86

86

+

C

50

12

16217

20681

Msm.Leu-CAG.trna3

86

86

+

CC

93

22

1178

2112

Msm.Leu-CAG.trna3

86

86

+

CCC

55

7

59

209

Msm.Leu-CAG.trna3

86

86

+

CCCC

6

8

3

20

Msm.Leu-GAG.trna13

89

89

+

12366

17017

11314

12197

Msm.Leu-GAG.trna13

89

89

+

C

16

2

3619

2905

Msm.Leu-GAG.trna13

89

89

+

CC

15

0

476

559

Msm.Leu-GAG.trna13

89

89

+

CCC

29

5

32

67

Msm.Leu-GAG.trna13

89

89

+

CCCC

5

2

5

12

Msm.Lys-CTT.trna17

76

76

+

16554

8047

5428

10114

Msm.Lys-CTT.trna17

76

76

+

C

11

0

1049

1716

Msm.Lys-CTT.trna17

76

76

+

CC

21

2

131

517

Msm.Lys-CTT.trna17

76

76

+

CCC

9

1

11

48

Msm.Lys-TTT.trna21

76

76

+

1532

668

382

1137

Msm.Lys-TTT.trna21

76

76

+

CC

3

0

6

37

Msm.Lys-TTT.trna21

76

76

+

C

0

0

81

234

Msm.Met-CAT.trna8

77

77

+

38

8

4

21

Msm.Met-CAT.trna6

77

77

+

20

4

2

4

Msm.Pro-CGG.trna23

77

77

+

3601

2261

1796

3718

Msm.Pro-CGG.trna23

77

77

+

C

2

1

337

758

Msm.Pro-CGG.trna23

77

77

+

CC

2

0

13

57

Msm.Pro-GGG.trna41

77

77

+

73943

65589

36733

57991

Msm.Pro-GGG.trna41

77

77

+

C

50

6

8934

21092

Msm.Pro-GGG.trna41

77

77

+

CC

141

26

687

3624

Msm.Pro-GGG.trna41

77

77

+

CCC

103

16

44

220

Msm.Pro-GGG.trna41

77

77

+

CCCC

9

5

5

25

Msm.Pro-TGG.trna36

77

77

+

37

37

7

18

Msm.SeC-TCA.trna46

95

95

+

199

322

166

187

Msm.SeC-TCA.trna46

95

95

+

C

1

0

46

46

Msm.SeC-TCA.trna46

95

95

+

CC

0

0

4

10

Msm.Ser-CGA.trna28

91

91

+

190

36

18

147

Msm.Ser-CGA.trna28

91

91

+

C

0

0

3

34

Msm.Ser-GCT.trna26

92

92

+

523

300

104

523

Msm.Ser-GCT.trna26

92

92

+

C

0

0

10

93

Msm.Ser-GGA.trna24

89

89

+

15842

6155

2893

11747

Msm.Ser-GGA.trna24

89

89

+

C

11

2

709

4548

Msm.Ser-GGA.trna24

89

89

+

CC

4

1

49

504

Msm.Ser-GGA.trna24

89

89

+

CCC

0

3

3

22

Msm.Ser-TGA.trna25

90

90

+

26

13

3

21

Msm.Thr-CGT.trna29

76

76

+

319

103

34

157

Msm.Thr-CGT.trna29

76

76

+

C

1

0

7

41

Msm.Thr-GGT.trna5

76

76

+

11120

10371

5789

5106

Msm.Thr-GGT.trna5

76

76

+

C

11

0

1327

1660

Msm.Thr-GGT.trna5

76

76

+

CC

8

1

75

170

Msm.Thr-TGT

75

75

+

76

16

2

47

Msm.Thr-TGT

75

75

+

C

0

0

0

37

Msm.Trp-CCA.trna7

76

76

+

104

13

8

32

Msm.Tyr-GTA.trna4

86

86

+

411

127

83

273

Msm.Tyr-GTA.trna4

86

86

+

C

0

0

28

97

Msm.Tyr-GTA.trna4

86

86

+

CC

0

0

3

27

Msm.Val-CAC.trna12

75

75

+

952

540

327

916

Msm.Val-CAC.trna12

75

75

+

C

2

0

45

133

Msm.Val-CAC.trna12

75

75

+

CC

1

0

5

40

Msm.Val-GAC.trna45

75

75

+

54835

25926

17708

51804

Msm.Val-GAC.trna45

75

75

+

C

13

0

2205

7261

Msm.Val-GAC.trna45

75

75

+

CC

61

9

212

1511

Msm.Val-GAC.trna45

75

75

+

CCC

17

5

15

68

Visualize proportions of post-transcriptional modifications with bar plots

Pivot to longer format to compute proportions per tRNA per condition

count_table_l <- count_table_w %>%
  pivot_longer(cols = all_of(conditions),
               names_to = "condition",
               values_to = "count")
count_per_tRNA_per_sample <- count_table_l %>%
  group_by(condition, rname) %>%
  summarise(total=sum(count), .groups = "keep") %>%
  select(condition, rname, total)
# totals and percentages
count_table_l <- count_table_l  %>%
  inner_join(count_per_tRNA_per_sample, by = c("condition", "rname")) %>%
  mutate(pc = round(count / total * 100, 1)) %>%
  relocate(pc, .after = count)
count_table_l %>%
  head(15) %>%
  knitr::kable()

rname

rlength

position

strand

downstream_seq

condition

count

pc

total

Msm.Ala-CGC.trna33

76

76

+

ctrl.r1

74878

99.4

75321

Msm.Ala-CGC.trna33

76

76

+

ctrl.r2

35268

99.8

35347

Msm.Ala-CGC.trna33

76

76

+

MenT1.r1

14950

71.7

20841

Msm.Ala-CGC.trna33

76

76

+

MenT1.r2

49270

59.5

82775

Msm.Ala-CGC.trna33

76

76

+

C

ctrl.r1

48

0.1

75321

Msm.Ala-CGC.trna33

76

76

+

C

ctrl.r2

6

0.0

35347

Msm.Ala-CGC.trna33

76

76

+

C

MenT1.r1

5062

24.3

20841

Msm.Ala-CGC.trna33

76

76

+

C

MenT1.r2

23377

28.2

82775

Msm.Ala-CGC.trna33

76

76

+

CC

ctrl.r1

251

0.3

75321

Msm.Ala-CGC.trna33

76

76

+

CC

ctrl.r2

50

0.1

35347

Msm.Ala-CGC.trna33

76

76

+

CC

MenT1.r1

774

3.7

20841

Msm.Ala-CGC.trna33

76

76

+

CC

MenT1.r2

9423

11.4

82775

Msm.Ala-CGC.trna33

76

76

+

CCC

ctrl.r1

132

0.2

75321

Msm.Ala-CGC.trna33

76

76

+

CCC

ctrl.r2

22

0.1

35347

Msm.Ala-CGC.trna33

76

76

+

CCC

MenT1.r1

49

0.2

20841

palettefig <- c(
  `3'-`     = "#6696D0",
  `3'-C`    = "#E46D32",
  `3'-CC`   = "#B82123",
  `3'-CCC`  = "#F3B81B",
  `3'-CCCC` = "#D6D65F"
)

tRNA_labs = count_table_l$rname
labs = tRNA_labs %>% as.character %>% unique %>% str_sub(5,11)
names(labs) = tRNA_labs %>% as.character %>% unique 

count_table_l %>%
  mutate(rname = fct_relevel(rname,
                             sort(unique(as.character(rname)),
                                  decreasing = FALSE)),
         condition = fct_relevel(condition,
                                 sort(unique(as.character(condition)),
                                      decreasing = TRUE)),
         downstream_seq = paste0("3'-", downstream_seq)) %>%
  ggplot(aes(fill = downstream_seq, y = pc, x = condition)) +
    geom_bar(position = "stack", stat = "identity") +
    coord_flip() +
    scale_fill_manual(values = palettefig) +
    geom_text(aes(condition, 50, label = total, fill = NULL), size = 3) +
    theme_classic() +
    xlab(NULL) +
    ylab(NULL) +
    facet_wrap(facets = . ~ rname, ncol = 6, labeller = labeller(rname = labs)) +
    labs(fill = "3'-end modification")

Differential proportion analysis: which tRNAs are significantly modified by toxin?

Are the observed differences in proportions of 3’-end modified transcripts statistically significant?

For this, we will compare the proportion of modified tRNAs in the conrtol vs. the proportion of modified tRNAs in the samples treated with MenT1.

In the following, we regroup the modified tRNAs as the “fraction” and we sum up unmodified and modified tRNAs in the “total” as expected by EMOTE_differential_proportion function.

proportion_table <- count_table_l %>%
  select(-rlength) %>%
  mutate(count_of = ifelse(downstream_seq == "", "total", "fraction")) %>%
  group_by(rname, strand, position, condition, count_of) %>%
  summarise(count = sum(count), .groups = "keep") %>%
  separate(condition, into = c("group", "replicate")) %>%
  pivot_wider(names_from = "count_of", values_from = "count", values_fill=0) %>%
  mutate(total = total + fraction) %>% # for dss which expects total and fraction
  pivot_longer(cols = c("fraction", "total"),
               names_to = "count_of",
               values_to = "count") %>%
  mutate(sample_id = paste0(group, ".", replicate, ".", count_of)) %>%
  select(-group, -replicate, -count_of)
proportion_table %>%
  head(15) %>%
  knitr::kable()

rname

strand

position

count

sample_id

Msm.Ala-CGC.trna33

+

76

5891

MenT1.r1.fraction

Msm.Ala-CGC.trna33

+

76

20841

MenT1.r1.total

Msm.Ala-CGC.trna33

+

76

33505

MenT1.r2.fraction

Msm.Ala-CGC.trna33

+

76

82775

MenT1.r2.total

Msm.Ala-CGC.trna33

+

76

443

ctrl.r1.fraction

Msm.Ala-CGC.trna33

+

76

75321

ctrl.r1.total

Msm.Ala-CGC.trna33

+

76

79

ctrl.r2.fraction

Msm.Ala-CGC.trna33

+

76

35347

ctrl.r2.total

Msm.Ala-GGC.trna9

+

76

9965

MenT1.r1.fraction

Msm.Ala-GGC.trna9

+

76

33503

MenT1.r1.total

Msm.Ala-GGC.trna9

+

76

49391

MenT1.r2.fraction

Msm.Ala-GGC.trna9

+

76

141388

MenT1.r2.total

Msm.Ala-GGC.trna9

+

76

493

ctrl.r1.fraction

Msm.Ala-GGC.trna9

+

76

118259

ctrl.r1.total

Msm.Ala-GGC.trna9

+

76

83

ctrl.r2.fraction

Then, we need a design_matrix to specify the group and replicate for each sample_id and also if the counts in the count_table corresponds to the fraction or the total.

design_matrix <- tibble(sample_id = unique(sort(proportion_table$sample_id))) %>%
  separate(sample_id,
           sep = "\\.",
           into = c("group", "replicate", "count_of"),
           remove = FALSE)
design_matrix %>%
  knitr::kable()

sample_id

group

replicate

count_of

ctrl.r1.fraction

ctrl

r1

fraction

ctrl.r1.total

ctrl

r1

total

ctrl.r2.fraction

ctrl

r2

fraction

ctrl.r2.total

ctrl

r2

total

MenT1.r1.fraction

MenT1

r1

fraction

MenT1.r1.total

MenT1

r1

total

MenT1.r2.fraction

MenT1

r2

fraction

MenT1.r2.total

MenT1

r2

total

diff_prop <- differential_proportion(proportion_table,
                                     design_matrix,
                                     group1 = "ctrl",
                                     group2 = "MenT1")
#> Estimating dispersion for each CpG site, this will take a while ...
diff_prop %>%
  knitr::kable()

rname

position

strand

ctrl

MenT1

pval

fdr

Msm.Gly-TCC.trna14

74

+

0.0079261

0.4145317

0.0000000

0.0000000

Msm.Cys-GCA.trna44

74

+

0.0044360

0.3793868

0.0000000

0.0000000

Msm.Arg-CCG.trna18

76

+

0.0020238

0.3288844

0.0000000

0.0000000

Msm.Ala-GGC.trna9

76

+

0.0029674

0.3233828

0.0000000

0.0000000

Msm.Arg-CCT.trna20

76

+

0.0009950

0.2765248

0.0000000

0.0000000

Msm.Leu-CAA.trna42

77

+

0.0014642

0.2638804

0.0000000

0.0000001

Msm.Leu-CAG.trna3

86

+

0.0013089

0.2679313

0.0000000

0.0000002

Msm.Tyr-GTA.trna4

86

+

0.0000000

0.2921362

0.0000001

0.0000002

Msm.Asp-GTC.trna31

77

+

0.0004209

0.2480578

0.0000002

0.0000009

Msm.Leu-GAG.trna13

89

+

0.0028787

0.2463040

0.0000002

0.0000009

Msm.Ala-TGC.trna2

76

+

0.0009452

0.2259275

0.0000003

0.0000010

Msm.Ala-CGC.trna33

76

+

0.0040582

0.3437180

0.0000005

0.0000016

Msm.SeC-TCA.trna46

95

+

0.0025000

0.2309671

0.0000018

0.0000052

Msm.Thr-GGT.trna5

76

+

0.0009011

0.2294034

0.0000039

0.0000103

Msm.Lys-CTT.trna17

76

+

0.0014216

0.1819812

0.0000053

0.0000131

Msm.Pro-GGG.trna41

77

+

0.0024442

0.2546503

0.0000060

0.0000138

Msm.Lys-TTT.trna21

76

+

0.0009772

0.1889863

0.0000068

0.0000149

Msm.Ser-GGA.trna24

89

+

0.0009599

0.2549558

0.0000135

0.0000277

Msm.Pro-CGG.trna23

77

+

0.0007758

0.1714434

0.0000150

0.0000293

Msm.Thr-TGT

75

+

0.0000000

0.2202381

0.0000180

0.0000333

Msm.Glu-TTC.trna30

76

+

0.0010123

0.2098397

0.0000256

0.0000450

Msm.Gln-CTG.trna10

75

+

0.0025565

0.1654189

0.0000555

0.0000934

Msm.Asn-GTT.trna37

76

+

0.0009098

0.1908387

0.0000634

0.0001021

Msm.Ile-GAT.trna1

77

+

0.0010778

0.1727051

0.0001006

0.0001551

Msm.Thr-CGT.trna29

76

+

0.0015625

0.1889012

0.0001962

0.0002904

Msm.Val-CAC.trna12

75

+

0.0015707

0.1457437

0.0002308

0.0003284

Msm.Val-GAC.trna45

75

+

0.0010982

0.1332617

0.0002644

0.0003624

Msm.Gly-GCC.trna43

76

+

0.0000000

0.1832386

0.0003323

0.0004391

Msm.His-GTG.trna16

76

+

0.0000000

0.1934866

0.0009273

0.0011831

Msm.Ser-CGA.trna28

91

+

0.0000000

0.1653512

0.0011126

0.0013722

Msm.Glu-CTC.trna11

76

+

0.0000000

0.1664309

0.0042341

0.0050001

Msm.Ser-GCT.trna26

92

+

0.0000000

0.1193467

0.0043244

0.0050001

Msm.Met-CAT.trna6

77

+

0.0000000

0.0000000

0.7403609

0.8218854

Msm.Met-CAT.trna8

77

+

0.0000000

0.0000000

0.8104833

0.8218854

Msm.Pro-TGG.trna36

77

+

0.0000000

0.0000000

0.8057931

0.8218854

Msm.Ser-TGA.trna25

90

+

0.0000000

0.0000000

0.8115274

0.8218854

Msm.Trp-CCA.trna7

76

+

0.0000000

0.0000000

0.8218854

0.8218854