# DESCRIPTION

```
Package: rnaends
Title: RNA-ends sequencing specialized package featuring pre-processing, mapping, quantification and analysis of 5'-end or 3'-end RNAs sequencing data
Version: 1.0.0
Authors@R: c(
    person("Roland", "Barriot", , "roland.barriot@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-0298-2696")),
    person("Tomas", "Caetano", role = "aut",
           comment = c(ORCID = "0009-0008-4678-2796")),
    person("Gwennaele", "Fichant", role = "ctb",
           comment = c(ORCID = "0000-0001-5480-0239")),
    person("Peter", "Redder", role = "ctb",
           comment = c(ORCID = "0000-0002-3619-632X"))
  )
Description: The rnaends R package focuses on exact 5'-end and/or 3'-end quantification of RNAs via targeted RNA-Seq.
License: GPL (>= 3)
Imports: 
    tidyverse (>= 2.0.0),
    ggupset (>= 0.4),
    DSS (>= 2.48.0),
    Rbowtie (>= 1.42.0),
    Rsamtools (>= 2.16.0),
    Rsubread (>= 2.16.0),
    ShortRead (>= 1.58.0),
    survcomp (>= 1.52.0),
    VGAM (>= 1.1-9)
Remotes:
    bioc/Biostrings,
    bioc/DSS,
    bioc/Rbowtie,
    bioc/Rsamtools,
    bioc/Rsubread,
    bioc/ShortRead,
    bioc/survcomp
Suggests: 
    knitr (>= 1.49),
    rmarkdown (>= 2.29),
    testthat (>= 3.0.0)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
LazyData: true
VignetteBuilder: knitr
Depends: 
    R (>= 4.4.2)
```

# `add_read_feature`

 Add a feature to the `read_features` object.

## Description

This function adds a feature to the `read_features` tibble returned by `init_read_features` or `add_read_feature`.

## Usage

```r
add_read_feature(
  read_features,
  name,
  start,
  width,
  pattern,
  pattern_type = "constant",
  max_mismatch = 0,
  readid_prepend = FALSE
)
```

## Arguments

* `read_features`: `read_features` tibble
* `name`: Name of the feature.
* `start`: Relative start of the feature.
* `width`: Width of the  feature.
* `pattern`: Characters allowed in the feature sequence.
* `pattern_type`: Type of the pattern given: "nucleotides", "constant" or "regexp".
* `max_mismatch`: Maximum number of mismatches allowed in the feature sequence.
* `readid_prepend`: If TRUE the sequence of this feature will be prepended to the read id (in front of) in the FASTQ file.

## Seealso

[read_feature_pattern_types](read-feature-pattern-types) for pattern types, [parse_reads](parse-reads), [demultiplex](demultiplex)

## Value

The given read_features object with the added feature

# `demultiplex`

 Demultiplex a FASTQ file according to barcode sequence reads provided through a `read_features` object

## Description

This function parses each read to retrieve features in the sequence and test their validity.
Valid reads are written in separate files (fastq_file.barcode.fastq.gz) and if keep_invalid_reads = TRUE, invalid reads are write in fastq_file.invalid.fastq.gz
It also writes a feature report which contains features given in the features parameter and a demultiplex report which contains number of valid features for each barcode.

## Usage

```r
demultiplex(
  fastq_file,
  features,
  out_dir = NULL,
  keep_invalid_reads = FALSE,
  yieldsize = 1e+06,
  min_read_length = 12,
  write_read_parsing_tb = FALSE,
  force = FALSE
)
```

## Arguments

* `fastq_file`: Path of the FASTQ file (eventually gzipped) containing single-end reads
* `features`: read_features object
* `out_dir`: Path of the output directory where the demultiplexed FASTQ files will be stored
* `keep_invalid_reads`: If TRUE invalid reads will be kept in a dedicated file
* `yieldsize`: Number of read to process per iteration
* `force`: If TRUE recursively remove the content of out_dir before running instead of stopping the function with an error message

## Value

A tibble with demultiplexing statistics

# `differential_proportion`

 Wrapper of differential_proportion_one_stranded function allow to compare proportion of both strand

## Description

Wrapper of differential_proportion_one_stranded function allow to compare proportion of both strand

## Usage

```r
differential_proportion(
  count_table,
  design_matrix,
  group1,
  group2,
  cores = 1,
  ...
)
```

## Arguments

* `count_table`: Count table
* `design_matrix`: Description of the sample_ids (it must contains a sample_id,group,replicate,count_of(total/fraction) columns)
* `group1`: Name of strain 1
* `group2`: Name of strain 2
* `cores`: number of cores used in the mclappy function
* `...`: arguments that can be pass to the DMLtest function

## Value

a tibble with for each position the pvalue

# `differential_proportion2`

 Statistical comparison of 2 sets of proportion values taking into account replicates

## Description

Statistical comparison of 2 sets of proportion values taking into account replicates

## Usage

```r
differential_proportion2(count_table, design_matrix, group1, group2, ...)
```

## Arguments

* `count_table`: Count table
* `design_matrix`: Description of the samples (it must contains a sample_id, group, replicate, count_of(fraction/total) columns)
* `group1`: Name of strain 1
* `group2`: Name of strain 2
* `...`: arguments that can be pass to the DMLtest function

## Value

a tibble with for each position the pvalue

# `.parse_yield`

 Parse features in a yield from a FASTQ stream reader

## Description

Parse features in a yield from a FASTQ stream reader

## Usage

```r
.parse_yield(yield, features, min_read_length)
```

## Arguments

* `yield`: A yield of ShortReads from a FastQStreamer.
* `features`: `read_features` to be parsed.
* `min_read_length`: minimum length of readseq after parsing.

## Value

a tibble with statistics for each read feature.

# `.trim_yield`

 Trim ShortReads in a yield of a FastQStreamer to keep only a `readseq` feature of a `read_features` object.

## Description

Trim ShortReads in a yield of a FastQStreamer to keep only a `readseq` feature of a `read_features` object.

## Usage

```r
.trim_yield(yield, features, read_parsed)
```

## Arguments

* `yield`: A yield of ShortReads from a FastQStreamer.
* `features`: `read_features` to be parsed.
* `read_parsed`: minimum length of readseq after parsing.

## Value

FastQStream yield with read sequences trimmed to the region corresponding to the `readseq` feature of an `read_features` object.

# `FFT_profile`

 FFT analysis applied to counts or metacounts to detect periodicity

## Description

This function is a wrapper for the `stats::fft` to detect
periodicity in the signal composed of counts (of RNA ends) on a given
region (or metacounts profile)

## Usage

```r
FFT_profile(counts, max_period = 25)
```

## Arguments

* `counts`: table with positions (x) and values (counts)
* `max_period`: results haivng a period over this threshold are discarded

## Value

A tibble with `signal`, `freq` and `period` columns.

# `FFT`

 FFT analysis over one or more regions

## Description

This function computes the periodicity of the (meta)gene count
profiles

## Usage

```r
FFT(
  count_table = NULL,
  region_tb,
  by_sample = FALSE,
  by_region = FALSE,
  min_quantile = 0.25,
  max_period = 25,
  ...
)
```

## Arguments

* `count_table`: Count table (if not empty bam_files will be ignored)
* `region_tb`: a tibble with `rname`, `strand` and `position` for all
regions to extract
* `by_sample`: if FALSE, the counts are aggregated (sum) over the different
samples
* `by_region`: if FALSE, the counts are aggregated (median) over all
regions aligned on the starting location of each region.
* `min_quantile`: if by_region is FALSE and the regions are aggregated, all
regions must be the same length. In this case, min_quantile is used to
compute this lenght wich is given by the quantile of the distribution
of all the region lentghts. By default, it is 0.25 which means that 75% of
the regions are at least this length.
* `max_period`: results haivng a period over this threshold are discarded
* `...`: parameters of the quantify_from_bam_files function if no count
table is provided

## Value

A tibble with at least `signal`, `freq` and `period` columns.
If `by_sample` and/or `by_region` is TRUE, the corresponding columns are also
present in the results, *i.e.* respectively `sample_id` and/or
`(rname, strand, location)`.

# `frame_preference`

 Global frame preference

## Description

This function computes the global frame preference of the reads
distribution by computing the number of counts falling into each translation
frame.

## Usage

```r
frame_preference(
  count_table = NULL,
  region_tb,
  by_sample = FALSE,
  by_region = FALSE,
  ...
)
```

## Arguments

* `count_table`: (tibble) Count table (if not empty `bam_files` will be
ignored)
* `region_tb`: (character) Path of a gff file
* `by_sample`: if FALSE, the counts are aggregated (sum) over the different
samples
* `by_region`: if FALSE, the counts are aggregated (median) over all
regions aligned on the starting location of each region.

## Value

A tibble with `frame` and `pc` columns.

# `init_read_features`

 Initialize a `read_features` object.

## Description

Initializes the `read_features` object with the minimal feature *readseq* corresponding to the region in the read sequence that is expected to align to a reference sequence.

## Usage

```r
init_read_features(
  start,
  width,
  pattern = "ACTG",
  max_mismatch = 0,
  read_features_file = NA
)
```

## Arguments

* `start`: Relative start of the *readseq* feature.
* `width`: Width of the *readseq* feature.
* `pattern`: Characters allowed in the *readseq* sequence.
* `max_mismatch`: Maximum number of mismatches allowed in the *readseq* sequence.
* `read_features_file`: Path of the *read_feature* CSV file if already built and saved.

## Seealso

[add_read_feature](add-read-feature) for adding other features, [parse_reads](parse-reads), [demultiplex](demultiplex)

## Value

A *read_features* tibble with at least 1 row corresponding to the region of the
read sequence that is to be aligned to reference sequences.

## Examples

```r
# only a small region is to be aligned
rf = init_read_features(start = 14, width = 19)
# the whole 100nt long read should be aligned to the reference
rf = init_read_features(start = 1, width = 100)
rf
```

# `load_bam_as_tibble`

 Bam file parser

## Description

This function parse a given bam file and return different mapping informations in a tibble

## Usage

```r
load_bam_as_tibble(bam_file, features = NULL)
```

## Arguments

* `bam_file`: (character) Path of the bam file
* `UMI_length`: (integer) Width of the UMI (if UMIs have not been put in the qname let this value default)

## Value

(tibble) with mapping informations

# `map_fastq_files`

 Wrapper of the map_fastq_file function which allows the mapping of multiples FASTQ files.

## Description

The function takes a vector of FASTQ file paths and others map_fastq parameters.
All parameters can be replaced by a "mapping table" which is a tibble where each column correspond to a parameters of the map_fastq function.

## Usage

```r
map_fastq_files(
  list_fastq_files,
  fasta_name,
  index_name,
  max_map_mismatch = 0,
  bam_dir = "",
  aligner = "bowtie",
  threads = 3,
  map_unique = TRUE,
  force = FALSE,
  mapping_tb = NULL
)
```

## Arguments

* `list_fastq_files`: (character) or (vector of character) Path of the FASTQ file (eventually gzipped) containing single-end EMOTE reads
* `fasta_name`: (character) Path of the FASTA genome reference file
* `index_name`: (character) Name of the index (index will be created if not exist)
* `max_map_mismatch`: a int corresponding to the number of allowed mismatch
* `bam_dir`: (character) Path of the output directory
* `aligner`: (character) Mapping aligner (either "bowtie" or "subread")
* `threads`: (integer) Number of threads used
* `force`: (logical) If TRUE overwrite already existing bam file
* `mapping_tb`: (tibble) where colnames = parameters of the map_fastq_file function

## Value

(character) bam file created

# `map_fastq`

 Perform the mapping of a FASTQ file using either bowtie or subread method.

## Description

The function takes a FASTQ file and a genome FASTA file. An index is created if not existing, reads are mapped on the given reference genome.

## Usage

```r
map_fastq(
  fastq_file,
  fasta_name,
  index_name,
  max_map_mismatch = 0,
  bam_dir = "",
  aligner = "bowtie",
  threads = 3,
  map_unique = TRUE,
  force = FALSE
)
```

## Arguments

* `fastq_file`: Path of the FASTQ file (eventually gzipped) containing single-end reads
* `fasta_name`: Path of the FASTA genome reference file
* `index_name`: Name of the index (index will be created if not exist)
* `max_map_mismatch`: a int corresponding to the number of allowed mismatch
* `bam_dir`: Path of the output directory
* `aligner`: Mapping aligner (either "bowtie" or "subread")
* `threads`: Number of threads used
* `force`: If TRUE overwrite already existing bam file

## Value

Path of BAM file created

# `map_stats_bam_files`

 Mapping stat

## Description

This function parses the specified bam_files and returns simple stats such
as the percentage of reads mapped.
The result is saved in bam_file.map_stats.csv so that it is not done again
on the same BAM file if it is older that the stats or force is set to FALSE.

## Usage

```r
map_stats_bam_files(bam_files, force = FALSE)
```

## Arguments

* `bam_files`: vector of BAM filenames
* `force`: (default FALSE) Force the computation of stats.

## Value

tibble with mapping statistics

# `map_stats`

 Mapping stat

## Description

This function parses the specified BAM FILE in bam_file and fastq_files and
returns mapping stats such reads mapped.
The result is saved in bam_file.map_stats.csv so that it is not done again
on the same BAM file if it is older that the stats or force is set to FALSE.

## Usage

```r
map_stats(bam_file, force = FALSE)
```

## Arguments

* `bam_file`: Path of the BAM filename.
* `force`: (default FALSE) Force the computation of stats.

## Value

mapping statistics.

# `metacounts_feature`

 retrieve normalized (meta)counts around given feature(s) specified by a
tibble/data.frame with their `rname`, `strand` and `position.`

## Description

This function takes as input the features to extract and the
`width` to consider around each feature specified by its `rname`, `strand`and `position`.

## Usage

```r
metacounts_feature(
  count_table = NULL,
  feature_tb,
  width = 100,
  by_sample = FALSE,
  by_feature = FALSE,
  ...
)
```

## Arguments

* `count_table`: count table (if not empty `bam_files` will be ignored)
* `feature_tb`: a tibble with `rname`, `strand` and `position` for all
regions to extract
* `width`: positions to extract upstream and downstream of the features
* `by_sample`: if FALSE, the counts are aggregated over the different
samples
* `by_feature`: if FALSE, the counts are aggregated over all
regions aligned on the starting location of the feature.

## Value

A tibble with at least `x` and `count` columns. `x` is the relative
distance to the location on the reference sequence `rname` and `strand`, and
`count` is the normalized count on the region (counts sum is 1). If
`by_sample` and/or `by_feature` is TRUE, the corresponding columns are also
present in the results, i.e. respectively `sample_id` and/or
`(rname, strand, location)`.

# `metacounts_normalized`

 normalize (meta)counts on region(s) specified by strand, location and width

## Description

This function takes a input one or more regions and normalizes
the counts by region and sample_id so that they sum up to 1. It is used to
balance the effect of highly transcribed over less transcribed regions, and
also varying sequencing depths.

## Usage

```r
metacounts_normalized(
  count_table,
  rname,
  strand,
  location,
  width,
  around = FALSE,
  by_sample = FALSE,
  by_region = FALSE,
  min_quantile = 0.25
)
```

## Arguments

* `count_table`: count table as returned from quantification, i.e.
a tibble with the following columns: sample_id, rname, position, strand,
count.
* `rname`: rname(s) one reference/chromosome name or a vector of names.
* `strand`: strand(s) a vector of the same length as rname to specify
each strand.
* `location`: location(s) a vector of the same length as rname to specify
each position on the reference.
* `width`: width(s) a vector of the same length as rname to specify
the width upstream and downstream the location (if param around is TRUE) or
the length of the region (if param around is FALSE)0
* `around`: specify if the location is the begninning of the region or its
the center.
* `by_sample`: if FALSE, the counts are aggregated (sum) over the different
samples
* `by_region`: if FALSE, the counts are aggregated (median) over all
regions aligned on the starting location of each region.
* `min_quantile`: if by_region is FALSE and the regions are aggregated, all
regions must be the same length. In this case, min_quantile is used to
compute this lenght wich is given by the quantile of the distribution
of all the region lentghts. By default, it is 0.25 which means that 75% of
the regions are at least this length.

## Value

A tibble with at least `x` and `count` columns. `x` is the relative
distance to the location on the reference sequence `rname` and `strand`, and
`count` is the normalized count on the region (counts sum is 1). If
`by_sample` and/or `by_region` is TRUE, the corresponding columns are also
present in the results, i.e. respectively `sample_id` and/or
`(rname, strand, location)`.

# `metacounts_region`

 extract normalized counts for a given feature and optionnaly aggregates them
into metacounts

## Description

This function takes a input one or more regions and normalizes
the counts by region and sample_id so that they sum up to 1. It is used to
balance the effect of highly transcribed over less transcribed regions, and
also varying sequencing depths.

## Usage

```r
metacounts_region(
  count_table = NULL,
  region_tb,
  by_sample = FALSE,
  by_region = FALSE,
  min_quantile = 0.25,
  ...
)
```

## Arguments

* `count_table`: count table (if not empty bam_files will be ignored)
* `region_tb`: a tibble with `rname`, `strand` and `position` for all
regions to extract
* `by_sample`: if FALSE, the counts are aggregated (sum) over the different
samples
* `by_region`: if FALSE, the counts are aggregated (median) over all
regions aligned on the starting location of each region.
* `width`: positions to extract downstream of the specified start position

## Value

A tibble with at least x and count columns. x is the relative
distance to the feature and count is normalized count on the region. If
`by_sample` and/or `by_region` is TRUE, the corresponding columns are also
present in the results, i.e. respectively `sample_id` and/or
`(rname, strand, location)`.

# `parse_reads`

 Parse each read from a FASTQ file to retrieve features in the read sequence and test their validity.

## Description

Valid reads are written to a dedicated file and if `keep_invalid_reads` is TRUE, invalid reads are written to `fastq_file.invalid.fastq.gz`.
It also writes a feature report which contains features given in the features parameter and a parsing report which contains number of read with valid features.

## Usage

```r
parse_reads(
  fastq_file,
  features,
  out_dir = NULL,
  keep_invalid_reads = FALSE,
  min_read_length = 18,
  yieldsize = 1e+06,
  write_read_parsing_tb = FALSE,
  force = FALSE
)
```

## Arguments

* `fastq_file`: Path of the FASTQ file (eventually gzipped) containing single-end reads
* `features`: read_features tibble
* `out_dir`: Path of the output directory where the demultiplexed FASTQ files will be stored
* `keep_invalid_reads`: If TRUE invalid reads will be kept in a dedicated file
* `min_read_length`: Minimum width of reads after parsing
* `yieldsize`: Number of read to process per iteration
* `write_read_parsing_tb`: writes statistics table to file
* `force`: If TRUE recursively remove the content of out_dir before running instead of stopping the function with an error message

## Value

A tibble with parsing statistics.

# `plot.feature_metacounts`

 plot function for metacount profile around a given feature

## Description

This function takes as input the result from `metacounts_feature`and returns the plot from ggplot2. Further plot customization can be applied.

## Usage

```r
# S3 method for feature_metacounts
plot(
  metacounts,
  from = min(metacounts$x),
  to = max(metacounts$x),
  translation_frame = FALSE,
  frame_colors = c("#ff5050", "#3366ff", "#ffcc00"),
  geom = "line",
  x_lab = "Relative position to feature",
  y_lab = "Metacounts"
)
```

## Arguments

* `metacounts`: metacount profile from `metacounts_feature`
* `from`: specify the start position in the profile
* `to`: specify the end position in the profile
* `translation_frame`: if TRUE, profiles are grouped and plotted by
translation frame
* `frame_colors`: vector of colours for each translation frame
* `geom`: specify plot geometry, i.e. "bar" or "line"
* `x_lab`: legend for the x axis
* `y_lab`: legend for the y axis

## Value

a gglot2 plot

# `plot.frame_preference`

 plot function for frame preference results

## Description

This function takes as input the result from `frame_preference`and returns the plot from ggplot2.

## Usage

```r
# S3 method for frame_preference
plot(frame_pref_tb, frame_colors = c("#ff5050", "#3366ff", "#ffcc00"))
```

## Arguments

* `frame_pref_tb`: result returned by `frame_preference`
* `frame_colors`: vector of colours for each translation frame

## Value

a gglot2 plot

# `plot.metacounts_periodicity`

 plot function for the signal periodicity analysis

## Description

This function takes as input the result from `FFT` and
returns the plot from ggplot2 so that further customization can be applied.

## Usage

```r
# S3 method for metacounts_periodicity
plot(fft_tb)
```

## Arguments

* `fft_tb`: result returned by `FFT`

## Value

a ggplot2 plot

# `print.read_features`

 Utility function to display read_features.

## Description

Utility function to display read_features.

## Usage

```r
# S3 method for read_features
print(rf)
```

## Arguments

* `rf`: `read_features` tibble

# `quantify_3prime`

 Function used to quantify the exact 3'-ends of mapped reads

## Description

Caller of load_bam_as_tibble and summarise_alignements

## Usage

```r
quantify_3prime(
  bam_file,
  features = NULL,
  keep_UMI = FALSE,
  min_mapped_length = NA,
  rev_comp_reads = FALSE,
  force = FALSE
)
```

## Arguments

* `bam_file`: (character) Path of the bam file containing single-end EMOTE reads mapped
* `keep_UMI`: (logical) If FALSE the count value corresponds to the number fo different UMI at an exact position

## Value

a tibble with quantification of the reads in the given bam files, by exact position (and by UMI if UMI_length>0)

# `quantify_5prime`

 Function used to quantify the exact 5'-ends of mapped reads

## Description

Caller of load_bam_as_tibble and summarise_alignements

## Usage

```r
quantify_5prime(bam_file, features = NULL, keep_UMI = FALSE, force = FALSE)
```

## Arguments

* `keep_UMI`: (logical) If FALSE the count value corresponds to the number fo different UMI at an exact position
* `bam_files`: (character) Path of the bam file containing single-end EMOTE reads mapped
* `UMI_length`: (integer) Width of UMI, default is 0 (if 0 the read id qname will be return)

## Value

a tibble with quantification of the reads in the given bam files, by exact position (and by UMI if UMI_length>0)

# `quantify_downstream`

 Function used to quantify the exact 3'-ends of mapped reads in XU et al. 2023

## Description

Function used to quantify the exact 3'-ends of mapped reads in XU et al. 2023

## Usage

```r
quantify_downstream(bam_file, umi.length, min.map.length, force = FALSE)
```

## Arguments

* `bam_file`: (character) Path of the bam file containing single-end EMOTE reads mapped

## Value

a tibble with quantification of the reads in the given bam files, by exact position (and by UMI if UMI_length>0)

# `quantify_from_bam_files`

 Wrapper of quantify_5prime and quantify_3prime quantification functions.

## Description

This function takes a vector of bam files and call the "fun" quantification function.
It allow the parralelization through the "cores" parameters

## Usage

```r
quantify_from_bam_files(bam_files, sample_ids, fun, ...)
```

## Arguments

* `bam_files`: (vector of characters) Path of the bam files containing single-end EMOTE reads mapped
* `sample_ids`: (vector of characters) corresponding to a string for identifying the source of reads
* `fun`: (character) The quantification method used (quantify_3prime or quantify_5prime)
* `...`: parameters to give to the "fun" function
* `cores`: (integer) Number of cores

## Value

a tibble with quantification of the reads in the given bam files, by exact position (and by UMI if UMI_length>0)

# `read_feature_pattern_types`

 Read feature pattern types

## Description

Enumeration of the allowed pattern types.

## Usage

```r
read_feature_pattern_types
```

## Keyword

datasets

## Format

An object of class `character` of length 3.

## Details

A vector of characters with 3 elements: *nucleotides*, *constant* and *regexp*:

* `nucleotides`: for random sequences composed of the specifies nucleotides (defaults are A, T, C , G) *i.e* the mappable *readseq* or the *UMIs*.
* `constant`: for a constant string or a vector of constant strings, *i.e.* allowed *barcodes* for multiplexed samples or some *control sequence* that should be present in the reads. A maximum number of mismatches allowed can be specified.
* `regepx`: for more complicated patterns that can occur anywhere such as polyA tails.

# `reads_parse_report_ggupset`

 Utility function to plot an upset plot from `parse_reads` and `demultiplex`
reports using ggupset package.

## Description

Utility function to plot an upset plot from `parse_reads` and `demultiplex`reports using ggupset package.

## Usage

```r
reads_parse_report_ggupset(report, angle = 45, margin = 0.1)
```

## Arguments

* `report`: report returned by `parse_reads` or `demultiplex`

# `reads_parse_report_to_upsetr_expression`

 Utility function to convert `parse_reads` and `demultiplex` reports to
expression for UpSetR plot.

## Description

Utility function to convert `parse_reads` and `demultiplex` reports to
expression for UpSetR plot.

## Usage

```r
reads_parse_report_to_upsetr_expression(report)
```

## Arguments

* `report`: report returned by `parse_reads` or `demultiplex`

# `summarise_alignements`

 Parse mapping information retrieve by load_bam_as_tibble

## Description

This function parse the output of load_bam_as_tibble and count the number of reads mapped at each exact genomic position

## Usage

```r
summarise_alignements(alns, UMI_length, keep_UMI)
```

## Arguments

* `alns`: (tibble) Mapping table returned by load_bam_as_tibble
* `UMI_length`: (integer) Width of the UMI (if UMIs have not been put in the qname let this value default)
* `keep_UMI`: (logical) If TRUE the result tibble will display the count value for each UMI at each exact position

## Value

(tibble) count table

# `TSS_betabinomial`

 Computes a p-value for a single position using the betabinomial model
as in TSS-EMOTE paper.

## Description

Computes a p-value for a single position using the betabinomial model
as in TSS-EMOTE paper.

## Usage

```r
TSS_betabinomial(k_total, k_mono, n_total, n_mono)
```

## Arguments

* `k_total`: Counts on the +RppH (total) at a specific position
* `k_mono`: Counts on the -RppH (mono) at a specific position
* `n_total`: Total counts on the +RppH (total) in the RNA-Seq library
* `n_mono`: Total counts on the -RppH (mono) in the RNA-Seq library

## Value

The p-value computed for the given position.

# `TSS`

 Identification of Transcription Start Sites (TSS) as in TSS-EMOTE paper

## Description

This function computes a p-value for each position present in a
count table. P-values of replicate samples are combined and adjusted for
multiple testing.

## Usage

```r
TSS(
  count_table,
  design_table,
  combine_method = "fisher",
  fdr_method = "BH",
  threshold = 5
)
```

## Arguments

* `count_table`: Count table
* `design_table`: Description of the samples. It must contains the
following columns: (`sample_id`, `group`, `replicate`,
`count_of`(total/fraction).
* `combine_method`: P-values combination method to use, default is
"fisher" ("fisher", "z.transform", "logit")
* `fdr_method`: P-values adjustment method, default is "BH"
("holm", "hochberg", "hommel", "bonferroni", "BH", "BY","fdr", "none"))
* `threshold`: Minimum counts in the total sample for the position to be
considered

## Value

### A tibble containing the results for each position, *i.e.* with
columns `position`, `strand`, `pvalue`, `pvalue_adj`.