Analyze possible specifications from given ORPHAcode and genes.
If such specifications are found, then the ORPHAcode-genes combinations are considered as consistent.
Symbols (see load_genes_synonyms()
to find them all) and HGNC codes are supported,
let the function know which you chose with the mode
argument.
Arguments
- orpha_codes
The ORPHAcodes to update. If length is greater than 1, the full set of genes will be applied to each vector element.
- genes
The mutated genes. If given as a list, it should be length-compatible with
orpha_code
, so that each list element corresponds to oneorpha_codes
entry. Set the.by
argument properly to apply the right set of genes on each ORPHAcode.- mode
Character constant, whether the given genes are
"symbol"
or"HGNC"
codes.- .by
Optionnaly, the set of mutated genes. The default is to consider each row as independent. Set it to NULL or a constant value to apply the full set of
genes
to each element oforpha_codes
. A warning will be raised if any of the considered sets contains more than 10 elements.
Details
Potential specifications will be searched among the descendants of the
given ORPHAcode. They must be associated to one of the mutated genes given in the genes
argument. This kind of association is necessarily "Assessed"
and "Disease-causing"
according to Orphanet.
In specify_code
, specification is applied if and only if a unique potential replacement ORPHAcode was found.
For both functions, each row is considered as independent by default. You have a couple of options to deal cases where an individual genes are spread over multiple rows :
Set the
.by
argument instead ofgroup_by
.Use
group_by
tochop()
mutated genes into a list-column andungroup
to make your data row-wise compatible.
Examples
library(dplyr)
#### Basic usage ####
orpha_code_cmt1 = 65753
orpha_code_cmtX = 64747
## Specification possible
# CMT1B is the only ORPHAcode both associated with CMT1 and MPZ
specify_code(orpha_code_cmt1, 'MPZ', mode='symbol')
#> [1] "101082"
check_orpha_gene_consistency(orpha_code_cmt1, 'MPZ', mode='symbol')
#> [1] TRUE
# CMT1B is the only ORPHAcode both associated with CMT1 and MPZ and/or POMT1
specify_code(orpha_code_cmt1, c('MPZ', 'POMT1'), mode='symbol')
#> [1] "101082"
check_orpha_gene_consistency(orpha_code_cmt1, c('MPZ', 'POMT1'), mode='symbol')
#> [1] TRUE
## Specification impossible
# No ORPHAcode is associated both to CMTX and MPZ
specify_code(orpha_code_cmtX, 'MPZ', mode='symbol')
#> [1] "64747"
check_orpha_gene_consistency(orpha_code_cmtX, 'MPZ', mode='symbol')
#> [1] FALSE
# Several ORPHAcodes are associated both to CMT1 and PMP22 (CMT1A and CMT1E)
specify_code(orpha_code_cmt1, 'PMP22', mode='symbol')
#> [1] "65753"
check_orpha_gene_consistency(orpha_code_cmt1, 'PMP22', mode='symbol') # TRUE
#> [1] TRUE
# Several ORPHAcodes are associated both to CMT1 and PMP22 (CMT1A and CMT1E)
# or MPZ (CMT1B), but none with both PMP22 and MPZ.
specify_code(orpha_code_cmt1, c('MPZ', 'PMP22'), mode='symbol')
#> [1] "65753"
check_orpha_gene_consistency(orpha_code_cmt1, c('MPZ', 'PMP22'), mode='symbol') # Is it consistent ?
#> [1] TRUE
## Alternatively with HGNC codes (the default mode)
# CMT1B is the only ORPHAcode both associated with CMT1 and MPZ
specify_code(orpha_code_cmt1, 7225)
#> [1] "101082"
#### Using dataframes ####
df = tibble(
patient_id=c('A', 'A', 'B', 'C', 'D', 'D'),
symbol = c("MPZ", "LITAF", "VWF", "LITAF", "MPZ", "VWF"),
# CMT1 (ORPHA:65753) and von Willebrand (ORPHA:903)
initial_orpha_code = c("65753", "65753", "903", "65753", "65753", "65753"))
## Basic call : each row is independent
df_spec = df %>% mutate(assigned_orpha_code =
specify_code(initial_orpha_code, genes=symbol, mode='symbol'))
## Grouping may be preferable
df_spec = df %>% mutate(assigned_orpha_code = specify_code(
initial_orpha_code, genes=symbol, mode='symbol', .by=patient_id))
## Equivalent method with genes in a list-column
df = tibble(
patient_id=c('A', 'B', 'C', 'D'),
initial_orpha_code = c("65753", "903", "65753", "65753"),
symbol = list(c("MPZ", "LITAF"), "VWF", "LITAF", c("MPZ", "VWF")))
df_spec = df %>% mutate(assigned_orpha_code =
specify_code(initial_orpha_code, genes=symbol, mode='symbol'))