nearest-methods {GenomicRanges}                R Documentation
Finding the nearest genomic range/position neighbor
Description
The nearest, precede, follow, distance,
nearestKNeighbors, and distanceToNearest methods for
GenomicRanges objects and subclasses.
Usage
## S4 method for signature 'GenomicRanges,GenomicRanges'
precede(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
precede(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
follow(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
follow(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
nearest(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
nearest(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
nearestKNeighbors(x, subject, k=1L,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
nearestKNeighbors(x, subject, k=1L,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
distanceToNearest(x, subject,
ignore.strand=FALSE, ...)
## S4 method for signature 'GenomicRanges,missing'
distanceToNearest(x, subject,
ignore.strand=FALSE, ...)
## S4 method for signature 'GenomicRanges,GenomicRanges'
distance(x, y,
ignore.strand=FALSE, ...)
Arguments
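x
    The query GenomicRanges instance.
subject
    The subject GenomicRanges instance within which the nearest
    neighbors are found. Can be missing, in which case x is also
    the subject.
y
    For the distance method, a GRanges instance. Cannot be missing.
    If x and y are not the same length, the shorter is recycled to
    match the length of the longer.
k
    For the nearestKNeighbors method, an integer indicating the number
    of nearest neighbors to find.
select
    Logic for handling ties. By default, all methods select a single
    interval (arbitrary for nearest and nearestKNeighbors, the first by
    order in subject for precede, and the last for follow).
    When select="all", precede, follow and nearest return a Hits object
    containing all matches for x; a range in x that has no match in
    subject is not included in the Hits object. For nearestKNeighbors,
    select="all" instead keeps the ties in the returned IntegerList.
ignore.strand
    A logical indicating whether the strand of the input ranges should
    be ignored. When TRUE, strand is set to '+'.
...
    Additional arguments for methods.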
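The tie rules described for select can be mimicked with a few lines of base R. The helper select_ties() below is hypothetical and not part of GenomicRanges; it assumes it is handed the subject indices that are all equally close to one query range and simply applies the "first", "last" or "all" choice. It is a sketch of the documented behavior, not the package's implementation.

```r
## select_ties() is a hypothetical helper, not a GenomicRanges function.
## 'cand' holds the subject indices that are equally close to one query
## range; the return value mimics the documented tie rules.
select_ties <- function(cand, select = c("first", "last", "all")) {
  select <- match.arg(select)
  # No qualifying subject range: report NA here (with select="all" the
  # real methods instead drop the query from the Hits object)
  if (length(cand) == 0L) return(NA_integer_)
  cand <- sort(as.integer(cand))    # order by position in 'subject'
  switch(select,
         first = cand[1L],            # default for precede()
         last  = cand[length(cand)],  # default for follow()
         all   = cand)                # select="all" keeps every tie
}

select_ties(c(6L, 4L), "first")  # 4
select_ties(c(6L, 4L), "last")   # 6
select_ties(c(6L, 4L), "all")    # 4 6
```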
Details
nearest: Performs conventional nearest neighbor finding. Returns an integer vector containing the index of the nearest neighbor range in subject for each range in x. If there is no nearest neighbor, NA is returned. For details of the algorithm, see the man page in the IRanges package (?nearest).

precede: For each range in x, precede returns the index of the range in subject that is directly preceded by the range in x. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

follow: The opposite of precede, follow returns the index of the range in subject that is directly followed by the range in x. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

nearestKNeighbors: Performs conventional k-nearest neighbor finding. Returns an IntegerList containing the indices of the k nearest neighbors in subject for each range in x. If there is no nearest neighbor, NA is returned. If select="all" is specified, ties are included in the resulting IntegerList.

Orientation and strand for precede and follow: Orientation is 5' to 3', consistent with the direction of translation. Because positional numbering along a chromosome runs from left to right and transcription takes place from 5' to 3', precede and follow can appear to have ‘opposite’ behavior on the + and - strands. Using positions 5 and 6 as an example, 5 precedes 6 on the + strand but follows 6 on the - strand.

The table below outlines the orientation when ranges on different strands are compared. In general, a feature on * is considered to belong to both strands. The single exception is when both x and subject are *, in which case both are treated as +.

       x | subject | orientation
    -----+---------+----------------
    a) + |    +    | --->
    b) + |    -    | NA
    c) + |    *    | --->
    d) - |    +    | NA
    e) - |    -    | <---
    f) - |    *    | <---
    g) * |    +    | --->
    h) * |    -    | <---
    i) * |    *    | ---> (the only situation where * arbitrarily means +)

distanceToNearest: Returns the distance from each range in x to its nearest neighbor in subject.

distance: Returns the distance between each range in x and the corresponding range in y. The behavior of distance changed in Bioconductor 2.12. See the man page ?distance in the IRanges package for details.
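To make the orientation rules concrete, here is a base-R sketch of precede() and follow() for plain integer coordinates. The helper names (directional_hit, precede_sketch, follow_sketch) are hypothetical, and the code is a simplification under stated assumptions: query ranges must be on + or - (rows a to f of the table), subject ranges on * are compatible with either strand, closeness is judged by the nearest start or end only, and ties follow the default select rules. It is not the GenomicRanges implementation.

```r
## Hypothetical helpers (not GenomicRanges code) sketching precede()
## and follow() for plain integer ranges. Simplifications: query ranges
## must be on '+' or '-', and subject ranges on '*' match either strand.
directional_hit <- function(xs, xe, xstrand, ss, se, sstrand,
                            downstream, tie = c("first", "last")) {
  tie <- match.arg(tie)
  # Subject ranges on the query's strand, or on '*', are comparable
  ok <- sstrand == xstrand | sstrand == "*"
  # Downstream is toward higher coordinates on '+', lower ones on '-'
  look_right <- (xstrand == "+") == downstream
  if (look_right) {
    cand <- which(ok & ss > xe)  # starts after x ends: no overlap
    key <- ss[cand]              # smaller start = closer
  } else {
    cand <- which(ok & se < xs)  # ends before x starts: no overlap
    key <- -se[cand]             # larger end = closer
  }
  if (length(cand) == 0L) return(NA_integer_)
  best <- cand[key == min(key)]  # every equally close candidate
  if (tie == "first") best[1L] else best[length(best)]
}

precede_sketch <- function(xs, xe, xstrand, ss, se, sstrand) {
  vapply(seq_along(xs), function(i) {
    directional_hit(xs[i], xe[i], xstrand[i], ss, se, sstrand,
                    downstream = TRUE, tie = "first")
  }, integer(1))
}

follow_sketch <- function(xs, xe, xstrand, ss, se, sstrand) {
  vapply(seq_along(xs), function(i) {
    directional_hit(xs[i], xe[i], xstrand[i], ss, se, sstrand,
                    downstream = FALSE, tie = "last")
  }, integer(1))
}

## Coordinates taken from the first precede()/follow() example
qs <- c(5, 20)
ss <- c(10, 15, 10, 15)
sstr <- c("+", "+", "-", "-")
precede_sketch(qs, qs, c("+", "+"), ss, ss, sstr)  # 1 NA
follow_sketch(qs, qs, c("+", "+"), ss, ss, sstr)   # NA 2
precede_sketch(qs, qs, c("-", "-"), ss, ss, sstr)  # NA 4
follow_sketch(qs, qs, c("-", "-"), ss, ss, sstr)   # 3 NA
```

Flipping the query strand swaps which side of the query is searched, which is why the same coordinates give mirrored answers on + and -.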
Value
For nearest, precede and follow, an integer
vector of indices in subject, or a Hits object if
select="all".
For nearestKNeighbors, an IntegerList of indices in
subject.
For distanceToNearest, a Hits object with a
column for the query index (queryHits), subject index
(subjectHits) and the distance between the pair.
For distance, an integer vector of distances between the ranges
in x and y.
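As a rough base-R illustration of what distance computes for a pair of ranges, the hypothetical distance_sketch() below assumes the IRanges convention that overlapping and adjacent ranges are 0 apart and that otherwise the distance is the number of positions lying strictly between the two ranges. Returning NA for pairs on different sequences or on opposite strands is an assumption of this sketch, chosen to match the NA rows of the orientation table. Vector arguments are recycled by pmax() and pmin(), in the spirit of the recycling example for distance().

```r
## distance_sketch() is a hypothetical helper, not the package source.
## Assumed rule: overlapping or adjacent ranges are 0 apart; otherwise
## the result counts the positions strictly between the two ranges.
distance_sketch <- function(x_seq, x_start, x_end, x_strand,
                            y_seq, y_start, y_end, y_strand) {
  # Positions strictly between the ranges; negative values mean overlap
  gap <- pmax(x_start, y_start) - pmin(x_end, y_end) - 1
  d <- pmax(gap, 0)
  # Assumption: different sequences, or '+' against '-', give NA
  comparable <- x_seq == y_seq &
    (x_strand == y_strand | x_strand == "*" | y_strand == "*")
  d[!comparable] <- NA
  d
}

## Adjacent, overlapping, separated by 1 (coordinates from the example)
distance_sketch("A", c(1, 2, 10), c(5, 8, 11), "*",
                "A", c(6, 5, 13), c(10, 10, 15), "*")     # 0 0 1
## Recycling a single query range against three subject ranges
distance_sketch("A", 1, 5, "*",
                "A", c(6, 5, 13), c(10, 10, 15), "*")     # 0 0 7
## Opposite strands
distance_sketch("A", 1, 5, "+", "A", 20, 25, "-")         # NA
```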
Author(s)
P. Aboyoun and V. Obenchain
See Also
The GenomicRanges and GRanges classes.
The IntegerRanges class in the IRanges package.
The Hits class in the S4Vectors package.
The nearest-methods man page in the IRanges package.
findOverlaps-methods for finding just the overlapping ranges.
The nearest-methods man page in the GenomicFeatures package.
Examples
library(GenomicRanges)
## -----------------------------------------------------------
## precede() and follow()
## -----------------------------------------------------------
query <- GRanges("A", IRanges(c(5, 20), width=1), strand="+")
subject <- GRanges("A", IRanges(rep(c(10, 15), 2), width=1),
strand=c("+", "+", "-", "-"))
precede(query, subject)
follow(query, subject)
strand(query) <- "-"
precede(query, subject)
follow(query, subject)
## ties choose first in order
query <- GRanges("A", IRanges(10, width=1), c("+", "-", "*"))
subject <- GRanges("A", IRanges(c(5, 5, 5, 15, 15, 15), width=1),
rep(c("+", "-", "*"), 2))
precede(query, subject)
precede(query, rev(subject))
## ignore.strand=TRUE treats all ranges as '+'
precede(query[1], subject[4:6], select="all", ignore.strand=FALSE)
precede(query[1], subject[4:6], select="all", ignore.strand=TRUE)
## -----------------------------------------------------------
## nearest()
## -----------------------------------------------------------
## When multiple ranges overlap, an "arbitrary" range is chosen
query <- GRanges("A", IRanges(5, 15))
subject <- GRanges("A", IRanges(c(1, 15), c(5, 19)))
nearest(query, subject)
## select="all" returns all hits
nearest(query, subject, select="all")
## Ranges in 'x' will self-select when 'subject' is present
query <- GRanges("A", IRanges(c(1, 10), width=5))
nearest(query, query)
## Ranges in 'x' will not self-select when 'subject' is missing
nearest(query)
## -----------------------------------------------------------
## nearestKNeighbors()
## -----------------------------------------------------------
## When k is not supplied, it defaults to 1
query <- GRanges("A", IRanges(c(2, 5), c(8, 15)))
subject <- GRanges("A", IRanges(c(1, 4, 10, 15), c(5, 7, 12, 19)))
nearestKNeighbors(query, subject)
## Return multiple neighbors with k > 1
nearestKNeighbors(query, subject, k=3)
## select="all" returns all hits
nearestKNeighbors(query, subject, select="all")
## -----------------------------------------------------------
## distance(), distanceToNearest()
## -----------------------------------------------------------
## Adjacent, overlap, separated by 1
query <- GRanges("A", IRanges(c(1, 2, 10), c(5, 8, 11)))
subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15)))
distance(query, subject)
## recycling
distance(query[1], subject)
## zero-width ranges
zw <- GRanges("A", IRanges(4,3))
stopifnot(distance(zw, GRanges("A", IRanges(3,4))) == 0L)
sapply(-3:3, function(i)
distance(shift(zw, i), GRanges("A", IRanges(4,3))))
query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1))
distanceToNearest(query, subject)
## For distance() with GRanges and TxDb, see the
## ?'distance,GenomicRanges,TxDb-method' man
## page in the GenomicFeatures package.
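For readers who want to see the neighbor-finding logic without Bioconductor, here is a base-R sketch for same-strand integer ranges. nearest_sketch() assumes a two-step rule (an overlapping subject range wins; otherwise the one with the smallest gap), and knn_sketch() simply ranks subject ranges by gap. Both helper names are hypothetical, strand and seqnames are ignored, and ties are broken by subject order where the real methods choose arbitrarily; treat this as an approximation, not the GenomicRanges implementation.

```r
## Hypothetical helpers for same-strand integer ranges (not package code).
## Gap between ranges: 0 if they overlap or are adjacent.
gap_to <- function(xs, xe, ss, se) pmax(pmax(xs, ss) - pmin(xe, se) - 1, 0)

nearest_sketch <- function(xs, xe, ss, se, self = FALSE) {
  vapply(seq_along(xs), function(i) {
    # self = TRUE mimics a missing 'subject': x is searched against
    # itself, so each range's own index is excluded
    keep <- if (self) seq_along(ss) != i else rep(TRUE, length(ss))
    cand <- which(keep)
    if (length(cand) == 0L) return(NA_integer_)
    # Step 1: an overlapping range wins (first one here)
    ov <- cand[ss[cand] <= xe[i] & se[cand] >= xs[i]]
    if (length(ov) > 0L) return(ov[1L])
    # Step 2: otherwise take the range with the smallest gap
    cand[which.min(gap_to(xs[i], xe[i], ss[cand], se[cand]))]
  }, integer(1))
}

knn_sketch <- function(xs, xe, ss, se, k = 1L,
                       select = c("arbitrary", "all")) {
  select <- match.arg(select)
  lapply(seq_along(xs), function(i) {
    d <- gap_to(xs[i], xe[i], ss, se)
    o <- order(d)                  # order() keeps ties in subject order
    if (select == "arbitrary") return(head(o, k))
    # select = "all": keep every range tied with the k-th smallest gap
    cutoff <- d[o[min(k, length(o))]]
    o[d[o] <= cutoff]
  })
}

nearest_sketch(c(1, 10), c(5, 14), c(1, 10), c(5, 14))               # 1 2
nearest_sketch(c(1, 10), c(5, 14), c(1, 10), c(5, 14), self = TRUE)  # 2 1
knn_sketch(c(2, 5), c(8, 15), c(1, 4, 10, 15), c(5, 7, 12, 19), k = 3)
```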