| GRanges-class {GenomicRanges} | R Documentation |
GRanges objects
Description
The GRanges class is a container for the genomic locations and their associated annotations.
Details
GRanges is a vector of genomic locations and associated annotations. Each element in the vector is comprised of a sequence name, an interval, a strand, and optional metadata columns (e.g. score, GC content, etc.). This information is stored in four components:
seqnamesa 'factor' Rle object containing the sequence names.
rangesan IRanges object containing the ranges.
stranda 'factor' Rle object containing the strand information.
mcolsa DataFrame object containing the metadata columns. Columns cannot be named
"seqnames","ranges","strand","seqlevels","seqlengths","isCircular","start","end","width", or"element".seqinfoa Seqinfo object containing information about the set of genomic sequences present in the GRanges object.
Constructor
-
GRanges(seqnames=NULL, ranges=NULL, strand=NULL, ..., seqinfo=NULL, seqlengths=NULL): Creates a GRanges object.seqnames-
NULL, or an Rle object, character vector, or factor containing the sequence names. ranges-
NULL, or an IRanges object containing the ranges. strand-
NULL, or an Rle object, character vector, or factor containing the strand information. ...-
Metadata columns to set on the GRanges object. All the metadata columns must be vector-like objects of the same length as the object to construct. They cannot be named
"start","end","width", or"element". seqinfo-
Either
NULL, or a Seqinfo object, or a character vector of unique sequence names (a.k.a. seqlevels), or a named numeric vector of sequence lengths. When notNULL,seqinfomust be compatible with the sequence names inseqnames, that is, it must have one entry for each unique sequence name inseqnames. Note that it can have additional entries i.e. entries for seqlevels not present inseqnames. seqlengths-
NULL, or an integer vector named withlevels(seqnames)and containing the lengths (or NA) for each level inlevels(seqnames).
If
rangesis not supplied and/or NULL then the constructor proceeds in 2 steps:An initial GRanges object is created with
as(seqnames, "GRanges").Then this GRanges object is updated according to whatever non-NULL remaining arguments were passed to the call to
GRanges().
As a consequence of this behavior,
GRanges(x)is equivalent toas(x, "GRanges").
Accessors
In the following code snippets, x is a GRanges object.
-
length(x): Get the number of elements. -
seqnames(x),seqnames(x) <- value: Get or set the sequence names.valuecan be an Rle object, a character vector, or a factor. -
ranges(x),ranges(x) <- value: Get or set the ranges.valuecan be an IntegerRanges object. -
start(x),start(x) <- value: Get or setstart(ranges(x)). -
end(x),end(x) <- value: Get or setend(ranges(x)). -
width(x),width(x) <- value: Get or setwidth(ranges(x)). -
strand(x),strand(x) <- value: Get or set the strand.valuecan be an Rle object, character vector, or factor. -
names(x),names(x) <- value: Get or set the names of the elements. -
mcols(x, use.names=FALSE),mcols(x) <- value: Get or set the metadata columns. Ifuse.names=TRUEand the metadata columns are notNULL, then the names ofxare propagated as the row names of the returned DataFrame object. When setting the metadata columns, the supplied value must beNULLor a data-frame-like object (i.e. DataFrame or data.frame) holding element-wise metadata. -
elementMetadata(x),elementMetadata(x) <- value,values(x),values(x) <- value: Alternatives tomcolsfunctions. Their use is discouraged. -
seqinfo(x),seqinfo(x) <- value: Get or set the information about the underlying sequences.valuemust be a Seqinfo object. -
seqlevels(x),seqlevels(x, pruning.mode=c("error", "coarse", "fine", "tidy")) <- value: Get or set the sequence levels.seqlevels(x)is equivalent toseqlevels(seqinfo(x))or tolevels(seqnames(x)), those 2 expressions being guaranteed to return identical character vectors on a GRanges object.valuemust be a character vector with no NAs. See?seqlevelsfor more information. -
seqlengths(x),seqlengths(x) <- value: Get or set the sequence lengths.seqlengths(x)is equivalent toseqlengths(seqinfo(x)).valuecan be a named non-negative integer or numeric vector eventually with NAs. -
isCircular(x),isCircular(x) <- value: Get or set the circularity flags.isCircular(x)is equivalent toisCircular(seqinfo(x)).valuemust be a named logical vector eventually with NAs. -
genome(x),genome(x) <- value: Get or set the genome identifier or assembly name for each sequence.genome(x)is equivalent togenome(seqinfo(x)).valuemust be a named character vector eventually with NAs. -
seqlevelsStyle(x),seqlevelsStyle(x) <- value: Get or set the seqname style forx. See the seqlevelsStyle generic getter and setter in the GenomeInfoDb package for more information. -
score(x), score(x) <- value: Get or set the “score” column from the element metadata. -
granges(x, use.names=FALSE, use.mcols=FALSE): Squeeze the genomic ranges out of GenomicRanges objectxand return them in a GRanges object parallel tox(i.e. same length asx). Ifuse.mcolsisTRUE, the metadata columns are propagated. Ifxis a GenomicRanges derivative with extra column slots, these will be propagated as metadata columns on the returned GRanges object.
Coercion
In the code snippets below, x is a GRanges object.
-
as(from, "GRanges"): Creates a GRanges object from a character vector, a factor, or IntegerRangesList object.When
fromis a character vector (or a factor), each element in it must represent a genomic range in formatchr1:2501-2800(unstranded range) orchr1:2501-2800:+(stranded range)...is also supported as a separator between the start and end positions. Strand can be+,-,*, or missing. The names onfromare propagated to the returned GRanges object. Seeas.character()andas.factor()below for the reverse transformations.Coercing a data.frame or DataFrame into a GRanges object is also supported. See
makeGRangesFromDataFramefor the details. -
as(from, "IntegerRangesList"): Creates a IntegerRangesList object from a GRanges object. Thestrandand metadata columns become inner metadata columns (i.e. metadata columns on the ranges). Theseqlengths(from),isCircular(from), andgenome(from)vectors become the metadata columns. -
as.character(x, ignore.strand=FALSE): Turn GRanges objectxinto a character vector where each range inxis represented by a string in formatchr1:2501-2800:+. Ifignore.strandis TRUE or if all the ranges inxare unstranded (i.e. their strand is set to*), then all the strings in the output are in formatchr1:2501-2800.The names on
xare propagated to the returned character vector. Its metadata (metadata(x)) and metadata columns (mcols(x)) are ignored.See
as(from, "GRanges")above for the reverse transformation. -
as.factor(x): Equivalent tofactor(as.character(x), levels=as.character(sort(unique(x))))
See
as(from, "GRanges")above for the reverse transformation.Note that
table(x)is supported on a GRanges object. It is equivalent to, but much faster than,table(as.factor(x)). -
as.data.frame(x, row.names = NULL, optional = FALSE, ...): Creates a data.frame with columnsseqnames(factor),start(integer),end(integer),width(integer),strand(factor), as well as the additional metadata columns stored inmcols(x). Pass an explicitstringsAsFactors=TRUE/FALSEargument via...to override the default conversions for the metadata columns inmcols(x). -
as(from, "Grouping"): Creates aManyToOneGroupingobject that groupsfromby seqname, strand, start and end (same as the default sort order). This makes it convenient, for example, to aggregate a GenomicRanges object by range.
In the code snippets below, x is a Seqinfo
object.
-
as(x, "GRanges"),as(x, "GenomicRanges"),as(x, "IntegerRangesList"): Turns Seqinfo objectx(with noNAlengths) into a GRanges or IntegerRangesList.
Subsetting
In the code snippets below, x is a GRanges object.
-
x[i]: Return a new GRanges object made of the elements selected byi. -
x[i, j]: Like the above, but allow the user to conveniently subset the metadata columns thruj. -
x[i] <- value: Replacement version ofx[i]. -
x$name,x$name <- value: Shortcuts formcols(x)$nameandmcols(x)$name <- value, respectively. Provided as a convenience, for GRanges objects *only*, and as the result of strong popular demand. Note that those methods are not consistent with the other$and$<-methods in the IRanges/GenomicRanges infrastructure, and might confuse some users by making them believe that a GRanges object can be manipulated as a data.frame-like object. Therefore we recommend using them only interactively, and we discourage their use in scripts or packages. For the latter, usemcols(x)$nameandmcols(x)$name <- value, instead ofx$nameandx$name <- value, respectively.
See ?`[` in the S4Vectors package for more
information about subsetting Vector derivatives and for an important note
about the x[i, j] form.
Note that a GRanges object can be used as a subscript to subset a
list-like object that has names on it. In that case, the names on the
list-like object are interpreted as sequence names.
In the code snippets below, x is a list or List object with
names on it, and the subscript gr is a GRanges object with all its
seqnames being valid x names.
-
x[gr]: Return an object of the same class asxand parallel togr. More precisely, it's conceptually doing:lapply(gr, function(gr1) x[[seqnames(gr1)]][ranges(gr1)])
Concatenation
-
c(x, ..., ignore.mcols=FALSE): Concatenate GRanges objectxand the GRanges objects in...together. See?cin the S4Vectors package for more information about concatenating Vector derivatives.
Splitting
-
split(x, f, drop=FALSE): Splits GRanges objectxaccording tofto create a GRangesList object. Iffis a list-like object thendropis ignored andfis treated as if it wasrep(seq_len(length(f)), sapply(f, length)), so the returned object has the same shape asf(it also receives the names off). Otherwise, iffis not a list-like object, empty list elements are removed from the returned object ifdropisTRUE.
Displaying
In the code snippets below, x is a GRanges object.
-
show(x): By default theshowmethod displays 5 head and 5 tail elements. This can be changed by setting the global optionsshowHeadLinesandshowTailLines. If the object length is less than (or equal to) the sum of these 2 options plus 1, then the full object is displayed. Note that these options also affect the display of GAlignments and GAlignmentPairs objects (defined in the GenomicAlignments package), as well as other objects defined in the IRanges and Biostrings packages (e.g. IRanges and DNAStringSet objects).
Author(s)
P. Aboyoun and H. Pagès
See Also
The IRanges class in the IRanges package for storing a set of integer ranges.
The GPos class for representing a set of genomic positions (i.e. genomic ranges of width 1, a.k.a. genomic loci).
-
makeGRangesFromDataFramefor making a GRanges object from a data.frame or DataFrame object. -
Seqinfo objects and the
seqinfoaccessor and family in the GenomeInfoDb package for accessing/modifying information about the underlying sequences of a GenomicRanges derivative. -
GenomicRanges-comparison for comparing and ordering genomic ranges and/or positions.
-
findOverlaps-methods for finding overlapping genomic ranges and/or positions.
-
intra-range-methods and inter-range-methods for intra range and inter range transformations of GenomicRanges derivatives.
-
coverage-methods for computing the coverage of a set of genomic ranges and/or positions.
-
setops-methods for set operations on GRanges objects.
-
subtract for subtracting a set of genomic ranges from a GRanges object (similar to bedtools subtract).
-
nearest-methods for finding the nearest genomic range/position neighbor.
-
absoluteRangesfor transforming genomic ranges into absolute ranges (i.e. into ranges on the sequence obtained by virtually concatenating all the sequences in a genome). -
tileGenomefor putting tiles on a genome. -
genomicvars for manipulating genomic variables.
-
GRangesList objects.
-
Vector, Rle, and DataFrame objects in the S4Vectors package.
Examples
showClass("GRanges") # shows the known subclasses
## ---------------------------------------------------------------------
## CONSTRUCTION
## ---------------------------------------------------------------------
## Specifying the bare minimum i.e. seqnames and ranges only. The
## GRanges object will have no names, no strand information, and no
## metadata columns:
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1))
gr0
## Specifying names, strand, metadata columns. They can be set on an
## existing object:
names(gr0) <- head(letters, 10)
strand(gr0) <- Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2))
mcols(gr0)$score <- 1:10
mcols(gr0)$GC <- seq(1, 0, length=10)
gr0
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10))
stopifnot(identical(gr0, gr))
## Specifying the seqinfo. It can be set on an existing object:
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
seqinfo(gr0) <- merge(seqinfo(gr0), seqinfo)
seqlevels(gr0) <- seqlevels(seqinfo)
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10),
seqinfo=seqinfo)
stopifnot(identical(gr0, gr))
## ---------------------------------------------------------------------
## COERCION
## ---------------------------------------------------------------------
## From GRanges:
as.character(gr)
as.factor(gr)
as.data.frame(gr)
## From character to GRanges:
x1 <- "chr2:56-125"
as(x1, "GRanges")
as(rep(x1, 4), "GRanges")
x2 <- c(A=x1, B="chr1:25-30:-")
as(x2, "GRanges")
## From data.frame to GRanges:
df <- data.frame(chrom="chr2", start=11:15, end=20:24)
gr3 <- as(df, "GRanges")
## Alternatively, coercion to GRanges can be done by just calling the
## GRanges() constructor on the object to coerce:
gr1 <- GRanges(x1) # same as as(x1, "GRanges")
gr2 <- GRanges(x2) # same as as(x2, "GRanges")
gr3 <- GRanges(df) # same as as(df, "GRanges")
## Sanity checks:
stopifnot(identical(as(x1, "GRanges"), gr1))
stopifnot(identical(as(x2, "GRanges"), gr2))
stopifnot(identical(as(df, "GRanges"), gr3))
## ---------------------------------------------------------------------
## SUMMARIZING ELEMENTS
## ---------------------------------------------------------------------
table(seqnames(gr))
table(strand(gr))
sum(width(gr))
table(gr)
summary(mcols(gr)[,"score"])
## The number of lines displayed in the 'show' method are controlled
## with two global options:
longGR <- sample(gr, 25, replace=TRUE)
longGR
options(showHeadLines=7)
options(showTailLines=2)
longGR
## Revert to default values
options(showHeadLines=NULL)
options(showTailLines=NULL)
## ---------------------------------------------------------------------
## INVERTING THE STRAND
## ---------------------------------------------------------------------
invertStrand(gr)
## ---------------------------------------------------------------------
## RENAMING THE UNDERLYING SEQUENCES
## ---------------------------------------------------------------------
seqlevels(gr)
seqlevels(gr) <- sub("chr", "Chrom", seqlevels(gr))
gr
seqlevels(gr) <- sub("Chrom", "chr", seqlevels(gr)) # revert
## ---------------------------------------------------------------------
## COMBINING OBJECTS
## ---------------------------------------------------------------------
gr2 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 3, 4)),
IRanges(1:10, width=5),
strand='-',
score=101:110, GC=runif(10),
seqinfo=seqinfo)
gr3 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 4, 3)),
IRanges(101:110, width=10),
strand='-',
score=21:30,
seqinfo=seqinfo)
some.gr <- c(gr, gr2)
c(gr, gr2, gr3)
c(gr, gr2, gr3, ignore.mcols=TRUE)
## ---------------------------------------------------------------------
## USING A GRANGES OBJECT AS A SUBSCRIPT TO SUBSET ANOTHER OBJECT
## ---------------------------------------------------------------------
## Subsetting *by* a GRanges subscript is supported only if the object
## to subset is a named list-like object:
x <- RleList(chr1=101:120, chr2=2:-8, chr3=31:40)
x[gr]