| makeGRangesFromDataFrame {GenomicRanges} | R Documentation |
Make a GRanges object from a data.frame or DataFrame
Description
makeGRangesFromDataFrame takes a data-frame-like object as
input and tries to automatically find the columns that describe
genomic ranges. It returns them as a GRanges object.
makeGRangesFromDataFrame is also the workhorse behind the
coercion method from data.frame (or DataFrame) to
GRanges.
Usage
makeGRangesFromDataFrame(df,
keep.extra.columns=FALSE,
ignore.strand=FALSE,
seqinfo=NULL,
seqnames.field=c("seqnames", "seqname",
"chromosome", "chrom",
"chr", "chromosome_name",
"seqid"),
start.field="start",
end.field=c("end", "stop"),
strand.field="strand",
starts.in.df.are.0based=FALSE,
na.rm=FALSE)
Arguments
df |
A data.frame or DataFrame object. If not, then
the function first tries to turn |
keep.extra.columns |
|
ignore.strand |
|
seqinfo |
Either |
seqnames.field |
A character vector of recognized names for the column in |
start.field |
A character vector of recognized names for the column in |
end.field |
A character vector of recognized names for the column in |
strand.field |
A character vector of recognized names for the column in |
starts.in.df.are.0based |
|
na.rm |
|
Value
A GRanges object with one element per row in the input.
If the seqinfo argument was supplied, the returned object will
have exactly the seqlevels specified in seqinfo and in the same
order. Otherwise, the seqlevels are ordered according to the output of
the rankSeqlevels function (except if
df contains the seqnames in the form of a factor-Rle, in which
case the levels of the factor-Rle become the seqlevels of the returned
object and with no re-ordering).
If df has non-automatic row names (i.e. rownames(df) is
not NULL and is not seq_len(nrow(df))), then they will be
used to set names on the returned GRanges object.
Note
Coercing data.frame or DataFrame df into
a GRanges object (with as(df, "GRanges")), or
calling GRanges(df), are both equivalent to calling
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE).
Author(s)
H. Pagès, based on a proposal by Kasper Daniel Hansen
See Also
-
GRanges objects.
-
Seqinfo objects and the
rankSeqlevelsfunction in the GenomeInfoDb package. The
makeGRangesListFromFeatureFragmentsfunction for making a GRangesList object from a list of fragmented features.The
getTablefunction in the rtracklayer package for an R interface to the UCSC Table Browser.-
DataFrame objects in the S4Vectors package.
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
df <- data.frame(chr="chr1", start=11:15, end=12:16,
strand=c("+","-","+","*","."), score=1:5)
df
makeGRangesFromDataFrame(df) # strand value "." is replaced with "*"
## NA in ranges
df$start[5] <- df$end[2] <- NA
df
#makeGRangesFromDataFrame(df) # error!
makeGRangesFromDataFrame(df, na.rm=TRUE) # rows with NAs got dropped
## The strand column is optional:
df <- data.frame(chr="chr1", start=11:15, end=12:16, score=1:5)
makeGRangesFromDataFrame(df)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
gr2 <- as(df, "GRanges") # equivalent to the above
stopifnot(identical(gr, gr2))
gr2 <- GRanges(df) # equivalent to the above
stopifnot(identical(gr, gr2))
makeGRangesFromDataFrame(df, ignore.strand=TRUE)
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
ignore.strand=TRUE)
makeGRangesFromDataFrame(df, seqinfo=paste0("chr", 4:1))
makeGRangesFromDataFrame(df, seqinfo=c(chrM=NA, chr1=500, chrX=100))
makeGRangesFromDataFrame(df, seqinfo=Seqinfo(paste0("chr", 4:1)))
## ---------------------------------------------------------------------
## ABOUT AUTOMATIC DETECTION OF THE seqnames/start/end/strand COLUMNS
## ---------------------------------------------------------------------
## Automatic detection of the seqnames/start/end/strand columns is
## case insensitive:
df <- data.frame(ChRoM="chr1", StarT=11:15, stoP=12:16,
STRAND=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## It also ignores a common prefix between the start and end columns:
df <- data.frame(seqnames="chr1", tx_start=11:15, tx_end=12:16,
strand=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## The common prefix between the start and end columns is used to
## disambiguate between more than one seqnames column:
df <- data.frame(chrom="chr1", tx_start=11:15, tx_end=12:16,
tx_chr="chr2", score=1:5)
makeGRangesFromDataFrame(df)
## ---------------------------------------------------------------------
## 0-BASED VS 1-BASED START POSITIONS
## ---------------------------------------------------------------------
if (require(rtracklayer)) {
session <- browserSession()
genome(session) <- "sacCer2"
query <- ucscTableQuery(session, "Assembly")
df <- getTable(query)
head(df)
## A common pitfall is to forget that the UCSC Table Browser uses the
## "0-based start" convention:
gr0 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
start.field="chromStart",
end.field="chromEnd")
head(gr0)
## The start positions need to be converted into 1-based positions,
## to adhere to the convention used in Bioconductor:
gr1 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
start.field="chromStart",
end.field="chromEnd",
starts.in.df.are.0based=TRUE)
head(gr1)
}