| BamFile {Rsamtools} | R Documentation |
Maintain and use BAM files
Description
Use BamFile() to create a reference to a BAM file (and
optionally its index). The reference remains open across calls to
methods, avoiding costly index re-loading.
BamFileList() provides a convenient way of managing a list of
BamFile instances.
Usage
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE,
asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE,
qnamePrefixEnd=NA, qnameSuffixStart=NA)
## Opening / closing
## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)
## accessors; also path(), index(), yieldSize()
## S4 method for signature 'BamFile'
isOpen(con, rw="")
## S4 method for signature 'BamFile'
isIncomplete(con)
## S4 method for signature 'BamFile'
obeyQname(object, ...)
obeyQname(object, ...) <- value
## S4 method for signature 'BamFile'
asMates(object, ...)
asMates(object, ...) <- value
## S4 method for signature 'BamFile'
qnamePrefixEnd(object, ...)
qnamePrefixEnd(object, ...) <- value
## S4 method for signature 'BamFile'
qnameSuffixStart(object, ...)
qnameSuffixStart(object, ...) <- value
## actions
## S4 method for signature 'BamFile'
scanBamHeader(files, ..., what=c("targets", "text"))
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFileList'
seqinfo(x)
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
filter=FilterRules(), indexDestination=TRUE,
param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512, byTag=NULL, nThreads=1L)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## reading
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## counting
## S4 method for signature 'BamFile'
idxstatsBam(file, index=file, ...)
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)
Arguments
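| ... | Additional arguments. For BamFileList(), these are the BAM file paths or BamFile instances to include in the list. |
| con | An instance of BamFile. |
| x, object, file, files | A character vector of BAM file paths (for BamFile()), or a BamFile or BamFileList instance (for the other methods, as given by their signatures). |
| index | character(1); the BAM index file path (for BamFile()). |
| yieldSize | Number of records to yield each time the file is read from with scanBam. See ‘Fields’ section for details. |
| asMates | Logical indicating if records should be paired as mates. See ‘Fields’ section for details. |
| qnamePrefixEnd | Single character (or NA) marking the end of the qname prefix. When specified, all characters prior to and including the qnamePrefixEnd character are removed from the qname. |
| qnameSuffixStart | Single character (or NA) marking the start of the qname suffix. When specified, all characters following and including the qnameSuffixStart character are removed from the qname. |
| obeyQname | Logical indicating if the BAM file is sorted by qname. See ‘Fields’ section for details. |
| value | Logical value for setting obeyQname or asMates; a single character (or NA) for setting qnamePrefixEnd or qnameSuffixStart. |
| what | For scanBamHeader, a character vector specifying which of c("targets", "text") to extract from the header. |
| filter | A FilterRules instance. |
| destination | character(1) file path to write filtered reads to. |
| indexDestination | logical(1) indicating whether the destination file should also be indexed. |
| byQname, maxMemory, byTag, nThreads | See sortBam. |
| param | An optional ScanBamParam instance. |
| rw | Mode of file; ignored. |
| main.groups.only | See quickBamFlagSummary. |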
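The trimming rule for qnamePrefixEnd and qnameSuffixStart can be illustrated in plain base R. The helper trim_qname() below is hypothetical and not part of Rsamtools. It assumes the prefix is cut at the first occurrence of qnamePrefixEnd, the suffix at the last occurrence of qnameSuffixStart, and that a qname without the marker is left unchanged. Rsamtools performs its own trimming internally, and its exact behavior may differ.

```r
# Hypothetical helper (NOT part of Rsamtools): a base-R sketch of the
# documented qnamePrefixEnd / qnameSuffixStart trimming.
# Assumptions: the prefix is cut at the FIRST occurrence of qnamePrefixEnd,
# the suffix at the LAST occurrence of qnameSuffixStart, and a qname that
# lacks the marker is returned unchanged.
trim_qname <- function(qname, qnamePrefixEnd = NA, qnameSuffixStart = NA) {
    if (!is.na(qnamePrefixEnd)) {
        # drop everything up to and including the first prefix marker
        pos <- regexpr(qnamePrefixEnd, qname, fixed = TRUE)
        hit <- pos > 0L
        qname[hit] <- substring(qname[hit], pos[hit] + 1L)
    }
    if (!is.na(qnameSuffixStart)) {
        # drop the last suffix marker and everything after it
        for (i in seq_along(qname)) {
            hits <- gregexpr(qnameSuffixStart, qname[i], fixed = TRUE)[[1]]
            if (hits[1] > 0L)
                qname[i] <- substr(qname[i], 1L, max(hits) - 1L)
        }
    }
    qname
}

trim_qname(c("run7/read42.1", "run7/read42.2", "read43"), "/", ".")
```

With qnamePrefixEnd="/" and qnameSuffixStart="." (the settings used in the Examples), the illustrative qnames run7/read42.1 and run7/read42.2 both reduce to read42, so they share a qname again and can be considered for mating; read43 contains neither marker and is left as-is.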
Objects from the Class
Objects are created by calls of the form BamFile().
Fields
The BamFile class inherits fields from the
RsamtoolsFile class and has fields:
- yieldSize:
Number of records to yield each time the file is read from using
scanBam or, when length(bamWhich()) != 0, a threshold which yields records in complete ranges whose sum first exceeds yieldSize. Setting yieldSize on a BamFileList does not alter existing yield sizes set on the individual BamFile instances.
- asMates:
A logical indicating if the records should be returned as mated pairs. When
TRUE, scanBam attempts to mate (pair) the records and returns two additional fields, groupid and mate_status. groupid is an integer vector of unique group ids; mate_status is a factor with levels mated for records successfully paired by the algorithm, ambiguous for records that are possibly mates but cannot be assigned unambiguously, and unmated for reads that did not have valid mates.
Mate criteria:
  - Bits 0x40 and 0x80: the segments are a first/last pair, OR neither segment is marked first/last.
  - Bit 0x100: both segments are secondary, OR neither is secondary.
  - Bits 0x10 and 0x20: the segments are on opposite strands.
  - mpos match: segment 1 mpos matches segment 2 pos AND segment 2 mpos matches segment 1 pos.
  - tid match.
Flags, tags and ranges may be specified in the
ScanBamParam for fine tuning of results.
- obeyQname:
A logical(1) indicating if the file was sorted by qname. In Bioconductor > 2.12 paired-end files do not need to be sorted by
qname. Instead, set asMates=TRUE in the BamFile when using the readGAlignmentsList function from the GenomicAlignments package.
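The mate criteria above can be read as a pairwise test on two records. The following base-R sketch is hypothetical (could_be_mates() and has_bit() are not Rsamtools functions) and makes assumptions where the list is terse: "tid match" is taken to mean that each segment's mate reference (mtid) equals the other segment's tid; "opposite strands" means the 0x10 bits differ and each segment's 0x20 bit agrees with the other segment's 0x10 bit; and identical qnames (mentioned in the Examples) are also required. It checks one candidate pair only and does not reproduce how Rsamtools groups records or labels them ambiguous.

```r
# Hypothetical helpers (NOT part of Rsamtools): a pairwise sketch of the
# documented mate criteria. Each record is a list with elements
# qname, flag, tid, pos, mtid, mpos.
has_bit <- function(flag, bit) bitwAnd(flag, bit) != 0L

could_be_mates <- function(a, b) {
    # Bits 0x40 / 0x80: a first/last pair, or neither marked first/last
    first_a <- has_bit(a$flag, 0x40L); last_a <- has_bit(a$flag, 0x80L)
    first_b <- has_bit(b$flag, 0x40L); last_b <- has_bit(b$flag, 0x80L)
    first_last <- (first_a && !last_a && last_b && !first_b) ||
                  (first_b && !last_b && last_a && !first_a) ||
                  (!first_a && !last_a && !first_b && !last_b)
    # Bit 0x100: both secondary or both not secondary
    secondary <- has_bit(a$flag, 0x100L) == has_bit(b$flag, 0x100L)
    # Bits 0x10 / 0x20 (assumed reading): strands differ, and each
    # segment's mate-reverse bit agrees with the other's reverse bit
    strand <- has_bit(a$flag, 0x10L) != has_bit(b$flag, 0x10L) &&
              has_bit(a$flag, 0x20L) == has_bit(b$flag, 0x10L) &&
              has_bit(b$flag, 0x20L) == has_bit(a$flag, 0x10L)
    # mpos match: reciprocal mate positions
    mpos <- a$mpos == b$pos && b$mpos == a$pos
    # tid match (assumed reading): reciprocal mate references
    tid <- a$mtid == b$tid && b$mtid == a$tid
    identical(a$qname, b$qname) && first_last && secondary &&
        strand && mpos && tid
}

a <- list(qname = "r1", flag = 99L,  tid = 0L, pos = 100L, mtid = 0L, mpos = 300L)
b <- list(qname = "r1", flag = 147L, tid = 0L, pos = 300L, mtid = 0L, mpos = 100L)
could_be_mates(a, b)
```

With the common proper-pair flags 99 and 147 and reciprocal positions, the two records pass the test; changing either mpos, the qname, or the secondary bit (0x100) on only one record makes it fail.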
Functions and methods
BamFileList inherits additional methods from
RsamtoolsFileList and SimpleList.
Opening / closing:
- open.BamFile
Opens the (local or remote) path and index files (the index only if index(con) is not character(0)). Returns a BamFile instance.
- close.BamFile
Closes the BamFile con, returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.
- isOpen
Tests whether the BamFile con has been opened for reading.
- isIncomplete
Tests whether the BamFile con is neither closed nor at the end of the file.
Accessors:
- path
Returns a character(1) vector of BAM path names.
- index
Returns a character(0) or character(1) vector of BAM index path names.
- yieldSize, yieldSize<-
Return or set an integer(1) vector indicating yield size.
- obeyQname, obeyQname<-
Return or set a logical(1) indicating if the file was sorted by qname.
- asMates, asMates<-
Return or set a logical(1) indicating if the records should be returned as mated pairs.
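When ranges are supplied, the yieldSize description under Fields says each yield consists of complete ranges whose summed record count first exceeds yieldSize. A hypothetical base-R sketch of that grouping rule follows; chunk_ranges() is not an Rsamtools function, and the per-range counts are illustrative inputs rather than values read from a BAM file. Rsamtools' internal accounting may differ in detail.

```r
# Hypothetical helper (NOT part of Rsamtools): groups whole ranges into
# yields following the documented rule that each yield contains complete
# ranges whose summed record count first exceeds yieldSize.
# 'counts' is the (illustrative) number of records in each range.
chunk_ranges <- function(counts, yieldSize) {
    chunk <- integer(length(counts))
    current <- 1L
    total <- 0
    for (i in seq_along(counts)) {
        chunk[i] <- current            # ranges are never split
        total <- total + counts[i]
        if (total > yieldSize) {       # threshold first exceeded: close chunk
            current <- current + 1L
            total <- 0
        }
    }
    split(seq_along(counts), chunk)    # list of range indices per yield
}

chunk_ranges(c(300, 250, 100, 600, 50), yieldSize = 500)
```

With these counts and yieldSize = 500, the ranges are grouped as {1, 2}, {3, 4} and {5}: each of the first two yields stops at the range that pushes its total past 500, and the last yield holds whatever remains.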
Methods:
- scanBamHeader
Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.
- seqinfo, seqnames, seqlengths
Visit the path in path(file), returning a Seqinfo, character, or named integer vector containing information on the names and/or lengths of each sequence. Seqnames are ordered as they appear in the file.
- scanBam
Visit the path in path(file), returning the result of scanBam applied to the specified path.
- countBam
Visit the path(s) in path(file), returning the result of countBam applied to the specified path(s).
- idxstatsBam
Visit the index in index(file), quickly returning a data.frame with columns seqnames, seqlength, mapped (number of mapped reads on seqnames) and unmapped (number of unmapped reads).
- filterBam
Visit the path in path(file), returning the result of filterBam applied to the specified path. A single file can be filtered to one or several destinations, as described in filterBam.
- indexBam
Visit the path in path(file), returning the result of indexBam applied to the specified path.
- sortBam
Visit the path in path(file), returning the result of sortBam applied to the specified path.
- mergeBam
Merge several BAM files into a single BAM file. See mergeBam for details; additional arguments supported by mergeBam,character-method are also available for BamFileList.
- show
Compactly display the object.
Author(s)
Martin Morgan and Marc Carlson
See Also
- The readGAlignments, readGAlignmentPairs, and readGAlignmentsList functions defined in the GenomicAlignments package.
- summarizeOverlaps and findSpliceOverlaps-methods in the GenomicAlignments package for methods that work on BamFile and BamFileList objects.
Examples
##
## BamFile options.
##
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- BamFile(fl)
bf
## When 'asMates=TRUE' scanBam() reads the data in as
## pairs. See 'asMates' above for details of the pairing
## algorithm.
asMates(bf) <- TRUE
## When 'yieldSize' is set, scanBam() will iterate
## through the file in chunks.
yieldSize(bf) <- 500
## Some applications (e.g., the NCBI Sequence Read Archive (SRA)
## toolkit) append a filename or allele identifier to the sequence
## qname. This may result in a unique qname for each record, which is a
## problem when mating paired-end reads (identical qnames are one
## criterion for paired-end mating). 'qnamePrefixEnd' and
## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.
qnamePrefixEnd(bf) <- "/"
qnameSuffixStart(bf) <- "."
##
## Reading Bam files.
##
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
mustWork=TRUE)
(bf <- BamFile(fl))
head(seqlengths(bf)) # sequences and lengths in BAM file
if (require(RNAseqData.HNRNPC.bam.chr14)) {
    bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES)
    bfl
    bfl[1:2]    # subset
    bfl[[1]]    # select first element -- BamFile
    ## merged across BAM files
    seqinfo(bfl)
    head(seqlengths(bfl))
}
length(scanBam(fl)[[1]][[1]]) # all records
bf <- open(BamFile(fl)) # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)
## Use 'yieldSize' to iterate through a file in chunks.
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)
## Repeatedly visit multiple ranges in the BamFile.
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
bf <- open(BamFile(fl))
sapply(seq_len(length(rng)), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)