Class SeekableXZInputStream
All Implemented Interfaces: Closeable, AutoCloseable
public class SeekableXZInputStream extends SeekableInputStream
Each .xz file consists of one or more Streams. Each Stream consists of zero or more Blocks. Each Stream contains an Index of the Stream's Blocks. The Indexes from all Streams are loaded in RAM by a constructor of this class. A typical .xz file has only one Stream, and parsing its Index will need only three or four seeks.
To make random access possible, the data in a .xz file must be split into multiple Blocks of reasonable size. Decompression can only start at a Block boundary. When seeking to an uncompressed position that is not at a Block boundary, decompression starts at the beginning of the Block and throws away data until the target position is reached. Thus, smaller Blocks mean faster seeks to arbitrary uncompressed positions. On the other hand, smaller Blocks mean worse compression. The Block size is therefore a compromise between random access speed and compression ratio.
Implementation note: This class uses linear search to locate the correct Stream from the data structures in RAM. It was the simplest to implement and should be fine as long as there aren't too many Streams. The correct Block inside a Stream is located using binary search and thus is fast even with a huge number of Blocks.
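The Block lookup described in the implementation note can be pictured with a small, self-contained sketch. It is not the library's internal code: the class name BlockLocatorSketch, the blockStarts array, and both helper methods are hypothetical, and the sketch assumes that Block start offsets are strictly increasing. It also shows how many bytes would be decompressed and thrown away when a seek lands inside a Block.

```java
import java.util.Arrays;

/** Illustrative model only; not the data structure used by the library. */
final class BlockLocatorSketch {
    /**
     * Returns the index of the Block that contains uncompressed position pos.
     * blockStarts[i] is the uncompressed start offset of Block i
     * (sorted, strictly increasing, blockStarts[0] == 0).
     */
    static int locate(long[] blockStarts, long totalSize, long pos) {
        if (pos < 0 || pos >= totalSize)
            throw new IndexOutOfBoundsException("pos out of range: " + pos);

        int i = Arrays.binarySearch(blockStarts, pos);
        // A negative result encodes the insertion point as -(point) - 1;
        // the containing Block is the one just before that point.
        return i >= 0 ? i : -i - 2;
    }

    /** Bytes that are decompressed and discarded before pos is reached. */
    static long bytesToDiscard(long[] blockStarts, long totalSize, long pos) {
        return pos - blockStarts[locate(blockStarts, totalSize, pos)];
    }

    public static void main(String[] args) {
        long[] starts = {0, 1000, 2500, 4000};
        System.out.println(locate(starts, 5000, 2600));         // prints 2
        System.out.println(bytesToDiscard(starts, 5000, 2600)); // prints 100
    }
}
```

With the real class, getBlockNumber(pos) and getBlockPos(blockNumber) provide the same two pieces of information.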
Memory usage
The amount of memory needed for the Indexes is taken into account when checking the memory usage limit. Each Stream is calculated to need at least 1 KiB of memory and each Block 16 bytes of memory, rounded up to the next kibibyte. So unless the file has a huge number of Streams or Blocks, the Indexes don't take a significant amount of memory.
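As a rough worked example of the accounting rule above, the following sketch estimates Index memory in KiB. It assumes that the fixed 1 KiB is counted once per Stream and that the 16 bytes per Block are rounded up per Stream; the class and method names are made up for illustration, and the library's exact arithmetic may differ.

```java
/** Back-of-the-envelope estimate; the names here are hypothetical. */
final class IndexMemorySketch {
    /** Estimated KiB for one Stream's Index with the given number of Blocks. */
    static int streamIndexKiB(long blockCount) {
        // 1 KiB base cost plus 16 bytes per Block, rounded up to a whole KiB.
        return 1 + (int) ((16L * blockCount + 1023) / 1024);
    }

    /** Sums the estimate over all Streams in a file. */
    static int totalKiB(long[] blocksPerStream) {
        int total = 0;
        for (long blocks : blocksPerStream)
            total += streamIndexKiB(blocks);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalKiB(new long[] {1000}));      // prints 17
        System.out.println(totalKiB(new long[] {1_000_000})); // prints 15626
    }
}
```

Under these assumptions, even a single-Stream file with a million Blocks needs only about 15 MiB for its Index. The value the library actually computed is available from getIndexMemoryUsage().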
Creating random-accessible .xz files
When using XZOutputStream, a new Block can be started by calling its endBlock method. If you know that the decompressor will only need to seek to certain uncompressed positions, it can be a good idea to start a new Block at (some of) these positions (and only at these positions to get a better compression ratio).
liblzma in XZ Utils supports starting a new Block with LZMA_FULL_FLUSH. XZ Utils 5.1.1alpha added threaded compression, which creates multi-Block .xz files. XZ Utils 5.1.1alpha also added the option --block-size=SIZE to the xz command line tool. XZ Utils 5.1.2alpha added a partial implementation of --block-list=SIZES, which allows specifying the sizes of individual Blocks.
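The idea of ending a Block exactly at the positions that will later be seek targets can be sketched without the library. In the code below, BlockBoundarySketch and its BlockEnder callback are hypothetical names; the callback stands in for whatever ends the current Block in the real encoder (with XZOutputStream, that would be its endBlock method).

```java
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative helper; not part of the library. */
final class BlockBoundarySketch {
    /** Hypothetical callback that ends the current Block. */
    interface BlockEnder {
        void endBlock() throws IOException;
    }

    /**
     * Writes data to out and ends a Block just before each offset in
     * blockStarts, so that every listed offset begins a new Block.
     * Offsets must be ascending and strictly inside the data.
     */
    static void write(byte[] data, int[] blockStarts, OutputStream out,
                      BlockEnder ender) throws IOException {
        int written = 0;
        for (int start : blockStarts) {
            if (start <= written || start >= data.length)
                throw new IllegalArgumentException(
                        "Block starts must be ascending and inside the data");

            out.write(data, written, start - written);
            ender.endBlock(); // the next write begins a new Block
            written = start;
        }
        // The remainder after the last boundary forms the final Block.
        out.write(data, written, data.length - written);
    }
}
```

Assuming an XZOutputStream named xzOut, a call could look like write(data, starts, xzOut, xzOut::endBlock), after which seeking to any offset in starts would not require discarding data.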
Example: getting the uncompressed size of a .xz file
String filename = "foo.xz";
SeekableFileInputStream seekableFile = new SeekableFileInputStream(filename);
try {
    SeekableXZInputStream seekableXZ = new SeekableXZInputStream(seekableFile);
    System.out.println("Uncompressed size: " + seekableXZ.length());
} finally {
    seekableFile.close();
}
See Also: SeekableFileInputStream, XZInputStream, XZOutputStream
-
Constructor Summary
Constructor, Description
SeekableXZInputStream(SeekableInputStream in)
Creates a new seekable XZ decompressor without a memory usage limit.
SeekableXZInputStream(SeekableInputStream in, int memoryLimit)
Creates a new seekable XZ decompressor with an optional memory usage limit.
SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck)
Creates a new seekable XZ decompressor with an optional memory usage limit and the ability to disable verification of integrity checks.
SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck, ArrayCache arrayCache)
Creates a new seekable XZ decompressor with an optional memory usage limit and the ability to disable verification of integrity checks.
SeekableXZInputStream(SeekableInputStream in, int memoryLimit, ArrayCache arrayCache)
Creates a new seekable XZ decompressor with an optional memory usage limit.
SeekableXZInputStream(SeekableInputStream in, ArrayCache arrayCache)
Creates a new seekable XZ decompressor without a memory usage limit.
-
Method Summary
Modifier and Type, Method, Description
int available()
Returns the number of uncompressed bytes that can be read without blocking.
void close()
Closes the stream and calls in.close().
void close(boolean closeInput)
Closes the stream and optionally calls in.close().
int getBlockCheckType(int blockNumber)
Gets the integrity check type (Check ID) of the given Block.
long getBlockCompPos(int blockNumber)
Gets the position where the given compressed Block starts in the underlying .xz file.
long getBlockCompSize(int blockNumber)
Gets the compressed size of the given Block.
int getBlockCount()
Gets the number of Blocks in the .xz file.
int getBlockNumber(long pos)
Gets the number of the Block that contains the byte at the given uncompressed position.
long getBlockPos(int blockNumber)
Gets the uncompressed start position of the given Block.
long getBlockSize(int blockNumber)
Gets the uncompressed size of the given Block.
int getCheckTypes()
Gets the types of integrity checks used in the .xz file.
int getIndexMemoryUsage()
Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks.
long getLargestBlockSize()
Gets the uncompressed size of the largest XZ Block in bytes.
int getStreamCount()
Gets the number of Streams in the .xz file.
long length()
Gets the uncompressed size of this input stream.
long position()
Gets the current uncompressed position in this input stream.
int read()
Decompresses the next byte from this input stream.
int read(byte[] buf, int off, int len)
Decompresses into an array of bytes.
void seek(long pos)
Seeks to the specified absolute uncompressed position in the stream.
void seekToBlock(int blockNumber)
Seeks to the beginning of the given XZ Block.
Methods inherited from class org.tukaani.xz.SeekableInputStream
skip
Methods inherited from class java.io.InputStream
mark, markSupported, nullInputStream, read, readAllBytes, readNBytes, readNBytes, reset, skipNBytes, transferTo
-
Constructor Details
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in) throws IOException
Creates a new seekable XZ decompressor without a memory usage limit.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in, ArrayCache arrayCache) throws IOException
Creates a new seekable XZ decompressor without a memory usage limit.
This is identical to SeekableXZInputStream(SeekableInputStream) except that this also takes the arrayCache argument.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
arrayCache - cache to be used for allocating large arrays
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
Since: 1.7
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in, int memoryLimit) throws IOException
Creates a new seekable XZ decompressor with an optional memory usage limit.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, ArrayCache arrayCache) throws IOException
Creates a new seekable XZ decompressor with an optional memory usage limit.
This is identical to SeekableXZInputStream(SeekableInputStream,int) except that this also takes the arrayCache argument.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
arrayCache - cache to be used for allocating large arrays
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
Since: 1.7
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck) throws IOException
Creates a new seekable XZ decompressor with an optional memory usage limit and the ability to disable verification of integrity checks.
Note that integrity check verification should almost never be disabled. Possible reasons to disable integrity check verification:
- Trying to recover data from a corrupt .xz file.
- Speeding up decompression. This matters mostly with SHA-256 or with files that have compressed extremely well. It's recommended that integrity checking isn't disabled for performance reasons unless the file integrity is verified externally in some other way.
verifyCheck only affects the integrity check of the actual compressed data. The CRC32 fields in the headers are always verified.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
verifyCheck - if true, the integrity checks will be verified; this should almost never be set to false
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
Since: 1.6
-
SeekableXZInputStream
public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck, ArrayCache arrayCache) throws IOException
Creates a new seekable XZ decompressor with an optional memory usage limit and the ability to disable verification of integrity checks.
This is identical to SeekableXZInputStream(SeekableInputStream,int,boolean) except that this also takes the arrayCache argument.
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
verifyCheck - if true, the integrity checks will be verified; this should almost never be set to false
arrayCache - cache to be used for allocating large arrays
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
Since: 1.7
-
-
Method Details
-
getCheckTypes
public int getCheckTypes()
Gets the types of integrity checks used in the .xz file. Multiple checks are possible only if there are multiple concatenated XZ Streams.
The returned value has a bit set for every check type that is present. For example, if CRC64 and SHA-256 were used, the return value is (1 << XZ.CHECK_CRC64) | (1 << XZ.CHECK_SHA256).
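The bit mask returned by getCheckTypes can be decoded with ordinary bit tests. The sketch below is self-contained and does not use the library: the constants mirror the Check IDs from the .xz file format (which XZ for Java exposes as XZ.CHECK_NONE, XZ.CHECK_CRC32, XZ.CHECK_CRC64 and XZ.CHECK_SHA256), while the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative decoder for a check-type bit mask; names are hypothetical. */
final class CheckTypesSketch {
    // Local copies of the Check IDs defined by the .xz file format.
    static final int CHECK_NONE = 0;
    static final int CHECK_CRC32 = 1;
    static final int CHECK_CRC64 = 4;
    static final int CHECK_SHA256 = 10;

    /** Tests whether the bit for the given Check ID is set in the mask. */
    static boolean uses(int checkTypes, int checkId) {
        return (checkTypes & (1 << checkId)) != 0;
    }

    /** Lists the names of the known check types present in the mask. */
    static List<String> describe(int checkTypes) {
        int[] ids = {CHECK_NONE, CHECK_CRC32, CHECK_CRC64, CHECK_SHA256};
        String[] names = {"None", "CRC32", "CRC64", "SHA-256"};
        List<String> found = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (uses(checkTypes, ids[i]))
                found.add(names[i]);
        }
        return found;
    }

    public static void main(String[] args) {
        int mask = (1 << CHECK_CRC64) | (1 << CHECK_SHA256);
        System.out.println(describe(mask)); // prints [CRC64, SHA-256]
    }
}
```

Note that bit 0 being set means that at least one Stream uses no integrity check at all.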
getIndexMemoryUsage
public int getIndexMemoryUsage()
Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks. This information is rarely needed, but since it is calculated for the memory usage limit anyway, it is made available too.
-
getLargestBlockSize
public long getLargestBlockSize()
Gets the uncompressed size of the largest XZ Block in bytes. This can be useful if you want to check that the file doesn't have huge XZ Blocks which could make seeking to arbitrary offsets very slow. Note that huge Blocks don't automatically mean that seeking would be slow; for example, seeking to the beginning of any Block is always fast.
-
getStreamCount
public int getStreamCount()
Gets the number of Streams in the .xz file.
Since: 1.3
-
getBlockCount
public int getBlockCount()
Gets the number of Blocks in the .xz file.
Since: 1.3
-
getBlockPos
public long getBlockPos(int blockNumber)
Gets the uncompressed start position of the given Block.
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
Since: 1.3
-
getBlockSize
public long getBlockSize(int blockNumber)
Gets the uncompressed size of the given Block.
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
Since: 1.3
-
getBlockCompPos
public long getBlockCompPos(int blockNumber)
Gets the position where the given compressed Block starts in the underlying .xz file. This information is rarely useful to the users of this class.
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
Since: 1.3
-
getBlockCompSize
public long getBlockCompSize(int blockNumber)
Gets the compressed size of the given Block. This together with the uncompressed size can be used to calculate the compression ratio of the specific Block.
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
Since: 1.3
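The per-Block compression ratio mentioned above is a simple division. The following sketch uses plain arrays as stand-ins for the values that getBlockCompSize(i) and getBlockSize(i) would return for each Block; the class and method names are hypothetical, and the ratio is defined here as compressed size divided by uncompressed size (smaller is better), which is one of several common conventions.

```java
/** Illustrative ratio helpers; names and the ratio convention are assumptions. */
final class BlockRatioSketch {
    /** Compressed size divided by uncompressed size; smaller is better. */
    static double ratio(long compSize, long uncompSize) {
        if (uncompSize <= 0)
            throw new IllegalArgumentException("uncompSize must be positive");
        return (double) compSize / uncompSize;
    }

    /** Returns the index of the Block that compressed worst (highest ratio). */
    static int worstBlock(long[] compSizes, long[] uncompSizes) {
        if (compSizes.length == 0 || compSizes.length != uncompSizes.length)
            throw new IllegalArgumentException("arrays must be non-empty and equal in length");
        int worst = 0;
        for (int i = 1; i < compSizes.length; i++) {
            if (ratio(compSizes[i], uncompSizes[i])
                    > ratio(compSizes[worst], uncompSizes[worst]))
                worst = i;
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println(ratio(250, 1000)); // prints 0.25
    }
}
```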
-
getBlockCheckType
public int getBlockCheckType(int blockNumber)
Gets the integrity check type (Check ID) of the given Block.
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
Since: 1.3
See Also: getCheckTypes()
-
getBlockNumber
public int getBlockNumber(long pos)
Gets the number of the Block that contains the byte at the given uncompressed position.
Throws:
IndexOutOfBoundsException - if pos < 0 or pos >= length().
Since: 1.3
-
read
public int read() throws IOException
Decompresses the next byte from this input stream.
Specified by: read in class InputStream
Returns: the next decompressed byte, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
IOException - may be thrown by in
-
read
public int read(byte[] buf, int off, int len) throws IOException
Decompresses into an array of bytes.
If len is zero, no bytes are read and 0 is returned. Otherwise this will try to decompress len bytes of uncompressed data. Fewer than len bytes may be read only in the following situations:
- The end of the compressed data was reached successfully.
- An error is detected after at least one but fewer than len bytes have already been successfully decompressed. The next call with non-zero len will immediately throw the pending exception.
- An exception is thrown.
Overrides: read in class InputStream
Parameters:
buf - target buffer for uncompressed data
off - start offset in buf
len - maximum number of uncompressed bytes to read
Returns: number of bytes read, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
IOException - may be thrown by in
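Because a call may return fewer than len bytes, code that needs a fixed amount of data usually reads in a loop. The sketch below shows that pattern against the general java.io.InputStream contract; ReadFullySketch and readFully are hypothetical names, and nothing here is specific to this class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative read loop; works with any InputStream. */
final class ReadFullySketch {
    /**
     * Reads until len bytes have been stored in buf or the end of the
     * stream is reached. Returns the number of bytes actually read,
     * which is smaller than len only at the end of the stream.
     */
    static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1)
                break; // end of stream
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[8];
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readFully(in, buf, 0, buf.length)); // prints 5
    }
}
```

If an error is pending after a short read, the next iteration of the loop is where the exception would surface.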
-
available
public int available() throws IOException
Returns the number of uncompressed bytes that can be read without blocking. The value is returned with an assumption that the compressed input data will be valid. If the compressed data is corrupt, CorruptedInputException may get thrown before the number of bytes claimed to be available have been read from this input stream.
Overrides: available in class InputStream
Returns: the number of uncompressed bytes that can be read without blocking
Throws:
IOException
-
close
public void close() throws IOException
Closes the stream and calls in.close(). If the stream was already closed, this does nothing.
This is equivalent to close(true).
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
Overrides: close in class InputStream
Throws:
IOException - if thrown by in.close()
-
close
public void close(boolean closeInput) throws IOException
Closes the stream and optionally calls in.close(). If the stream was already closed, this does nothing. If close(false) has been called, a further call of close(true) does nothing (it doesn't call in.close()).
If you don't want to close the underlying InputStream, there is usually no need to worry about closing this stream either; it's fine to do nothing and let the garbage collector handle it. However, if you are using ArrayCache, close(false) can be useful to put the allocated arrays back to the cache without closing the underlying InputStream.
Note that if you successfully reach the end of the stream (read returns -1), the arrays are automatically put back to the cache by that read call. In this situation close(false) is redundant (but harmless).
Throws:
IOException - if thrown by in.close()
Since: 1.7
-
length
public long length()
Gets the uncompressed size of this input stream. If there are multiple XZ Streams, the total uncompressed size of all XZ Streams is returned.
Specified by: length in class SeekableInputStream
-
position
public long position() throws IOException
Gets the current uncompressed position in this input stream.
Specified by: position in class SeekableInputStream
Throws:
XZIOException - if the stream has been closed
IOException
-
seek
public void seek(long pos) throws IOException
Seeks to the specified absolute uncompressed position in the stream. This only stores the new position, so this function itself is always very fast. The actual seek is done when read is called to read at least one byte.
Seeking past the end of the stream is possible. In that case read will return -1 to indicate the end of the stream.
Specified by: seek in class SeekableInputStream
Parameters:
pos - new uncompressed read position
Throws:
XZIOException - if pos is negative, or if the stream has been closed
IOException - if pos is negative or if a stream-specific I/O error occurs
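The deferred-seek contract described for seek can be modelled with a toy in-memory stream. This is only a model of the documented behaviour, not how the decompressor is implemented: LazySeekSketch is a hypothetical class, it reads from a plain byte array, and it throws a generic IOException where the real class documents XZIOException.

```java
import java.io.IOException;
import java.io.InputStream;

/** Toy stream that records seek targets and resolves them on read. */
final class LazySeekSketch extends InputStream {
    private final byte[] data;
    private long pos = 0;

    LazySeekSketch(byte[] data) {
        this.data = data;
    }

    /** Only stores the target position; no other work is done here. */
    void seek(long newPos) throws IOException {
        if (newPos < 0)
            throw new IOException("Negative seek position: " + newPos);
        pos = newPos;
    }

    long position() {
        return pos;
    }

    /** Resolves the stored position; past the end this returns -1. */
    @Override
    public int read() {
        if (pos >= data.length)
            return -1;
        return data[(int) pos++] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        LazySeekSketch s = new LazySeekSketch(new byte[] {10, 20, 30});
        s.seek(2);
        System.out.println(s.read()); // prints 30
        s.seek(100);
        System.out.println(s.read()); // prints -1
    }
}
```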
-
seekToBlock
public void seekToBlock(int blockNumber) throws IOException
Seeks to the beginning of the given XZ Block.
Throws:
XZIOException - if blockNumber < 0 or blockNumber >= getBlockCount(), or if the stream has been closed
IOException
Since: 1.3
-