org.tukaani.xz.SeekableXZInputStream

All Implemented Interfaces: Closeable, AutoCloseable

public class SeekableXZInputStream
extends SeekableInputStream

Decompresses a .xz file in random access mode. This supports decompressing concatenated .xz files.

Each .xz file consists of one or more Streams. Each Stream consists of zero or more Blocks. Each Stream contains an Index of the Stream's Blocks. The Indexes from all Streams are loaded in RAM by a constructor of this class. A typical .xz file has only one Stream, and parsing its Index will need only three or four seeks.

To make random access possible, the data in a .xz file must be split into multiple Blocks of reasonable size. Decompression can only start at a Block boundary. When seeking to an uncompressed position that is not at a Block boundary, decompression starts at the beginning of the Block and throws away data until the target position is reached. Thus, smaller Blocks mean faster seeks to arbitrary uncompressed positions. On the other hand, smaller Blocks mean worse compression, so one has to make a compromise between random access speed and compression ratio.

Implementation note: This class uses linear search to locate the correct Stream from the data structures in RAM. It was the simplest to implement and should be fine as long as there aren't too many Streams. The correct Block inside a Stream is located using binary search and thus is fast even with a huge number of Blocks.
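
The two-level lookup described above can be sketched roughly as follows. This is only an illustration, not the library's implementation: `BlockLocator`, `StreamIndex`, and their fields are hypothetical names, and a real Index stores more than uncompressed offsets. The sketch finds the Stream with a linear scan, finds the Block with a binary search over Block start offsets, and reports how many uncompressed bytes would be decoded and discarded before the target position is reached.

```java
import java.util.Arrays;

// Illustrative only: BlockLocator and StreamIndex are hypothetical names,
// not classes from org.tukaani.xz.
final class BlockLocator {
    /** Uncompressed layout of one Stream. Assumes every Block is non-empty. */
    static final class StreamIndex {
        final long[] blockStarts; // absolute uncompressed start of each Block, ascending
        final long end;           // absolute uncompressed offset just past the Stream

        StreamIndex(long[] blockStarts, long end) {
            this.blockStarts = blockStarts;
            this.end = end;
        }
    }

    /** Returns {Stream number, Block number within that Stream, bytes to discard}. */
    static long[] locate(StreamIndex[] streams, long pos) {
        if (pos < 0)
            throw new IndexOutOfBoundsException("Invalid position: " + pos);

        // Linear search: fine while the number of Streams stays small.
        for (int s = 0; s < streams.length; s++) {
            StreamIndex stream = streams[s];
            if (pos >= stream.end)
                continue;

            // Binary search: O(log n) even with a huge number of Blocks.
            int i = Arrays.binarySearch(stream.blockStarts, pos);
            // A miss returns -(insertion point) - 1; the containing Block
            // is the one just before the insertion point.
            int block = (i >= 0) ? i : -i - 2;

            // Decoding starts at the Block boundary, so everything between
            // the boundary and pos is decompressed and thrown away.
            return new long[] { s, block, pos - stream.blockStarts[block] };
        }
        throw new IndexOutOfBoundsException("Invalid position: " + pos);
    }

    public static void main(String[] args) {
        StreamIndex[] streams = {
            new StreamIndex(new long[] { 0, 100, 200 }, 300),
            new StreamIndex(new long[] { 300, 350 }, 400),
        };
        System.out.println(Arrays.toString(locate(streams, 250))); // [0, 2, 50]
    }
}
```

In this layout, position 250 falls 50 bytes into the third Block of the first Stream, so 50 bytes are discarded. With smaller Blocks that discarded span shrinks, which is the speed side of the trade-off.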

Memory usage

The amount of memory needed for the Indexes is taken into account when checking the memory usage limit. Each Stream is calculated to need at least 1 KiB of memory and each Block 16 bytes of memory, rounded up to the next kibibyte. So unless the file has a huge number of Streams or Blocks, these don't take a significant amount of memory.
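
As a rough worked example of the rule above, the following sketch estimates Index memory from Block counts. It assumes the rounding is applied per Stream, which is one reading of the description. The library's exact accounting may differ, and `IndexMemoryEstimate` is a hypothetical helper, not part of the API. For the real figure, call getIndexMemoryUsage().

```java
// Hypothetical helper; the authoritative value comes from getIndexMemoryUsage().
final class IndexMemoryEstimate {
    /** Estimated KiB for one Stream that holds blockCount Blocks. */
    static int streamKiB(long blockCount) {
        long blockBytes = 16L * blockCount;      // 16 bytes per Block
        long blockKiB = (blockBytes + 1023) / 1024; // round up to whole KiB
        return (int) (1 + blockKiB);             // plus 1 KiB for the Stream itself
    }

    /** Sums the per-Stream estimates; one array element per Stream. */
    static int totalKiB(long[] blocksPerStream) {
        int total = 0;
        for (long blockCount : blocksPerStream)
            total += streamKiB(blockCount);
        return total;
    }

    public static void main(String[] args) {
        // One Stream with 1000 Blocks: 16000 bytes round up to 16 KiB, plus 1 KiB.
        System.out.println(totalKiB(new long[] { 1000 })); // 17
    }
}
```

Under these assumptions even a single-Stream file with a million Blocks comes to about 15 MiB of Index data.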

Creating random-accessible .xz files

When using XZOutputStream, a new Block can be started by calling its endBlock method. If you know that the decompressor will only need to seek to certain uncompressed positions, it can be a good idea to start a new Block at (some of) these positions (and only at these positions to get better compression ratio).
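
Starting Blocks only at known seek positions could be sketched as below. To keep the example runnable without the library, `BlockBoundaryWriter` and its `BlockEnder` callback are hypothetical stand-ins. The callback has the same shape as XZOutputStream's endBlock method, so a method reference to it should fit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of org.tukaani.xz.
final class BlockBoundaryWriter {
    /** Stand-in for XZOutputStream.endBlock(). */
    interface BlockEnder {
        void endBlock() throws IOException;
    }

    /**
     * Writes data to out and ends the current Block just before each offset
     * in seekPoints. The offsets must be ascending and lie strictly between
     * 0 and data.length; offset 0 always starts the first Block.
     */
    static void write(byte[] data, int[] seekPoints, OutputStream out,
                      BlockEnder ender) throws IOException {
        int written = 0;
        for (int point : seekPoints) {
            if (point <= written || point >= data.length)
                throw new IllegalArgumentException("Bad seek point: " + point);

            out.write(data, written, point - written);
            ender.endBlock(); // the next byte written starts a new Block
            written = point;
        }
        // The last Block is ended when the stream is finished or closed.
        out.write(data, written, data.length - written);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] blocksEnded = { 0 };
        write(new byte[10], new int[] { 3, 7 }, out, () -> blocksEnded[0]++);
        System.out.println(out.size() + " bytes, " + blocksEnded[0] + " endBlock calls");
    }
}
```

With the real class, the same method could be driven as `BlockBoundaryWriter.write(data, seekPoints, xzOut, xzOut::endBlock)`, where `xzOut` is an XZOutputStream.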

liblzma in XZ Utils supports starting a new Block with LZMA_FULL_FLUSH. XZ Utils 5.1.1alpha added threaded compression which creates multi-Block .xz files. XZ Utils 5.1.1alpha also added the option --block-size=SIZE to the xz command line tool. XZ Utils 5.1.2alpha added a partial implementation of --block-list=SIZES which allows specifying sizes of individual Blocks.

Example: getting the uncompressed size of a .xz file

 String filename = "foo.xz";
 SeekableFileInputStream seekableFile
         = new SeekableFileInputStream(filename);

 try {
     SeekableXZInputStream seekableXZ
             = new SeekableXZInputStream(seekableFile);
     System.out.println("Uncompressed size: " + seekableXZ.length());
 } finally {
     seekableFile.close();
 }

See Also: SeekableFileInputStream, XZInputStream, XZOutputStream

Constructor Summary

Constructors
Constructor	Description
`SeekableXZInputStream(SeekableInputStream in)`	Creates a new seekable XZ decompressor without a memory usage limit.
`SeekableXZInputStream(SeekableInputStream in, int memoryLimit)`	Creates a new seekable XZ decompressor with an optional memory usage limit.
`SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck)`	Creates a new seekable XZ decompressor with an optional memory usage limit and ability to disable verification of integrity checks.
`SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck, ArrayCache arrayCache)`	Creates a new seekable XZ decompressor with an optional memory usage limit and ability to disable verification of integrity checks.
`SeekableXZInputStream(SeekableInputStream in, int memoryLimit, ArrayCache arrayCache)`	Creates a new seekable XZ decompressor with an optional memory usage limit.
`SeekableXZInputStream(SeekableInputStream in, ArrayCache arrayCache)`	Creates a new seekable XZ decompressor without a memory usage limit.

Method Summary

Modifier and Type	Method	Description
`int`	`available()`	Returns the number of uncompressed bytes that can be read without blocking.
`void`	`close()`	Closes the stream and calls `in.close()`.
`void`	`close(boolean closeInput)`	Closes the stream and optionally calls `in.close()`.
`int`	`getBlockCheckType(int blockNumber)`	Gets the integrity check type (Check ID) of the given Block.
`long`	`getBlockCompPos(int blockNumber)`	Gets the position where the given compressed Block starts in the underlying .xz file.
`long`	`getBlockCompSize(int blockNumber)`	Gets the compressed size of the given Block.
`int`	`getBlockCount()`	Gets the number of Blocks in the .xz file.
`int`	`getBlockNumber(long pos)`	Gets the number of the Block that contains the byte at the given uncompressed position.
`long`	`getBlockPos(int blockNumber)`	Gets the uncompressed start position of the given Block.
`long`	`getBlockSize(int blockNumber)`	Gets the uncompressed size of the given Block.
`int`	`getCheckTypes()`	Gets the types of integrity checks used in the .xz file.
`int`	`getIndexMemoryUsage()`	Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks.
`long`	`getLargestBlockSize()`	Gets the uncompressed size of the largest XZ Block in bytes.
`int`	`getStreamCount()`	Gets the number of Streams in the .xz file.
`long`	`length()`	Gets the uncompressed size of this input stream.
`long`	`position()`	Gets the current uncompressed position in this input stream.
`int`	`read()`	Decompresses the next byte from this input stream.
`int`	`read(byte[] buf, int off, int len)`	Decompresses into an array of bytes.
`void`	`seek(long pos)`	Seeks to the specified absolute uncompressed position in the stream.
`void`	`seekToBlock(int blockNumber)`	Seeks to the beginning of the given XZ Block.

Methods inherited from class org.tukaani.xz.SeekableInputStream

skip

Methods inherited from class java.io.InputStream

mark, markSupported, nullInputStream, read, readAllBytes, readNBytes, readNBytes, reset, skipNBytes, transferTo

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in) throws IOException
  
  Creates a new seekable XZ decompressor without a memory usage limit.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in, ArrayCache arrayCache) throws IOException
  
  Creates a new seekable XZ decompressor without a memory usage limit.
  This is identical to SeekableXZInputStream(SeekableInputStream) except that this also takes the arrayCache argument.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  arrayCache - cache to be used for allocating large arrays
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
  
  Since:
  
  1.7
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in, int memoryLimit) throws IOException
  
  Creates a new seekable XZ decompressor with an optional memory usage limit.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, ArrayCache arrayCache) throws IOException
  
  Creates a new seekable XZ decompressor with an optional memory usage limit.
  This is identical to SeekableXZInputStream(SeekableInputStream,int) except that this also takes the arrayCache argument.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
  
  arrayCache - cache to be used for allocating large arrays
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
  
  Since:
  
  1.7
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck) throws IOException
  
  Creates a new seekable XZ decompressor with an optional memory usage limit and ability to disable verification of integrity checks.
  
  Note that integrity check verification should almost never be disabled. Possible reasons to disable integrity check verification:
  
  - Trying to recover data from a corrupt .xz file.
  - Speeding up decompression. This matters mostly with SHA-256 or with files that have compressed extremely well. It's recommended that integrity checking isn't disabled for performance reasons unless the file integrity is verified externally in some other way.
  
  verifyCheck only affects the integrity check of the actual compressed data. The CRC32 fields in the headers are always verified.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
  
  verifyCheck - if true, the integrity checks will be verified; this should almost never be set to false
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
  
  Since:
  
  1.6
- SeekableXZInputStream
  
  public SeekableXZInputStream(SeekableInputStream in, int memoryLimit, boolean verifyCheck, ArrayCache arrayCache) throws IOException
  
  Creates a new seekable XZ decompressor with an optional memory usage limit and ability to disable verification of integrity checks.
  This is identical to SeekableXZInputStream(SeekableInputStream,int,boolean) except that this also takes the arrayCache argument.
  
  Parameters:
  
  in - seekable input stream containing one or more XZ Streams; the whole input stream is used
  
  memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
  
  verifyCheck - if true, the integrity checks will be verified; this should almost never be set to false
  
  arrayCache - cache to be used for allocating large arrays
  
  Throws:
  
  XZFormatException - input is not in the XZ format
  
  CorruptedInputException - XZ data is corrupt or truncated
  
  UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
  
  MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
  
  EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
  
  IOException - may be thrown by in
  
  Since:
  
  1.7
Method Details
- getCheckTypes
  
  public int getCheckTypes()
  
  Gets the types of integrity checks used in the .xz file. Multiple checks are possible only if there are multiple concatenated XZ Streams.
  The returned value has a bit set for every check type that is present. For example, if CRC64 and SHA-256 were used, the return value is (1 << XZ.CHECK_CRC64) | (1 << XZ.CHECK_SHA256).
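  
  A minimal sketch of decoding such a bit mask follows. To stay runnable without the library, it declares its own constants. Their values are the Check IDs from the .xz file format and are assumed to match XZ.CHECK_NONE, XZ.CHECK_CRC32, XZ.CHECK_CRC64 and XZ.CHECK_SHA256. `CheckTypeMask` is a hypothetical helper name.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  
  // Hypothetical helper; real code would use the constants in org.tukaani.xz.XZ.
  final class CheckTypeMask {
      // Check IDs from the .xz format, assumed equal to the XZ.CHECK_* constants.
      static final int CHECK_NONE = 0;
      static final int CHECK_CRC32 = 1;
      static final int CHECK_CRC64 = 4;
      static final int CHECK_SHA256 = 10;
  
      /** Tells whether the mask has the bit for the given Check ID set. */
      static boolean uses(int mask, int checkId) {
          return (mask & (1 << checkId)) != 0;
      }
  
      /** Lists every Check ID present in the mask, in ascending order. */
      static List<Integer> checkIds(int mask) {
          List<Integer> ids = new ArrayList<>();
          // The Check ID is a 4-bit field, so valid IDs are 0 to 15.
          for (int id = 0; id < 16; id++)
              if (uses(mask, id))
                  ids.add(id);
          return ids;
      }
  
      public static void main(String[] args) {
          int mask = (1 << CHECK_CRC64) | (1 << CHECK_SHA256);
          System.out.println(checkIds(mask)); // [4, 10]
      }
  }
  ```
  
  A mask with more than one bit set indicates concatenated Streams that use different checks.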
- getIndexMemoryUsage
  
  public int getIndexMemoryUsage()
  
  Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks. This is usually useless information, but since it is calculated for the memory usage limit anyway, it is nice to make it available too.
- getLargestBlockSize
  
  public long getLargestBlockSize()
  
  Gets the uncompressed size of the largest XZ Block in bytes. This can be useful if you want to check that the file doesn't have huge XZ Blocks which could make seeking to arbitrary offsets very slow. Note that huge Blocks don't automatically mean that seeking would be slow; for example, seeking to the beginning of any Block is always fast.
- getStreamCount
  
  public int getStreamCount()
  
  Gets the number of Streams in the .xz file.
  
  Since:
  
  1.3
- getBlockCount
  
  public int getBlockCount()
  
  Gets the number of Blocks in the .xz file.
  
  Since:
  
  1.3
- getBlockPos
  
  public long getBlockPos(int blockNumber)
  
  Gets the uncompressed start position of the given Block.
  
  Throws:
  
  IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
  
  Since:
  
  1.3
- getBlockSize
  
  public long getBlockSize(int blockNumber)
  
  Gets the uncompressed size of the given Block.
  
  Throws:
  
  IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
  
  Since:
  
  1.3
- getBlockCompPos
  
  public long getBlockCompPos(int blockNumber)
  
  Gets the position where the given compressed Block starts in the underlying .xz file. This information is rarely useful to the users of this class.
  
  Throws:
  
  IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
  
  Since:
  
  1.3
- getBlockCompSize
  
  public long getBlockCompSize(int blockNumber)
  
  Gets the compressed size of the given Block. This together with the uncompressed size can be used to calculate the compression ratio of the specific Block.
  
  Throws:
  
  IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
  
  Since:
  
  1.3
- getBlockCheckType
  
  public int getBlockCheckType(int blockNumber)
  
  Gets the integrity check type (Check ID) of the given Block.
  
  Throws:
  
  IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
  
  Since:
  
  1.3
  
  See Also:
  
  getCheckTypes()
- getBlockNumber
  
  public int getBlockNumber(long pos)
  
  Gets the number of the Block that contains the byte at the given uncompressed position.
  
  Throws:
  
  IndexOutOfBoundsException - if pos < 0 or pos >= length().
  
  Since:
  
  1.3
- read
  
  public int read() throws IOException
  
  Decompresses the next byte from this input stream.
  
  Specified by:
  
  read in class InputStream
  
  Returns:
  
  the next decompressed byte, or -1 to indicate the end of the compressed stream
  
  Throws:
  
  CorruptedInputException
  
  UnsupportedOptionsException
  
  MemoryLimitException
  
  XZIOException - if the stream has been closed
  
  IOException - may be thrown by in
- read
  
  public int read(byte[] buf, int off, int len) throws IOException
  
  Decompresses into an array of bytes.
  
  If len is zero, no bytes are read and 0 is returned. Otherwise this will try to decompress len bytes of uncompressed data. Less than len bytes may be read only in the following situations:
  
  - The end of the compressed data was reached successfully.
  - An error is detected after at least one but less than len bytes have already been successfully decompressed. The next call with non-zero len will immediately throw the pending exception.
  - An exception is thrown.
  
  Overrides:
  
  read in class InputStream
  
  Parameters:
  
  buf - target buffer for uncompressed data
  
  off - start offset in buf
  
  len - maximum number of uncompressed bytes to read
  
  Returns:
  
  number of bytes read, or -1 to indicate the end of the compressed stream
  
  Throws:
  
  CorruptedInputException
  
  UnsupportedOptionsException
  
  MemoryLimitException
  
  XZIOException - if the stream has been closed
  
  IOException - may be thrown by in
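  
  Given the rules above, a short read means either the end of the data or a pending error, and the next call tells the two apart. The sketch below shows one way a caller might rely on that. `readFully` is a hypothetical helper written against plain java.io.InputStream, so it is not specific to this class.
  
  ```java
  import java.io.ByteArrayInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  
  // Hypothetical helper; works with any InputStream.
  final class ReadFully {
      /**
       * Keeps reading until len bytes are stored or the stream ends.
       * Returns the number of bytes stored (0 to len). If the stream has an
       * error pending after a short read, the follow-up read call throws it.
       */
      static int readFully(InputStream in, byte[] buf, int off, int len)
              throws IOException {
          int total = 0;
          while (total < len) {
              int n = in.read(buf, off + total, len - total);
              if (n == -1)
                  break; // end of stream reached
              total += n;
          }
          return total;
      }
  
      public static void main(String[] args) throws IOException {
          InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3, 4, 5 });
          byte[] buf = new byte[8];
          System.out.println(readFully(in, buf, 0, buf.length)); // 5
      }
  }
  ```
  
  A return value smaller than len then reliably means that the end of the data was reached.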
- available
  
  public int available() throws IOException
  
  Returns the number of uncompressed bytes that can be read without blocking. The value is returned with an assumption that the compressed input data will be valid. If the compressed data is corrupt, CorruptedInputException may get thrown before the number of bytes claimed to be available have been read from this input stream.
  
  Overrides:
  
  available in class InputStream
  
  Returns:
  
  the number of uncompressed bytes that can be read without blocking
  
  Throws:
  
  IOException
- close
  
  public void close() throws IOException
  
  Closes the stream and calls in.close(). If the stream was already closed, this does nothing.
  This is equivalent to close(true).
  
  Specified by:
  
  close in interface AutoCloseable
  
  Specified by:
  
  close in interface Closeable
  
  Overrides:
  
  close in class InputStream
  
  Throws:
  
  IOException - if thrown by in.close()
- close
  
  public void close(boolean closeInput) throws IOException
  
  Closes the stream and optionally calls in.close(). If the stream was already closed, this does nothing. If close(false) has been called, a further call of close(true) does nothing (it doesn't call in.close()).
  If you don't want to close the underlying InputStream, there is usually no need to worry about closing this stream either; it's fine to do nothing and let the garbage collector handle it. However, if you are using ArrayCache, close(false) can be useful to put the allocated arrays back to the cache without closing the underlying InputStream.
  Note that if you successfully reach the end of the stream (read returns -1), the arrays are automatically put back to the cache by that read call. In this situation close(false) is redundant (but harmless).
  
  Throws:
  
  IOException - if thrown by in.close()
  
  Since:
  
  1.7
- length
  
  public long length()
  
  Gets the uncompressed size of this input stream. If there are multiple XZ Streams, the total uncompressed size of all XZ Streams is returned.
  
  Specified by:
  
  length in class SeekableInputStream
- position
  
  public long position() throws IOException
  
  Gets the current uncompressed position in this input stream.
  
  Specified by:
  
  position in class SeekableInputStream
  
  Throws:
  
  XZIOException - if the stream has been closed
  
  IOException
- seek
  
  public void seek(long pos) throws IOException
  
  Seeks to the specified absolute uncompressed position in the stream. This only stores the new position, so this function itself is always very fast. The actual seek is done when read is called to read at least one byte.
  Seeking past the end of the stream is possible. In that case read will return -1 to indicate the end of the stream.
  
  Specified by:
  
  seek in class SeekableInputStream
  
  Parameters:
  
  pos - new uncompressed read position
  
  Throws:
  
  XZIOException - if pos is negative, or if the stream has been closed
  
  IOException - if pos is negative or if a stream-specific I/O error occurs
- seekToBlock
  
  public void seekToBlock(int blockNumber) throws IOException
  
  Seeks to the beginning of the given XZ Block.
  
  Throws:
  
  XZIOException - if blockNumber < 0 or blockNumber >= getBlockCount(), or if the stream has been closed
  
  IOException
  
  Since:
  
  1.3
