Class LZMAInputStream

java.lang.Object
java.io.InputStream
org.tukaani.xz.LZMAInputStream
All Implemented Interfaces:
Closeable, AutoCloseable

public class LZMAInputStream
extends InputStream
Decompresses legacy .lzma files and raw LZMA streams (no .lzma header).
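
To make the difference between the two input kinds concrete, here is a minimal sketch of the 13-byte header that a legacy .lzma file starts with and that a raw LZMA stream lacks: one properties byte, a 32-bit little-endian dictionary size, and a 64-bit little-endian uncompressed size where all bits set means the size is unknown. The class LzmaAloneHeader and its members are hypothetical names used only for illustration. This is not how this class parses its input internally.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of org.tukaani.xz.
final class LzmaAloneHeader {
    static final int SIZE = 13;

    final byte propsByte;   // encodes lc, lp, and pb
    final long dictSize;    // unsigned 32-bit value, in bytes
    final long uncompSize;  // -1 means unknown (end marker is used)

    private LzmaAloneHeader(byte propsByte, long dictSize, long uncompSize) {
        this.propsByte = propsByte;
        this.dictSize = dictSize;
        this.uncompSize = uncompSize;
    }

    // Parses the first 13 bytes of a legacy .lzma file.
    static LzmaAloneHeader parse(byte[] header) {
        if (header.length < SIZE)
            throw new IllegalArgumentException("need at least 13 bytes");

        ByteBuffer buf = ByteBuffer.wrap(header, 0, SIZE)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        byte props = buf.get();
        long dict = buf.getInt() & 0xFFFFFFFFL; // treat as unsigned
        long uncomp = buf.getLong();            // all ones reads as -1
        return new LzmaAloneHeader(props, dict, uncomp);
    }
}
```

The three header fields carry the same kind of information as the propsByte, dictSize, and uncompSize arguments that the raw-stream constructors below expect from the caller.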

IMPORTANT: In contrast to other classes in this package, this class reads data from its input stream one byte at a time. If the input stream is, for example, a FileInputStream, wrapping it in a BufferedInputStream tends to improve performance a lot. This class doesn't do so automatically because there may be use cases where this class must not read any bytes past the end of the LZMA stream.
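
The effect of buffering can be sketched with JDK classes alone. The hypothetical CountingInputStream below counts how many read calls reach the underlying stream when a consumer pulls one byte at a time, which is the access pattern described above. Neither helper is part of org.tukaani.xz, and the byte-by-byte consumer only stands in for the decoder.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of org.tukaani.xz.
final class ReadCallDemo {
    // Counts every read call that reaches the wrapped stream.
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            ++calls;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            ++calls;
            return in.read(b, off, len);
        }
    }

    // Reads the stream one byte at a time until the end is reached.
    static void drainByteByByte(InputStream in) throws IOException {
        while (in.read() != -1) {
            // Discard the byte; only the number of calls matters here.
        }
    }

    // Returns how many read calls the underlying stream received.
    static long countCalls(int size, boolean buffered) {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        try {
            drainByteByByte(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }
}
```

Without buffering, countCalls(size, false) sees one call per byte plus a final call that returns -1. With buffering, the underlying stream is asked for whole buffers, so only a handful of calls arrive. With a FileInputStream each of those calls typically goes to the operating system, which is where the performance difference comes from.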

Even when using BufferedInputStream, the performance tends to be worse (maybe 10-20 % slower) than with LZMA2InputStream or XZInputStream (when the .xz file contains LZMA2-compressed data).

Since:
1.4
  • Field Summary

    Fields
    Modifier and Type Field Description
    static int DICT_SIZE_MAX
    Largest dictionary size supported by this implementation.
  • Constructor Summary

    Constructors
    Constructor Description
    LZMAInputStream​(InputStream in)
    Creates a new .lzma file format decompressor without a memory usage limit.
    LZMAInputStream​(InputStream in, int memoryLimit)
    Creates a new .lzma file format decompressor with an optional memory usage limit.
    LZMAInputStream​(InputStream in, int memoryLimit, ArrayCache arrayCache)
    Creates a new .lzma file format decompressor with an optional memory usage limit.
    LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize)
    Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in.
    LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize, byte[] presetDict)
    Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
    LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize, byte[] presetDict, ArrayCache arrayCache)
    Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
    LZMAInputStream​(InputStream in, long uncompSize, int lc, int lp, int pb, int dictSize, byte[] presetDict)
    Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
    LZMAInputStream​(InputStream in, long uncompSize, int lc, int lp, int pb, int dictSize, byte[] presetDict, ArrayCache arrayCache)
    Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
    LZMAInputStream​(InputStream in, ArrayCache arrayCache)
    Creates a new .lzma file format decompressor without a memory usage limit.
  • Method Summary

    Modifier and Type Method Description
    void close()
    Closes the stream and calls in.close().
    void enableRelaxedEndCondition()
    Enables relaxed end-of-stream condition when uncompressed size is known.
    static int getMemoryUsage​(int dictSize, byte propsByte)
    Gets approximate decompressor memory requirements as kibibytes for the given dictionary size and LZMA properties byte (lc, lp, and pb).
    static int getMemoryUsage​(int dictSize, int lc, int lp)
    Gets approximate decompressor memory requirements as kibibytes for the given dictionary size, lc, and lp.
    int read()
    Decompresses the next byte from this input stream.
    int read​(byte[] buf, int off, int len)
    Decompresses into an array of bytes.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DICT_SIZE_MAX

      public static final int DICT_SIZE_MAX
      Largest dictionary size supported by this implementation.

      LZMA allows dictionaries up to one byte less than 4 GiB. This implementation supports only 16 bytes less than 2 GiB. This limitation is due to Java using signed 32-bit integers for array indexing. The limitation shouldn't matter much in practice since such huge dictionaries are not normally used.
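
      As a worked example of the arithmetic above, the following sketch computes "16 bytes less than 2 GiB" and uses it for a range check. The class DictSizeCheck and the constant ASSUMED_DICT_SIZE_MAX are hypothetical names; real code should refer to DICT_SIZE_MAX itself.

```java
// Hypothetical helper for illustration; not part of org.tukaani.xz.
final class DictSizeCheck {
    static final long TWO_GIB = 1L << 31;

    // 2 GiB minus 16 bytes, following the description of DICT_SIZE_MAX.
    static final int ASSUMED_DICT_SIZE_MAX = (int) (TWO_GIB - 16);

    // Returns true if dictSize lies in [0, ASSUMED_DICT_SIZE_MAX].
    static boolean isSupported(long dictSize) {
        return dictSize >= 0 && dictSize <= ASSUMED_DICT_SIZE_MAX;
    }
}
```

      The value works out to 2147483632, which still fits in a signed 32-bit int.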

      See Also:
      Constant Field Values
  • Constructor Details

    • LZMAInputStream

      public LZMAInputStream​(InputStream in) throws IOException
      Creates a new .lzma file format decompressor without a memory usage limit.
      Parameters:
      in - input stream from which .lzma data is read; it might be a good idea to wrap it in BufferedInputStream, see the note at the top of this page
      Throws:
      CorruptedInputException - file is corrupt or perhaps not in the .lzma format at all
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      EOFException - file is truncated or perhaps not in the .lzma format at all
      IOException - may be thrown by in
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, ArrayCache arrayCache) throws IOException
      Creates a new .lzma file format decompressor without a memory usage limit.

      This is identical to LZMAInputStream(InputStream) except that this also takes the arrayCache argument.

      Parameters:
      in - input stream from which .lzma data is read; it might be a good idea to wrap it in BufferedInputStream, see the note at the top of this page
      arrayCache - cache to be used for allocating large arrays
      Throws:
      CorruptedInputException - file is corrupt or perhaps not in the .lzma format at all
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      EOFException - file is truncated or perhaps not in the .lzma format at all
      IOException - may be thrown by in
      Since:
      1.7
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, int memoryLimit) throws IOException
      Creates a new .lzma file format decompressor with an optional memory usage limit.
      Parameters:
      in - input stream from which .lzma data is read; it might be a good idea to wrap it in BufferedInputStream, see the note at the top of this page
      memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
      Throws:
      CorruptedInputException - file is corrupt or perhaps not in the .lzma format at all
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      MemoryLimitException - memory usage limit was exceeded
      EOFException - file is truncated or perhaps not in the .lzma format at all
      IOException - may be thrown by in
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, int memoryLimit, ArrayCache arrayCache) throws IOException
      Creates a new .lzma file format decompressor with an optional memory usage limit.

      This is identical to LZMAInputStream(InputStream, int) except that this also takes the arrayCache argument.

      Parameters:
      in - input stream from which .lzma data is read; it might be a good idea to wrap it in BufferedInputStream, see the note at the top of this page
      memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
      arrayCache - cache to be used for allocating large arrays
      Throws:
      CorruptedInputException - file is corrupt or perhaps not in the .lzma format at all
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      MemoryLimitException - memory usage limit was exceeded
      EOFException - file is truncated or perhaps not in the .lzma format at all
      IOException - may be thrown by in
      Since:
      1.7
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize) throws IOException
      Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in.

      The caller needs to know if the "end of payload marker (EOPM)" alias "end of stream marker (EOS marker)" alias "end marker" is present. If the end marker isn't used, the caller must know the exact uncompressed size of the stream.

      The caller also needs to provide the LZMA properties byte that encodes the number of literal context bits (lc), literal position bits (lp), and position bits (pb).
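
      The properties byte packs the three values as (pb * 5 + lp) * 9 + lc in the LZMA format. The sketch below decodes and encodes it under that assumption. The class LzmaProps is a hypothetical name, and it throws IllegalArgumentException where this class would report CorruptedInputException.

```java
// Hypothetical helper for illustration; not part of org.tukaani.xz.
final class LzmaProps {
    final int lc; // literal context bits, 0..8
    final int lp; // literal position bits, 0..4
    final int pb; // position bits, 0..4

    private LzmaProps(int lc, int lp, int pb) {
        this.lc = lc;
        this.lp = lp;
        this.pb = pb;
    }

    // Splits a properties byte into lc, lp, and pb.
    static LzmaProps decode(byte propsByte) {
        int props = propsByte & 0xFF; // Java bytes are signed
        if (props > (4 * 5 + 4) * 9 + 8)
            throw new IllegalArgumentException(
                    "invalid LZMA properties byte: " + props);

        int pb = props / (9 * 5);
        props -= pb * 9 * 5;
        int lp = props / 9;
        int lc = props - lp * 9;
        return new LzmaProps(lc, lp, pb);
    }

    // Packs lc, lp, and pb into a properties byte.
    static byte encode(int lc, int lp, int pb) {
        if (lc < 0 || lc > 8 || lp < 0 || lp > 4 || pb < 0 || pb > 4)
            throw new IllegalArgumentException("lc, lp, or pb out of range");

        return (byte) ((pb * 5 + lp) * 9 + lc);
    }
}
```

      For example, the commonly seen properties byte 0x5D decodes to lc = 3, lp = 0, pb = 2.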

      The dictionary size used when compressing is also needed. Specifying too small a dictionary size will prevent decompressing the stream. Specifying too big a dictionary wastes memory, but decompression will still work.

      There is no need to specify a dictionary bigger than the uncompressed size of the data even if a bigger dictionary was used when compressing. If you know the uncompressed size of the data, this might allow saving some memory.

      Parameters:
      in - input stream from which compressed data is read
      uncompSize - uncompressed size of the LZMA stream or -1 if the end marker is used in the LZMA stream
      propsByte - LZMA properties byte that has the encoded values for literal context bits (lc), literal position bits (lp), and position bits (pb)
      dictSize - dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      Throws:
      CorruptedInputException - if propsByte is invalid or the first input byte is not 0x00
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      IOException - may be thrown by in
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize, byte[] presetDict) throws IOException
      Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
      Parameters:
      in - input stream from which LZMA-compressed data is read
      uncompSize - uncompressed size of the LZMA stream or -1 if the end marker is used in the LZMA stream
      propsByte - LZMA properties byte that has the encoded values for literal context bits (lc), literal position bits (lp), and position bits (pb)
      dictSize - dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      presetDict - preset dictionary or null to use no preset dictionary
      Throws:
      CorruptedInputException - if propsByte is invalid or the first input byte is not 0x00
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      EOFException - file is truncated or corrupt
      IOException - may be thrown by in
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, long uncompSize, byte propsByte, int dictSize, byte[] presetDict, ArrayCache arrayCache) throws IOException
      Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.

      This is identical to LZMAInputStream(InputStream, long, byte, int, byte[]) except that this also takes the arrayCache argument.

      Parameters:
      in - input stream from which LZMA-compressed data is read
      uncompSize - uncompressed size of the LZMA stream or -1 if the end marker is used in the LZMA stream
      propsByte - LZMA properties byte that has the encoded values for literal context bits (lc), literal position bits (lp), and position bits (pb)
      dictSize - dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      presetDict - preset dictionary or null to use no preset dictionary
      arrayCache - cache to be used for allocating large arrays
      Throws:
      CorruptedInputException - if propsByte is invalid or the first input byte is not 0x00
      UnsupportedOptionsException - dictionary size or uncompressed size is too big for this implementation
      EOFException - file is truncated or corrupt
      IOException - may be thrown by in
      Since:
      1.7
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, long uncompSize, int lc, int lp, int pb, int dictSize, byte[] presetDict) throws IOException
      Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.
      Parameters:
      in - input stream from which LZMA-compressed data is read
      uncompSize - uncompressed size of the LZMA stream or -1 if the end marker is used in the LZMA stream
      lc - number of literal context bits, must be in the range [0, 8]
      lp - number of literal position bits, must be in the range [0, 4]
      pb - number of position bits, must be in the range [0, 4]
      dictSize - dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      presetDict - preset dictionary or null to use no preset dictionary
      Throws:
      CorruptedInputException - if the first input byte is not 0x00
      EOFException - file is truncated or corrupt
      IOException - may be thrown by in
    • LZMAInputStream

      public LZMAInputStream​(InputStream in, long uncompSize, int lc, int lp, int pb, int dictSize, byte[] presetDict, ArrayCache arrayCache) throws IOException
      Creates a new input stream that decompresses raw LZMA data (no .lzma header) from in optionally with a preset dictionary.

      This is identical to LZMAInputStream(InputStream, long, int, int, int, int, byte[]) except that this also takes the arrayCache argument.

      Parameters:
      in - input stream from which LZMA-compressed data is read
      uncompSize - uncompressed size of the LZMA stream or -1 if the end marker is used in the LZMA stream
      lc - number of literal context bits, must be in the range [0, 8]
      lp - number of literal position bits, must be in the range [0, 4]
      pb - number of position bits, must be in the range [0, 4]
      dictSize - dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      presetDict - preset dictionary or null to use no preset dictionary
      arrayCache - cache to be used for allocating large arrays
      Throws:
      CorruptedInputException - if the first input byte is not 0x00
      EOFException - file is truncated or corrupt
      IOException - may be thrown by in
      Since:
      1.7
  • Method Details

    • getMemoryUsage

      public static int getMemoryUsage​(int dictSize, byte propsByte) throws UnsupportedOptionsException, CorruptedInputException
      Gets approximate decompressor memory requirements as kibibytes for the given dictionary size and LZMA properties byte (lc, lp, and pb).
      Parameters:
      dictSize - LZMA dictionary size as bytes, should be in the range [0, DICT_SIZE_MAX]
      propsByte - LZMA properties byte that encodes the values of lc, lp, and pb
      Returns:
      approximate memory requirements as kibibytes (KiB)
      Throws:
      UnsupportedOptionsException - if dictSize is outside the range [0, DICT_SIZE_MAX]
      CorruptedInputException - if propsByte is invalid
    • getMemoryUsage

      public static int getMemoryUsage​(int dictSize, int lc, int lp)
      Gets approximate decompressor memory requirements as kibibytes for the given dictionary size, lc, and lp. Note that pb isn't needed.
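
      A rough sketch of why pb doesn't matter: in LZMA the literal coder uses 0x300 probability variables per subcoder and 2^(lc + lp) subcoders, so only lc and lp change the size of that table, while pb has no comparable effect. The estimate below assumes two bytes per probability variable and a fixed base overhead of 10 KiB. Both are assumptions made for illustration, the class LzmaMemoryEstimate is a hypothetical name, and the value returned by this method may differ.

```java
// Hypothetical helper for illustration; not part of org.tukaani.xz.
final class LzmaMemoryEstimate {
    // Assumed fixed overhead in KiB for the rest of the decoder state.
    static final int BASE_KIB = 10;

    // Rough estimate in KiB: base + dictionary + literal probabilities.
    static int estimateKiB(int dictSize, int lc, int lp) {
        if (dictSize < 0 || lc < 0 || lc > 8 || lp < 0 || lp > 4)
            throw new IllegalArgumentException("argument out of range");

        // 0x300 probabilities per literal subcoder, 2^(lc + lp) subcoders,
        // assumed to take two bytes each.
        int literalBytes = (2 * 0x300) << (lc + lp);

        return BASE_KIB + dictSize / 1024 + literalBytes / 1024;
    }
}
```

      With an 8 MiB dictionary, lc = 3, and lp = 0, this sketch gives 10 + 8192 + 12 = 8214 KiB, so the dictionary dominates unless lc + lp is large.
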
      Parameters:
      dictSize - LZMA dictionary size as bytes, must be in the range [0, DICT_SIZE_MAX]
      lc - number of literal context bits, must be in the range [0, 8]
      lp - number of literal position bits, must be in the range [0, 4]
      Returns:
      approximate memory requirements as kibibytes (KiB)
    • enableRelaxedEndCondition

      public void enableRelaxedEndCondition()
      Enables relaxed end-of-stream condition when uncompressed size is known. This is useful if uncompressed size is known but it is unknown if the end of stream (EOS) marker is present. After calling this function, both are allowed.

      Note that this doesn't actually check if the EOS marker is present. This introduces a few minor downsides:

      • Some (not all!) streams that would have more data than the specified uncompressed size, for example due to data corruption, will be accepted as valid.
      • After read has returned -1 the input position might not be at the end of the stream (too little input may have been read).

      This should be called after the constructor but before reading any data from the stream. This is a separate function because adding even more constructors to this class didn't look like a good alternative.

      Since:
      1.9
    • read

      public int read() throws IOException
      Decompresses the next byte from this input stream.

      Reading lots of data with read() from this input stream may be inefficient. Wrap it in java.io.BufferedInputStream if you need to read lots of data one byte at a time.

      Specified by:
      read in class InputStream
      Returns:
      the next decompressed byte, or -1 to indicate the end of the compressed stream
      Throws:
      CorruptedInputException - if the compressed data is corrupt
      XZIOException - if the stream has been closed
      EOFException - compressed input is truncated or corrupt
      IOException - may be thrown by in
    • read

      public int read​(byte[] buf, int off, int len) throws IOException
      Decompresses into an array of bytes.

      If len is zero, no bytes are read and 0 is returned. Otherwise this will block until len bytes have been decompressed, the end of the LZMA stream is reached, or an exception is thrown.
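
      The contract above leads to the usual read loop, sketched here against plain java.io.InputStream so that it runs without this library. The class DrainExample and the method copy are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of org.tukaani.xz.
final class DrainExample {
    // Copies everything from in to out and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;

        // read returns -1 once the end of the stream has been reached.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }

        return total;
    }
}
```

      To decompress a file, one would pass something like new LZMAInputStream(new BufferedInputStream(new FileInputStream(name))) as in, ideally inside a try-with-resources statement so that close() is called on every path.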

      Overrides:
      read in class InputStream
      Parameters:
      buf - target buffer for uncompressed data
      off - start offset in buf
      len - maximum number of uncompressed bytes to read
      Returns:
      number of bytes read, or -1 to indicate the end of the compressed stream
      Throws:
      CorruptedInputException - if the compressed data is corrupt
      XZIOException - if the stream has been closed
      EOFException - compressed input is truncated or corrupt
      IOException - may be thrown by in
    • close

      public void close() throws IOException
      Closes the stream and calls in.close(). If the stream was already closed, this does nothing.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException - if thrown by in.close()