Class LZMA2Options

java.lang.Object
org.tukaani.xz.FilterOptions
org.tukaani.xz.LZMA2Options
All Implemented Interfaces:
Cloneable

public class LZMA2Options
extends FilterOptions
LZMA2 compression options.

While this allows setting the LZMA2 compression options in detail, often you only need LZMA2Options() or LZMA2Options(int).

  • Field Summary

    Fields
    Modifier and Type Field Description
    static int DICT_SIZE_DEFAULT
    The default dictionary size is 8 MiB.
    static int DICT_SIZE_MAX
    Maximum dictionary size for compression is 768 MiB.
    static int DICT_SIZE_MIN
    Minimum dictionary size is 4 KiB.
    static int LC_DEFAULT
    The default number of literal context bits is 3.
    static int LC_LP_MAX
    Maximum value for lc + lp is 4.
    static int LP_DEFAULT
    The default number of literal position bits is 0.
    static int MF_BT4
    Match finder: Binary tree 2-3-4
    static int MF_HC4
    Match finder: Hash Chain 2-3-4
    static int MODE_FAST
    Compression mode: fast.
    static int MODE_NORMAL
    Compression mode: normal.
    static int MODE_UNCOMPRESSED
    Compression mode: uncompressed.
    static int NICE_LEN_MAX
    Maximum value for niceLen is 273.
    static int NICE_LEN_MIN
    Minimum value for niceLen is 8.
    static int PB_DEFAULT
    The default number of position bits is 2.
    static int PB_MAX
    Maximum value for pb is 4.
    static int PRESET_DEFAULT
    Default compression preset level is 6.
    static int PRESET_MAX
    Maximum valid compression preset level is 9.
    static int PRESET_MIN
    Minimum valid compression preset level is 0.
  • Constructor Summary

    Constructors
    Constructor Description
    LZMA2Options()
    Creates new LZMA2 options and sets them to the default values.
    LZMA2Options​(int preset)
    Creates new LZMA2 options and sets them to the given preset.
    LZMA2Options​(int dictSize, int lc, int lp, int pb, int mode, int niceLen, int mf, int depthLimit)
    Creates new LZMA2 options and sets them to the given custom values.
  • Method Summary

    Modifier and Type Method Description
    Object clone()  
    int getDecoderMemoryUsage()
    Gets how much memory the LZMA2 decoder will need to decompress the data that was encoded with these options and stored in a .xz file.
    int getDepthLimit()
    Gets the match finder search depth limit.
    int getDictSize()
    Gets the dictionary size in bytes.
    int getEncoderMemoryUsage()
    Gets how much memory the encoder will need with these options.
    InputStream getInputStream​(InputStream in, ArrayCache arrayCache)
    Gets a raw (no XZ headers) decoder input stream using these options and the given ArrayCache.
    int getLc()
    Gets the number of literal context bits.
    int getLp()
    Gets the number of literal position bits.
    int getMatchFinder()
    Gets the match finder type.
    int getMode()
    Gets the compression mode.
    int getNiceLen()
    Gets the nice length of matches.
    FinishableOutputStream getOutputStream​(FinishableOutputStream out, ArrayCache arrayCache)
    Gets a raw (no XZ headers) encoder output stream using these options and the given ArrayCache.
    int getPb()
    Gets the number of position bits.
    byte[] getPresetDict()
    Gets the preset dictionary.
    void setDepthLimit​(int depthLimit)
    Sets the match finder search depth limit.
    void setDictSize​(int dictSize)
    Sets the dictionary size in bytes.
    void setLc​(int lc)
    Sets the number of literal context bits.
    void setLcLp​(int lc, int lp)
    Sets the number of literal context bits and literal position bits.
    void setLp​(int lp)
    Sets the number of literal position bits.
    void setMatchFinder​(int mf)
    Sets the match finder type.
    void setMode​(int mode)
    Sets the compression mode.
    void setNiceLen​(int niceLen)
    Sets the nice length of matches.
    void setPb​(int pb)
    Sets the number of position bits.
    void setPreset​(int preset)
    Sets the compression options to the given preset.
    void setPresetDict​(byte[] presetDict)
    Sets a preset dictionary.

    Methods inherited from class java.lang.Object

    equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • PRESET_MIN

      public static final int PRESET_MIN
      Minimum valid compression preset level is 0.
      See Also:
      Constant Field Values
    • PRESET_MAX

      public static final int PRESET_MAX
      Maximum valid compression preset level is 9.
      See Also:
      Constant Field Values
    • PRESET_DEFAULT

      public static final int PRESET_DEFAULT
      Default compression preset level is 6.
      See Also:
      Constant Field Values
    • DICT_SIZE_MIN

      public static final int DICT_SIZE_MIN
      Minimum dictionary size is 4 KiB.
      See Also:
      Constant Field Values
    • DICT_SIZE_MAX

      public static final int DICT_SIZE_MAX
      Maximum dictionary size for compression is 768 MiB.

      The decompressor supports bigger dictionaries, up to almost 2 GiB. With HC4 the encoder would support dictionaries bigger than 768 MiB. The 768 MiB limit comes from the current implementation of BT4 where we would otherwise hit the limits of signed ints in array indexing.

      If you really need a bigger dictionary for decompression, use LZMA2InputStream directly.

      See Also:
      Constant Field Values
    • DICT_SIZE_DEFAULT

      public static final int DICT_SIZE_DEFAULT
      The default dictionary size is 8 MiB.
      See Also:
      Constant Field Values
    • LC_LP_MAX

      public static final int LC_LP_MAX
      Maximum value for lc + lp is 4.
      See Also:
      Constant Field Values
    • LC_DEFAULT

      public static final int LC_DEFAULT
      The default number of literal context bits is 3.
      See Also:
      Constant Field Values
    • LP_DEFAULT

      public static final int LP_DEFAULT
      The default number of literal position bits is 0.
      See Also:
      Constant Field Values
    • PB_MAX

      public static final int PB_MAX
      Maximum value for pb is 4.
      See Also:
      Constant Field Values
    • PB_DEFAULT

      public static final int PB_DEFAULT
      The default number of position bits is 2.
      See Also:
      Constant Field Values
    • MODE_UNCOMPRESSED

      public static final int MODE_UNCOMPRESSED
      Compression mode: uncompressed. The data is wrapped into an LZMA2 stream without compression.
      See Also:
      Constant Field Values
    • MODE_FAST

      public static final int MODE_FAST
      Compression mode: fast. This is usually combined with a hash chain match finder.
      See Also:
      Constant Field Values
    • MODE_NORMAL

      public static final int MODE_NORMAL
      Compression mode: normal. This is usually combined with a binary tree match finder.
      See Also:
      Constant Field Values
    • NICE_LEN_MIN

      public static final int NICE_LEN_MIN
      Minimum value for niceLen is 8.
      See Also:
      Constant Field Values
    • NICE_LEN_MAX

      public static final int NICE_LEN_MAX
      Maximum value for niceLen is 273.
      See Also:
      Constant Field Values
    • MF_HC4

      public static final int MF_HC4
      Match finder: Hash Chain 2-3-4
      See Also:
      Constant Field Values
    • MF_BT4

      public static final int MF_BT4
      Match finder: Binary tree 2-3-4
      See Also:
      Constant Field Values
  • Constructor Details

    • LZMA2Options

      public LZMA2Options()
      Creates new LZMA2 options and sets them to the default values. This is equivalent to LZMA2Options(PRESET_DEFAULT).
    • LZMA2Options

      public LZMA2Options​(int preset) throws UnsupportedOptionsException
      Creates new LZMA2 options and sets them to the given preset.
      Throws:
      UnsupportedOptionsException - preset is not supported
    • LZMA2Options

      public LZMA2Options​(int dictSize, int lc, int lp, int pb, int mode, int niceLen, int mf, int depthLimit) throws UnsupportedOptionsException
      Creates new LZMA2 options and sets them to the given custom values.
      Throws:
      UnsupportedOptionsException - unsupported options were specified
  • Method Details

    • setPreset

      public void setPreset​(int preset) throws UnsupportedOptionsException
      Sets the compression options to the given preset.

      The presets 0-3 are fast presets with medium compression. The presets 4-6 are fairly slow presets with high compression. The default preset (PRESET_DEFAULT) is 6.

      The presets 7-9 are like the preset 6 but use bigger dictionaries and have higher compressor and decompressor memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is a waste of memory to use the presets 7, 8, or 9, respectively.
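      The size thresholds above can be turned into a small selection rule. The following is a minimal sketch, not part of the library: the class PresetBySize and the method highestUsefulPreset are hypothetical names, and the thresholds are copied from the paragraph above.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class PresetBySize {
    private static final long MIB = 1024L * 1024L;

    // Presets 7, 8, and 9 only pay off when the uncompressed size
    // exceeds 8 MiB, 16 MiB, and 32 MiB, respectively (see text above).
    static int highestUsefulPreset(long uncompressedSize) {
        if (uncompressedSize > 32 * MIB) return 9;
        if (uncompressedSize > 16 * MIB) return 8;
        if (uncompressedSize > 8 * MIB) return 7;
        return 6; // the documented PRESET_DEFAULT
    }

    public static void main(String[] args) {
        System.out.println(highestUsefulPreset(5 * MIB));
        System.out.println(highestUsefulPreset(20 * MIB));
    }
}
```

      The returned value would then be passed to setPreset(int) or LZMA2Options(int). A lower preset may still be preferable when speed matters more than compression ratio.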

      Throws:
      UnsupportedOptionsException - preset is not supported
    • setDictSize

      public void setDictSize​(int dictSize) throws UnsupportedOptionsException
      Sets the dictionary size in bytes.

      The dictionary (or history buffer) holds the most recently seen uncompressed data. A bigger dictionary usually means better compression. However, using a dictionary bigger than the size of the uncompressed data is a waste of memory.

      Any value in the range [DICT_SIZE_MIN, DICT_SIZE_MAX] is valid, but sizes of 2^n and 2^n + 2^(n-1) bytes are somewhat recommended.
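      To make the recommended sizes concrete: a value of the form 2^n is a power of two, and a value of the form 2^n + 2^(n-1) is three times a power of two. The sketch below checks both cases. The class DictSizeCheck is hypothetical, and the two limits are local copies of the documented 4 KiB and 768 MiB values.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class DictSizeCheck {
    static final int DICT_SIZE_MIN = 4096;       // 4 KiB, as documented
    static final int DICT_SIZE_MAX = 768 << 20;  // 768 MiB, as documented

    // True if dictSize is in the valid range and has the form
    // 2^n or 2^n + 2^(n-1) (that is, 3 * 2^(n-1)).
    static boolean isRecommended(int dictSize) {
        if (dictSize < DICT_SIZE_MIN || dictSize > DICT_SIZE_MAX) {
            return false;
        }
        boolean powerOfTwo = Integer.bitCount(dictSize) == 1;
        boolean threeTimesPowerOfTwo =
                dictSize % 3 == 0 && Integer.bitCount(dictSize / 3) == 1;
        return powerOfTwo || threeTimesPowerOfTwo;
    }

    public static void main(String[] args) {
        System.out.println(isRecommended(12 << 20)); // 12 MiB = 8 MiB + 4 MiB
        System.out.println(isRecommended(10 << 20)); // 10 MiB
    }
}
```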

      Throws:
      UnsupportedOptionsException - dictSize is not supported
    • getDictSize

      public int getDictSize()
      Gets the dictionary size in bytes.
    • setPresetDict

      public void setPresetDict​(byte[] presetDict)
      Sets a preset dictionary. Use null to disable the use of a preset dictionary. By default there is no preset dictionary.

      The .xz format doesn't support a preset dictionary for now. Do not set a preset dictionary unless you use raw LZMA2.

      A preset dictionary can be useful when compressing many similar, relatively small chunks of data independently from each other. A preset dictionary should contain typical strings that occur in the files being compressed. The most probable strings should be near the end of the preset dictionary. The preset dictionary used for compression is also needed for decompression.
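      One way to follow the ordering advice is to concatenate the typical strings from least probable to most probable, so that the most probable ones end up at the end. This is only a sketch: PresetDictBuilder is a hypothetical helper, and the choice of UTF-8 is an assumption about the data being compressed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.tukaani.xz.
final class PresetDictBuilder {
    // Concatenates the strings in the given order. The caller lists them
    // from least probable to most probable, so the most probable strings
    // land near the end of the returned array.
    static byte[] build(List<String> leastToMostProbable) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (String s : leastToMostProbable) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            buf.write(bytes, 0, bytes.length);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] dict = build(Arrays.asList("\"comment\":", "\"name\":", "\"id\":"));
        System.out.println(dict.length);
    }
}
```

      The resulting array would be passed to setPresetDict(byte[]) when using raw LZMA2, and the same bytes must be available to the decoder.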

    • getPresetDict

      public byte[] getPresetDict()
      Gets the preset dictionary.
    • setLcLp

      public void setLcLp​(int lc, int lp) throws UnsupportedOptionsException
      Sets the number of literal context bits and literal position bits.

      The sum of lc and lp is limited to 4 (LC_LP_MAX). Trying to exceed it will throw an exception. This method lets you change both at the same time.
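      The constraint can be written as a simple predicate, which also shows why setting both values together is convenient. Starting from the defaults (lc = 3, lp = 0), a move to lc = 0, lp = 4 passes through an invalid pair if lp is changed first. The class LcLpCheck below is a hypothetical illustration, not the library's own validation code.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class LcLpCheck {
    static final int LC_LP_MAX = 4; // documented maximum for lc + lp

    // True if both values are non-negative and their sum is within the limit.
    // The per-value upper bounds also guard against int overflow in the sum.
    static boolean isValid(int lc, int lp) {
        return lc >= 0 && lp >= 0
                && lc <= LC_LP_MAX && lp <= LC_LP_MAX
                && lc + lp <= LC_LP_MAX;
    }

    public static void main(String[] args) {
        System.out.println(isValid(3, 0)); // the documented defaults
        System.out.println(isValid(3, 4)); // intermediate state if lp is set first
        System.out.println(isValid(0, 4)); // the intended final pair
    }
}
```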

      Throws:
      UnsupportedOptionsException - lc and lp are invalid
    • setLc

      public void setLc​(int lc) throws UnsupportedOptionsException
      Sets the number of literal context bits.

      All bytes that cannot be encoded as matches are encoded as literals. That is, literals are simply 8-bit bytes that are encoded one at a time.

      The literal coding makes an assumption that the highest lc bits of the previous uncompressed byte correlate with the next byte. For example, in typical English text, an upper-case letter is often followed by a lower-case letter, and a lower-case letter is usually followed by another lower-case letter. In the US-ASCII character set, the highest three bits are 010 for upper-case letters and 011 for lower-case letters. When lc is at least 3, the literal coding can take advantage of this property in the uncompressed data.
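      The US-ASCII example can be checked directly by extracting the highest lc bits of a byte. The sketch below shows only this bit selection, not the encoder's probability model, and the class LiteralContext is a hypothetical name.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class LiteralContext {
    // Returns the highest lc bits (0 <= lc <= 8) of a byte value.
    static int highBits(int previousByte, int lc) {
        return (previousByte & 0xFF) >>> (8 - lc);
    }

    public static void main(String[] args) {
        // With lc = 3, upper-case letters give 010 and lower-case letters 011.
        System.out.println(Integer.toBinaryString(highBits('A', 3)));
        System.out.println(Integer.toBinaryString(highBits('a', 3)));
        // With lc = 2, both cases collapse to the same context.
        System.out.println(highBits('A', 2) == highBits('a', 2));
    }
}
```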

      The default value (3) is usually good. If you want maximum compression, try setLc(4). Sometimes it helps a little, and sometimes it makes compression worse. If it makes it worse, also test setLc(2), for example.

      Throws:
      UnsupportedOptionsException - lc is invalid, or the sum of lc and lp exceeds LC_LP_MAX
    • setLp

      public void setLp​(int lp) throws UnsupportedOptionsException
      Sets the number of literal position bits.

      This affects what kind of alignment in the uncompressed data is assumed when encoding literals. See setPb for more information about alignment.

      Throws:
      UnsupportedOptionsException - lp is invalid, or the sum of lc and lp exceeds LC_LP_MAX
    • getLc

      public int getLc()
      Gets the number of literal context bits.
    • getLp

      public int getLp()
      Gets the number of literal position bits.
    • setPb

      public void setPb​(int pb) throws UnsupportedOptionsException
      Sets the number of position bits.

      This affects what kind of alignment in the uncompressed data is assumed in general. The default (2) means four-byte alignment (2^pb = 2^2 = 4), which is often a good choice when there's no better guess.

      When the alignment is known, setting the number of position bits accordingly may reduce the file size a little. For example with text files having one-byte alignment (US-ASCII, ISO-8859-*, UTF-8), using setPb(0) can improve compression slightly. For UTF-16 text, setPb(1) is a good choice. If the alignment is an odd number like 3 bytes, setPb(0) might be the best choice.
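      The relation between pb and the assumed alignment (2^pb bytes) can be sketched as a lookup in both directions. PositionBits and suggestPb are hypothetical names. Falling back to 0 for every alignment that is not a power of two, and capping at PB_MAX for alignments above 16 bytes, are assumptions of this sketch that extend the odd-alignment advice above.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class PositionBits {
    static final int PB_MAX = 4; // documented maximum for pb

    // Alignment in bytes assumed for a given pb value: 2^pb.
    static int alignmentFor(int pb) {
        return 1 << pb;
    }

    // Suggests a pb value for a known alignment in bytes.
    static int suggestPb(int alignment) {
        if (alignment < 1) {
            throw new IllegalArgumentException("alignment must be positive");
        }
        boolean powerOfTwo = (alignment & (alignment - 1)) == 0;
        if (!powerOfTwo) {
            return 0; // assumption: treat like the odd-alignment case
        }
        return Math.min(Integer.numberOfTrailingZeros(alignment), PB_MAX);
    }

    public static void main(String[] args) {
        System.out.println(suggestPb(1)); // one-byte text encodings
        System.out.println(suggestPb(2)); // UTF-16
        System.out.println(suggestPb(3)); // odd alignment
    }
}
```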

      Even though the assumed alignment can be adjusted with setPb and setLp, LZMA2 still slightly favors 16-byte alignment. It might be worth taking into account when designing file formats that are likely to be often compressed with LZMA2.

      Throws:
      UnsupportedOptionsException - pb is invalid
    • getPb

      public int getPb()
      Gets the number of position bits.
    • setMode

      public void setMode​(int mode) throws UnsupportedOptionsException
      Sets the compression mode.

      This specifies the method to analyze the data produced by a match finder. The default is MODE_FAST for presets 0-3 and MODE_NORMAL for presets 4-9.

      Usually MODE_FAST is used with Hash Chain match finders and MODE_NORMAL with Binary Tree match finders. This is also what the presets do.

      The special mode MODE_UNCOMPRESSED doesn't try to compress the data at all (and doesn't use a match finder) and will simply wrap it in uncompressed LZMA2 chunks.

      Throws:
      UnsupportedOptionsException - mode is not supported
    • getMode

      public int getMode()
      Gets the compression mode.
    • setNiceLen

      public void setNiceLen​(int niceLen) throws UnsupportedOptionsException
      Sets the nice length of matches. Once a match of at least niceLen bytes is found, the algorithm stops looking for better matches. Higher values tend to give better compression at the expense of speed. The default depends on the preset.
      Throws:
      UnsupportedOptionsException - niceLen is invalid
    • getNiceLen

      public int getNiceLen()
      Gets the nice length of matches.
    • setMatchFinder

      public void setMatchFinder​(int mf) throws UnsupportedOptionsException
      Sets the match finder type.

      The match finder has a major effect on compression speed, memory usage, and compression ratio. Usually Hash Chain match finders are faster than Binary Tree match finders. The default depends on the preset: 0-3 use MF_HC4 and 4-9 use MF_BT4.
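      The preset defaults described for setMode and setMatchFinder can be summarized in one small table-like helper. This is a sketch of the documented pairing only: the enums below are hypothetical stand-ins for the MODE_* and MF_* int constants, and no numeric constant values are assumed.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class PresetDefaults {
    enum Mode { FAST, NORMAL }        // stand-ins for MODE_FAST, MODE_NORMAL
    enum MatchFinder { HC4, BT4 }     // stand-ins for MF_HC4, MF_BT4

    private static void check(int preset) {
        if (preset < 0 || preset > 9) {
            throw new IllegalArgumentException("preset must be 0-9");
        }
    }

    // Presets 0-3 default to the fast mode, presets 4-9 to the normal mode.
    static Mode modeFor(int preset) {
        check(preset);
        return preset <= 3 ? Mode.FAST : Mode.NORMAL;
    }

    // Presets 0-3 default to Hash Chain, presets 4-9 to Binary Tree.
    static MatchFinder matchFinderFor(int preset) {
        check(preset);
        return preset <= 3 ? MatchFinder.HC4 : MatchFinder.BT4;
    }

    public static void main(String[] args) {
        for (int p = 0; p <= 9; p++) {
            System.out.println(p + ": " + modeFor(p) + " + " + matchFinderFor(p));
        }
    }
}
```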

      Throws:
      UnsupportedOptionsException - mf is not supported
    • getMatchFinder

      public int getMatchFinder()
      Gets the match finder type.
    • setDepthLimit

      public void setDepthLimit​(int depthLimit) throws UnsupportedOptionsException
      Sets the match finder search depth limit.

      The default is a special value of 0 which indicates that the depth limit should be automatically calculated by the selected match finder from the nice length of matches.

      A reasonable depth limit is 4-100 for Hash Chain match finders and 16-1000 for Binary Tree match finders. Using very high values can make the compressor extremely slow with some files. Avoid settings higher than 1000 unless you are prepared to interrupt the compression in case it is taking far too long.

      Throws:
      UnsupportedOptionsException - depthLimit is invalid
    • getDepthLimit

      public int getDepthLimit()
      Gets the match finder search depth limit.
    • getEncoderMemoryUsage

      public int getEncoderMemoryUsage()
      Description copied from class: FilterOptions
      Gets how much memory the encoder will need with these options.
      Specified by:
      getEncoderMemoryUsage in class FilterOptions
    • getOutputStream

      public FinishableOutputStream getOutputStream​(FinishableOutputStream out, ArrayCache arrayCache)
      Description copied from class: FilterOptions
      Gets a raw (no XZ headers) encoder output stream using these options and the given ArrayCache. Raw streams are an advanced feature. In most cases you want to store the compressed data in the .xz container format instead of using a raw stream. To use this filter in a .xz file, pass this object to XZOutputStream.
      Specified by:
      getOutputStream in class FilterOptions
    • getDecoderMemoryUsage

      public int getDecoderMemoryUsage()
      Gets how much memory the LZMA2 decoder will need to decompress the data that was encoded with these options and stored in a .xz file.

      The returned value may be bigger than the value returned by a direct call to LZMA2InputStream.getMemoryUsage(int) if the dictionary size is not 2^n or 2^n + 2^(n-1) bytes. This is because the .xz headers store the dictionary size in such a format and other values are rounded up to the next such value. Such rounding is harmless except it might waste some memory if an unusual dictionary size is used.
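      The rounding described above can be illustrated with a short loop over the candidate sizes in increasing order. This is a sketch of the arithmetic only, not the library's implementation. DictSizeRounding is a hypothetical name, and the 4 KiB lower bound is a local copy of the documented DICT_SIZE_MIN.

```java
// Hypothetical helper, not part of org.tukaani.xz.
final class DictSizeRounding {
    static final int DICT_SIZE_MIN = 4096; // 4 KiB, as documented

    // Rounds dictSize up to the next value of the form 2^n or 2^n + 2^(n-1).
    // Returns a long so that results above Integer.MAX_VALUE stay exact.
    static long roundUp(int dictSize) {
        if (dictSize < DICT_SIZE_MIN) {
            throw new IllegalArgumentException("dictSize below 4 KiB");
        }
        for (int n = 12; n <= 31; n++) {
            long pow = 1L << n;             // 2^n
            if (dictSize <= pow) {
                return pow;
            }
            long mid = pow + (pow >> 1);    // 2^n + 2^(n-1)
            if (dictSize <= mid) {
                return mid;
            }
        }
        throw new AssertionError("unreachable for int input");
    }

    public static void main(String[] args) {
        System.out.println(roundUp(10 << 20) >> 20); // 10 MiB rounds up, in MiB
    }
}
```

      With this rule, a 10 MiB dictionary is treated as 12 MiB, which is where the extra reported memory comes from.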

      If you use raw LZMA2 streams and an unusual dictionary size, call LZMA2InputStream.getMemoryUsage(int) directly to get raw decoder memory requirements.

      Specified by:
      getDecoderMemoryUsage in class FilterOptions
    • getInputStream

      public InputStream getInputStream​(InputStream in, ArrayCache arrayCache) throws IOException
      Description copied from class: FilterOptions
      Gets a raw (no XZ headers) decoder input stream using these options and the given ArrayCache.
      Specified by:
      getInputStream in class FilterOptions
      Throws:
      IOException
    • clone

      public Object clone()
      Overrides:
      clone in class Object