NAME

AI::Ollama::RequestOptions - additional model parameters for an Ollama request

SYNOPSIS

my $obj = AI::Ollama::RequestOptions->new();
...
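
A more complete construction might look like this (a sketch; it assumes the constructor accepts the properties documented below as named arguments, and the values shown are illustrative):

my $options = AI::Ollama::RequestOptions->new(
    temperature => 0.8,   # sampling temperature
    num_predict => 256,   # cap the number of generated tokens
    seed        => 42,    # fixed seed for repeatable output
);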

PROPERTIES

embedding_only

Enable embedding only. (Default: false)

f16_kv

Enable f16 key/value. (Default: false)

frequency_penalty

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logits_all

Enable logits all. (Default: false)

low_vram

Enable low VRAM mode. (Default: false)

main_gpu

The GPU to use for the main model. (Default: 0)

mirostat

Enable Mirostat sampling for controlling perplexity. (Default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

mirostat_eta

Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)

mirostat_tau

Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)
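
For example, to enable Mirostat 2.0 sampling while keeping the documented defaults for the learning rate and target (a sketch; values mirror the defaults listed above):

my $options = AI::Ollama::RequestOptions->new(
    mirostat     => 2,    # 2 = Mirostat 2.0
    mirostat_eta => 0.1,  # learning rate
    mirostat_tau => 5.0,  # coherence/diversity balance
);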

num_batch

Sets the number of batches to use for generation. (Default: 1)

num_ctx

Sets the size of the context window used to generate the next token.

num_gpu

The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.
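
For example, to make the second GPU the main device and offload layers to it (a sketch; the layer count is illustrative and depends on the model and available VRAM):

my $options = AI::Ollama::RequestOptions->new(
    main_gpu => 1,   # second GPU (zero-based index)
    num_gpu  => 32,  # number of layers to offload
);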

num_gqa

The number of GQA groups in the transformer layer. Required for some models; for example, it is 8 for `llama2:70b`.

num_keep

Number of tokens to keep from the prompt.

num_predict

Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)
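
For example, to run with a larger context window but cap each response (a sketch; the values are illustrative):

my $options = AI::Ollama::RequestOptions->new(
    num_ctx     => 4096,  # context window size in tokens
    num_predict => 256,   # stop after 256 generated tokens
);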

num_thread

Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores).

numa

Enable NUMA support. (Default: false)

penalize_newline

Penalize newlines in the output. (Default: false)

presence_penalty

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

repeat_last_n

Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)

repeat_penalty

Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
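
For example, to penalize repetition a little more aggressively than the defaults (a sketch; values are illustrative):

my $options = AI::Ollama::RequestOptions->new(
    repeat_last_n  => 128,  # look further back than the default 64 tokens
    repeat_penalty => 1.2,  # stronger than the default 1.1
);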

rope_frequency_base

The base of the rope frequency scale. (Default: 1.0)

rope_frequency_scale

The scale of the rope frequency. (Default: 1.0)

seed

Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)
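
For example, to make generation repeatable across runs (a sketch; the seed value is arbitrary):

my $options = AI::Ollama::RequestOptions->new(
    seed        => 12345,  # same seed + same prompt => same output
    temperature => 0.0,    # optional: remove sampling randomness entirely
);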

stop

Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
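
For example (a sketch; this assumes the property accepts an array reference of strings, one entry per stop sequence):

my $options = AI::Ollama::RequestOptions->new(
    stop => [ "\n\n", "User:" ],
);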

temperature

The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)

tfs_z

Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1)

top_k

Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40)

top_p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
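
For example, to combine the three sampling controls for more conservative output than the defaults (a sketch; values are illustrative):

my $options = AI::Ollama::RequestOptions->new(
    temperature => 0.7,
    top_k       => 20,
    top_p       => 0.5,
);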

typical_p

Typical p is used to reduce the impact of less probable tokens from the output.

use_mlock

Enable mlock. (Default: false)

use_mmap

Enable mmap. (Default: false)

vocab_only

Enable vocab only. (Default: false)