NAME
AI::Ollama::RequestOptions - additional model parameters for an Ollama request
SYNOPSIS
my $obj = AI::Ollama::RequestOptions->new();
...
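A fuller construction might look like this (an illustrative sketch; it assumes the constructor accepts the properties documented below as named arguments):

  use AI::Ollama::RequestOptions;

  # Build a set of request options to send along with a generation request
  my $options = AI::Ollama::RequestOptions->new(
      temperature => 0.8,    # sampling temperature
      num_ctx     => 2048,   # context window size
      seed        => 42,     # fixed seed for reproducible output
  );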
PROPERTIES
embedding_only
Enable embedding only. (Default: false)
f16_kv
Enable f16 key/value. (Default: false)
frequency_penalty
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
logits_all
Enable logits all. (Default: false)
low_vram
Enable low VRAM mode. (Default: false)
main_gpu
The GPU to use for the main model. (Default: 0)
mirostat
Enable Mirostat sampling for controlling perplexity. (Default: 0; 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
mirostat_eta
Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1)
mirostat_tau
Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)
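The three Mirostat knobs are typically set together. An illustrative sketch (the values are examples, not recommendations):

  use AI::Ollama::RequestOptions;

  # Enable Mirostat 2.0 and tune how it reacts to feedback
  my $options = AI::Ollama::RequestOptions->new(
      mirostat     => 2,     # 2 = Mirostat 2.0
      mirostat_eta => 0.05,  # slower, more cautious adjustments
      mirostat_tau => 3.0,   # favour more focused, coherent output
  );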
num_batch
Sets the number of batches to use for generation. (Default: 1)
num_ctx
Sets the size of the context window used to generate the next token.
num_gpu
The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.
num_gqa
The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for `llama2:70b`.
num_keep
Number of tokens to keep from the prompt.
num_predict
Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)
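For example, context size and generation length can be set together (an illustrative sketch):

  use AI::Ollama::RequestOptions;

  # Control how much context is used and how much text is generated
  my $options = AI::Ollama::RequestOptions->new(
      num_ctx     => 4096,  # larger context window
      num_keep    => 16,    # keep the first 16 prompt tokens
      num_predict => 256,   # generate at most 256 tokens
  );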
num_thread
Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores).
numa
Enable NUMA support. (Default: false)
penalize_newline
Penalize newlines in the output. (Default: false)
presence_penalty
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
repeat_last_n
Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
repeat_penalty
Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
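The repetition controls work together with frequency_penalty and presence_penalty above. An illustrative sketch:

  use AI::Ollama::RequestOptions;

  # Discourage verbatim repetition over the last 128 tokens
  my $options = AI::Ollama::RequestOptions->new(
      repeat_last_n     => 128,  # how far back to look
      repeat_penalty    => 1.2,  # penalize repeated tokens more strongly
      frequency_penalty => 0.5,  # penalize tokens by how often they already appear
      presence_penalty  => 0.5,  # penalize tokens that have appeared at all
  );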
rope_frequency_base
The base of the rope frequency scale. (Default: 1.0)
rope_frequency_scale
The scale of the rope frequency. (Default: 1.0)
seed
Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)
stop
Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
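For example (a sketch; it assumes stop accepts an array reference of strings, mirroring the underlying Ollama API):

  use AI::Ollama::RequestOptions;

  # Stop generating as soon as one of these sequences appears
  my $options = AI::Ollama::RequestOptions->new(
      stop => [ "\n\n", "User:" ],
  );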
temperature
The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)
tfs_z
Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1)
top_k
Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
top_p
Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
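The sampling parameters temperature, tfs_z, top_k and top_p are usually tuned together. An illustrative sketch of a more conservative configuration:

  use AI::Ollama::RequestOptions;

  # More focused, less random output
  my $options = AI::Ollama::RequestOptions->new(
      temperature => 0.4,  # lower temperature, less creative
      top_k       => 20,   # consider fewer candidate tokens
      top_p       => 0.7,  # tighter nucleus sampling
      tfs_z       => 1.0,  # 1.0 leaves tail free sampling disabled
  );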
typical_p
Typical p is used to reduce the impact of less probable tokens from the output.
use_mlock
Enable mlock. (Default: false)
use_mmap
Enable mmap. (Default: false)
vocab_only
Enable vocab only. (Default: false)
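The hardware and memory related flags (num_gpu, main_gpu, num_thread, use_mmap, use_mlock, low_vram) can be combined as below. This is an illustrative sketch; the boolean flags are shown as 1/0, which is an assumption about how the class coerces them:

  use AI::Ollama::RequestOptions;

  # Hardware and memory tuning
  my $options = AI::Ollama::RequestOptions->new(
      num_gpu    => 1,   # number of layers to send to the GPU(s)
      main_gpu   => 0,   # which GPU holds the main model
      num_thread => 8,   # match the number of physical CPU cores
      use_mmap   => 1,   # memory-map the model file
      use_mlock  => 0,   # do not lock model memory
  );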