Data Classes

class inseq.data.data_utils.TensorWrapper[source]

Wrapper for tensors and lists of tensors to allow for easy access to their attributes.

__getitem__(subscript) TensorClass[source]

By default, idiomatic slicing is used for the sequence dimension across batches. For batching use slice_batch instead.
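The slicing convention above can be illustrated with a small stand-in. This is only a sketch: `ToyWrapper` is a hypothetical class that holds nested lists instead of tensors, and the signature given to `slice_batch` here is an assumption, not inseq's actual one.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyWrapper:
    """Hypothetical stand-in for a tensor wrapper; fields are [batch][seq] lists."""

    input_ids: list
    attention_mask: list

    def __getitem__(self, subscript):
        # Apply the slice to the sequence dimension of every example.
        return ToyWrapper(
            **{f.name: [row[subscript] for row in getattr(self, f.name)] for f in fields(self)}
        )

    def slice_batch(self, subscript):
        # Apply the slice to the batch dimension instead (assumed signature).
        return ToyWrapper(**{f.name: getattr(self, f.name)[subscript] for f in fields(self)})


batch = ToyWrapper(
    input_ids=[[5, 6, 7, 0], [8, 9, 0, 0]],
    attention_mask=[[1, 1, 1, 0], [1, 1, 0, 0]],
)
first_two_positions = batch[:2]  # both examples, first two tokens each
first_example = batch.slice_batch(slice(0, 1))  # one example, all tokens
```

Keeping the two operations separate avoids ambiguity about which dimension a plain subscript refers to.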

Batching

class inseq.data.batch.BatchEncoding(input_ids: Int64[Tensor, 'batch_size seq_len'], attention_mask: Int64[Tensor, 'batch_size seq_len'], input_tokens: Sequence[Sequence[str]] | None = None, baseline_ids: Int64[Tensor, 'batch_size seq_len'] | None = None)[source]

Output produced by the tokenization process using encode().

input_ids

Batch of token ids with shape [batch_size, longest_seq_length]. Shorter sentences are padded to the length of the longest one, and truncation to max_seq_length is performed.

Type:

torch.Tensor

input_tokens

List of lists containing tokens for each sentence in the batch.

Type:

list(list(str))

attention_mask

Batch of attention masks with shape [batch_size, longest_seq_length]. 1 for positions that are valid, 0 for padded positions.

Type:

torch.Tensor

baseline_ids

Batch of reference token ids with shape [batch_size, longest_seq_length]. Used for attribution methods requiring a baseline input (e.g. IG).

Type:

torch.Tensor, optional
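The relationship between input_ids and attention_mask can be sketched in plain Python. The helper `pad_batch` below is hypothetical, and right-side padding with a pad id of 0 is an assumption made for illustration; real tokenizers may pad on the left or use a different pad id.

```python
def pad_batch(sequences, pad_id=0, max_seq_length=None):
    """Pad token-id lists to a common length and build the matching attention mask."""
    if max_seq_length is not None:
        # Truncate anything longer than the allowed maximum.
        sequences = [seq[:max_seq_length] for seq in sequences]
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    # 1 marks a valid position, 0 marks a padded one.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch([[11, 12, 13, 14, 15], [21, 22]], max_seq_length=4)
```

Both outputs share the shape [batch_size, longest_seq_length], which here is 2 by 4 after truncation.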

class inseq.data.batch.BatchEmbedding(input_embeds: Float[Tensor, 'batch_size seq_len embed_size'] | None = None, baseline_embeds: Float[Tensor, 'batch_size seq_len embed_size'] | None = None)[source]

Embeddings produced by the embedding process using embed().

input_embeds

Batch of token embeddings with shape [batch_size, longest_seq_length, embedding_size] for each sentence in the batch.

Type:

torch.Tensor

baseline_embeds

Batch of reference token embeddings with shape [batch_size, longest_seq_length, embedding_size] for each sentence in the batch.

Type:

torch.Tensor, optional

class inseq.data.batch.Batch(encoding: BatchEncoding, embedding: BatchEmbedding)[source]

Batch of input data for the attribution model.

encoding

Output produced by the tokenization process using encode().

Type:

BatchEncoding

embedding

Embeddings produced by the embedding process using embed().

Type:

BatchEmbedding

All attribute fields are accessible as properties (e.g. batch.input_ids corresponds to batch.encoding.input_ids).
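One way such forwarding could be written is with explicit properties. The classes below (`ToyEncoding`, `ToyEmbedding`, `ToyBatch`) are hypothetical stand-ins using nested lists, so this is a sketch of the pattern rather than inseq's code.

```python
from dataclasses import dataclass


@dataclass
class ToyEncoding:
    input_ids: list
    attention_mask: list


@dataclass
class ToyEmbedding:
    input_embeds: list


@dataclass
class ToyBatch:
    encoding: ToyEncoding
    embedding: ToyEmbedding

    @property
    def input_ids(self):
        # Forward to the nested encoding object.
        return self.encoding.input_ids

    @property
    def input_embeds(self):
        # Forward to the nested embedding object.
        return self.embedding.input_embeds


toy = ToyBatch(ToyEncoding([[1, 2]], [[1, 1]]), ToyEmbedding([[[0.1], [0.2]]]))
```

With this pattern, callers do not need to know whether a field lives on the encoding or on the embedding.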

class inseq.data.batch.EncoderDecoderBatch(sources: Batch, targets: Batch)[source]

Batch of input data for the encoder-decoder attribution model, including information for the source text and the target prefix.

sources

Batch of input data for the source text.

Type:

Batch

targets

Batch of input data for the target prefix.

Type:

Batch

class inseq.data.batch.DecoderOnlyBatch(encoding: BatchEncoding, embedding: BatchEmbedding)[source]

Input batch adapted for decoder-only attribution models, including information for the target prefix.

Attributions

class inseq.data.attribution.FeatureAttributionSequenceOutput(source: list[TokenWithId], target: list[TokenWithId], source_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, target_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, step_scores: dict[str, Float32[Tensor, 'generated_seq_len']] | None = None, sequence_scores: dict[str, Float32[Tensor, 'attributed_seq_len generated_seq_len']] | None = None, attr_pos_start: int = 0, attr_pos_end: int | None = None, _aggregator: str | list[str] | None = None, _dict_aggregate_fn: dict[str, str] | None = None, _attribution_dim_names: dict[str, dict[int, str]] | None = None, _num_dimensions: int | None = None)[source]

Output produced by a standard attribution method.

source

Tokenized source sequence.

Type:

list of TokenWithId

target

Tokenized target sequence.

Type:

list of TokenWithId

source_attributions

Tensor of shape (source_len, target_len), plus an optional third dimension if the attribution is granular (e.g. gradient attribution), containing the attribution scores produced at each generation step of the target for every source token.

Type:

SequenceAttributionTensor

target_attributions

Tensor of shape (target_len, target_len), plus an optional third dimension if the attribution is granular, containing the attribution scores produced at each generation step of the target for every token in the target prefix.

Type:

SequenceAttributionTensor, optional

step_scores

Dictionary of step scores produced alongside attributions (one per generation step).

Type:

dict[str, SingleScorePerStepTensor], optional

sequence_scores

Dictionary of sequence scores produced alongside attributions (n per generation step, as for attributions).

Type:

dict[str, MultipleScoresPerStepTensor], optional

decode_tokens(tokenizer) FeatureAttributionSequenceOutput[source]

Decode tokens in place using the tokenizer for human-readable display.

This is especially useful for byte-level tokenizers (e.g., Qwen) where raw vocabulary tokens may be unreadable. Each token’s string representation is replaced with the decoded version while preserving the token ID.

Parameters:

tokenizer – The tokenizer to use for decoding. Should have a decode method that accepts a list of token IDs.

Returns:

The modified attribution output (for method chaining).

Return type:

self
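The in-place replacement described above can be sketched with toy objects. `ToyToken` and `ToyTokenizer` are hypothetical stand-ins (for TokenWithId and a byte-level tokenizer respectively), and the `Ġ` space marker is an assumption borrowed from common byte-level vocabularies.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    """Hypothetical stand-in for TokenWithId."""

    token: str
    id: int


class ToyTokenizer:
    """Hypothetical tokenizer whose vocabulary strings use a byte-level marker."""

    vocab = {0: "Hello", 1: "Ġworld"}

    def decode(self, ids):
        # Assumed interface: a list of token ids in, readable text out.
        return "".join(self.vocab[i] for i in ids).replace("Ġ", " ")


def decode_tokens(tokens, tokenizer):
    # Replace each token string in place; the ids are left untouched.
    for tok in tokens:
        tok.token = tokenizer.decode([tok.id])
    return tokens  # returned to allow chaining


source = [ToyToken("Hello", 0), ToyToken("Ġworld", 1)]
decoded = decode_tokens(source, ToyTokenizer())
```

Decoding one id at a time keeps a one-to-one mapping between displayed strings and token ids.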

Example

>>> out = model.attribute("你好世界")
>>> out.sequence_attributions[0].decode_tokens(model.tokenizer)
>>> print([t.token for t in out.sequence_attributions[0].source])
['你好', '世界']  # Instead of garbled bytes

classmethod from_step_attributions(attributions: list[FeatureAttributionStepOutput], tokenized_target_sentences: list[list[TokenWithId]], pad_token: Any | None = None, attr_pos_end: int | None = None) list[FeatureAttributionSequenceOutput][source]

Converts a list of FeatureAttributionStepOutput objects, each containing the outputs of multiple examples for one step, into a list of FeatureAttributionSequenceOutput objects, each containing all step outputs for an individual example.

Raises:

ValueError – If the number of sequences in the attributions is not the same for all input sequences.

Returns:

List of FeatureAttributionSequenceOutput objects.

Return type:

List[FeatureAttributionSequenceOutput]
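Conceptually, this conversion transposes a step-major layout into an example-major one. The sketch below shows that regrouping on plain lists of scores; `regroup_steps` is a hypothetical helper, and it ignores padding removal and token bookkeeping that the real method would need.

```python
def regroup_steps(step_scores):
    """Turn [step][example] scores into [example][step] scores."""
    batch_sizes = {len(step) for step in step_scores}
    if len(batch_sizes) > 1:
        raise ValueError("All steps must contain the same number of sequences.")
    # zip(*...) transposes the step-major layout into an example-major one.
    return [list(per_example) for per_example in zip(*step_scores)]


# Three generation steps for a batch of two examples.
steps = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
per_sequence = regroup_steps(steps)
```

The consistency check mirrors the ValueError documented above for steps that disagree on the number of sequences.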

show(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]

Visualize the attributions.

Parameters:
  • min_val (int, optional, defaults to None) – Minimum value in the color range of the visualization. If None, the minimum value of the attributions across all visualized examples is used.

  • max_val (int, optional, defaults to None) – Maximum value in the color range of the visualization. If None, the maximum value of the attributions across all visualized examples is used.

  • max_show_size (int, optional, defaults to None) – For granular visualization, this parameter specifies the maximum dimension size for additional dimensions to be visualized. Default: 20.

  • show_dim (int or str, optional, defaults to None) – For granular visualization, this parameter specifies the dimension that should be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.

  • slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – For granular visualization, this parameter specifies the dimensions that should be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.

  • display (bool, optional, defaults to True) – Whether to display the visualization. Can be set to False if the visualization is produced and stored for later use.

  • return_html (bool, optional, defaults to False) – Whether to return the HTML code of the visualization.

  • return_figure (bool, optional, defaults to False) – For granular visualization, whether to return the Treescope figure object for further manipulation.

  • aggregator (AggregatorPipeline, optional, defaults to None) – Aggregates attributions before visualizing them. If not specified, the default aggregator for the class is used.

  • do_aggregation (bool, optional, defaults to True) – Whether to aggregate the attributions before visualizing them. Set to False to skip aggregation if the attributions are already aggregated.

Returns:

The HTML code of the visualization if return_html is set to True, otherwise None.

Return type:

str
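The fallback behaviour of min_val and max_val can be sketched as follows. The helper `color_range` is hypothetical and works on nested lists of scores; it only illustrates how a shared range across all visualized examples might be resolved.

```python
def color_range(examples, min_val=None, max_val=None):
    """Resolve the color range, falling back to the extremes across all examples."""
    all_scores = [s for matrix in examples for row in matrix for s in row]
    low = min(all_scores) if min_val is None else min_val
    high = max(all_scores) if max_val is None else max_val
    return low, high


# Two examples with differently sized attribution matrices.
examples = [[[0.1, -0.4], [0.3, 0.0]], [[0.7, 0.2]]]
default_range = color_range(examples)
fixed_range = color_range(examples, min_val=-1, max_val=1)
```

Sharing one range across examples makes colors comparable between them, while explicit bounds make them comparable across separate calls.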

show_granular(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False) str | None[source]

Visualizes granular attribution heatmaps in HTML format.

Parameters:
  • min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.

  • max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.

  • max_show_size (int, optional, defaults to None) – Maximum dimension size for additional dimensions to be visualized. Default: 20.

  • show_dim (int or str, optional, defaults to None) – Dimension to be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.

  • slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – Dimensions to be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.

  • display (bool, optional, defaults to True) – Whether to show the output of the visualization function.

  • return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.

  • return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.

Returns:

Returns the HTML output if return_html=True

Return type:

str

show_tokens(min_val: int | None = None, max_val: int | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, replace_char: dict[str, str] | None = None, wrap_after: int | str | list[str] | tuple[str] | None = None, step_score_highlight: str | None = None, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]

Visualizes token-level attributions in HTML format.

Parameters:
  • attributions (FeatureAttributionSequenceOutput) – Sequence attributions to be visualized.

  • min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.

  • max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.

  • display (bool, optional, defaults to True) – Whether to show the output of the visualization function.

  • return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.

  • return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.

  • replace_char (dict[str, str], optional, defaults to None) – Dictionary mapping strings to be replaced to replacement options, used for cleaning special characters. Default: {}.

  • wrap_after (int or str or list[str] or tuple[str], optional, defaults to None) – Token indices or tokens after which to wrap lines. E.g. 10 = wrap after every 10 tokens, "hi" = wrap after word hi occurs, [".", "!", "?"] or ".!?" = wrap after every sentence-ending punctuation.

  • step_score_highlight (str, optional, defaults to None) – Name of the step score to use to highlight generated tokens in the visualization. If None, no highlights are shown. Default: None.

weight_attributions(step_fn_id: str)[source]

Weights attribution scores in place by the value of the selected step function for every generation step.

Parameters:

step_fn_id (str) – The id of the step function to use for weighting the attributions (e.g. probability)
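As a sketch of what this weighting amounts to, assume it is a plain elementwise multiplication of each generation step's column by that step's score. The helper `weight_by_step_score` is hypothetical and operates on nested lists rather than tensors.

```python
def weight_by_step_score(attributions, step_scores):
    """Scale column t of an [attributed_len][generated_len] matrix by step_scores[t]."""
    for row in attributions:
        for t, weight in enumerate(step_scores):
            row[t] *= weight  # modified in place
    return attributions


# Two attributed tokens, two generation steps.
attr = [[0.5, 0.2], [0.5, 0.8]]
probability = [1.0, 0.5]
weight_by_step_score(attr, probability)
```

With probability as the step score, attributions for tokens the model generated with low confidence are scaled down.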

class inseq.data.attribution.FeatureAttributionStepOutput(source_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, step_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size']] | None = None, target_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, sequence_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len']] | None = None, source: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, prefix: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, target: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, _num_dimensions: int | None = None, _sequence_cls: type[~inseq.data.attribution.FeatureAttributionSequenceOutput] = <class 'inseq.data.attribution.FeatureAttributionSequenceOutput'>)[source]

Output of a single step of feature attribution, plus extra information related to what was attributed.

remap_from_filtered(target_attention_mask: Int64[Tensor, 'batch_size'], batch: DecoderOnlyBatch | EncoderDecoderBatch, is_final_step_method: bool = False) None[source]

Remaps the attributions to the original shape of the input sequence.

class inseq.data.attribution.FeatureAttributionOutput(sequence_attributions: list[~inseq.data.attribution.FeatureAttributionSequenceOutput], step_attributions: list[~inseq.data.attribution.FeatureAttributionStepOutput] | None = None, info: dict[str, ~typing.Any] = <factory>)[source]

Output produced by the AttributionModel.attribute method.

sequence_attributions

List containing all attributions performed on input sentences (one per input sentence, including source and optionally target-side attribution).

Type:

list of FeatureAttributionSequenceOutput

step_attributions

List containing all step attributions (one per generation step performed on the batch), returned if output_step_attributions=True.

Type:

list of FeatureAttributionStepOutput, optional

info

Dictionary including all available parameters used to perform the attribution.

Type:

dict with str keys and any values

aggregate(aggregator: AggregatorPipeline | type[Aggregator] = None, **kwargs) FeatureAttributionOutput[source]

Aggregate the sequence attributions using one or more aggregators.

Parameters:

aggregator (AggregatorPipeline or Type[Aggregator], optional) – Aggregator or pipeline to use. If not provided, the default aggregator for every sequence attribution is used.

Returns:

Aggregated attribution output

Return type:

FeatureAttributionOutput
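The idea of chaining one or more aggregators can be sketched as function composition. Everything below is hypothetical (the function names, and the choice of a sum followed by an absolute value); it is not the AggregatorPipeline API, only an illustration of applying aggregation steps in order.

```python
from functools import reduce


def sum_last_dim(matrix3d):
    # Collapse [src][tgt][hidden] to [src][tgt] by summing hidden values.
    return [[sum(cell) for cell in row] for row in matrix3d]


def absolute(matrix2d):
    # Keep only the magnitude of every score.
    return [[abs(v) for v in row] for row in matrix2d]


def run_pipeline(scores, aggregators):
    """Apply each aggregator to the output of the previous one."""
    return reduce(lambda acc, fn: fn(acc), aggregators, scores)


# One source token, two generation steps, hidden size 2.
granular = [[[1.0, -3.0], [0.5, 0.5]]]
aggregated = run_pipeline(granular, [sum_last_dim, absolute])
```

The order of the steps matters: taking absolute values before summing would give a different result.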

decode_tokens(tokenizer) FeatureAttributionOutput[source]

Decode tokens in all sequence attributions for human-readable display.

This is especially useful for byte-level tokenizers (e.g., Qwen) where raw vocabulary tokens may be unreadable. Each token’s string representation is replaced with the decoded version while preserving the token ID.

Parameters:

tokenizer – The tokenizer to use for decoding. Should have a decode method that accepts a list of token IDs.

Returns:

The modified attribution output (for method chaining).

Return type:

self

Example

>>> out = model.attribute("你好世界")
>>> out.decode_tokens(model.tokenizer)
>>> print([t.token for t in out[0].source])
['你好', '世界']  # Instead of garbled bytes

get_scores_dicts(aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) list[dict[str, dict[str, dict[str, float]]]][source]

Get all computed scores (attributions and step scores) for all sequences as a list of dictionaries.

Returns:

List containing one dictionary per sequence. Every dictionary contains the keys "source_attributions", "target_attributions" and "step_scores". For each of these keys, the value is a dictionary with generated tokens as keys, and for values a final dictionary. For "step_scores", the keys of the final dictionary are the step score ids, and the values are the scores. For "source_attributions" and "target_attributions", the keys of the final dictionary are respectively source and target tokens, and the values are the attribution scores.

Return type:

list(dict)

This output is intended to be easily converted to a pandas DataFrame. The following example produces a list of DataFrames, one for each sequence, matching the source attributions that would be visualized by out.show().

>>> import pandas as pd
>>> dfs = [pd.DataFrame(x["source_attributions"]) for x in out.get_scores_dicts()]
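The nesting described above can be made concrete with a toy example. The helper `build_scores_dict`, the tokens, and the scores are all made up for illustration; note that using tokens as dictionary keys means repeated tokens would collide in this simplified sketch.

```python
def build_scores_dict(source_tokens, generated_tokens, attributions):
    """Nest an [source][generated] matrix as {generated_token: {source_token: score}}."""
    return {
        gen: {src: attributions[i][j] for i, src in enumerate(source_tokens)}
        for j, gen in enumerate(generated_tokens)
    }


scores = {
    "source_attributions": build_scores_dict(
        ["Hello", "world"], ["Bonjour", "monde"], [[0.9, 0.2], [0.1, 0.8]]
    ),
    "step_scores": {"Bonjour": {"probability": 0.7}, "monde": {"probability": 0.6}},
}

# Score of the source token "world" when generating "monde".
world_to_monde = scores["source_attributions"]["monde"]["world"]
```

Because the outer keys are generated tokens and the inner keys are attributed tokens, a dict of this shape maps naturally onto a table with one column per generated token.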

static load(path: PathLike, decompress: bool = False) FeatureAttributionOutput[source]

Load saved attribution output into a new FeatureAttributionOutput object.

Parameters:
  • path (str) – Path to the JSON file containing the saved attribution output. Note that the file must have been saved with the save() method with use_primitives=False in order to be loaded correctly.

  • decompress (bool, optional, defaults to False) – If True, the input file is decompressed using gzip.

Returns:

Loaded attribution output

Return type:

FeatureAttributionOutput
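The effect of the decompress flag can be sketched with a plain JSON round trip through gzip. The helpers `save_json` and `load_json` are hypothetical and use only the standard library; inseq's own serialization format is richer than this toy payload.

```python
import gzip
import json
import os
import tempfile


def save_json(obj, path, compress=False):
    # gzip.open in text mode transparently compresses the written JSON.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)


def load_json(path, decompress=False):
    # The reader must use the same setting that was used when writing.
    opener = gzip.open if decompress else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


payload = {"source": ["Hello"], "scores": [[0.25, 0.75]]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.json.gz")
    save_json(payload, path, compress=True)
    restored = load_json(path, decompress=True)
```

A file written with compression has to be read with decompression enabled, which is why the two flags are paired.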

save(path: PathLike, overwrite: bool = False, compress: bool = False, ndarray_compact: bool = True, use_primitives: bool = False, split_sequences: bool = False, scores_precision: Literal['float32', 'float16', 'float8'] = 'float32') None[source]

Save class contents to a JSON file.

Parameters:
  • path (os.PathLike) – Path to the file where the attribution output will be stored (e.g. ./out.json).

  • overwrite (bool, optional, defaults to False) – If True, overwrite the file if it exists, raise error otherwise.

  • compress (bool, optional, defaults to False) – If True, the output file is compressed using gzip. Especially useful for large sequences and granular attributions with unmerged hidden dimensions.

  • ndarray_compact (bool, optional, defaults to True) – If True, the arrays for scores and attributions are stored in a compact b64 format. Otherwise, they are stored as plain lists of floats.

  • use_primitives (bool, optional, defaults to False) – If True, the output is stored as a list of dictionaries with primitive types (e.g. int, float, str). Note that an attribution saved with this option cannot be loaded with the load method.

  • split_sequences (bool, optional, defaults to False) – If True, the output is split into multiple files, one per sequence. The file names are generated by appending the sequence index to the given path (e.g. ./out.json with two sequences -> ./out_0.json, ./out_1.json)

  • scores_precision (str, optional, defaults to "float32") – Rounding precision for saved scores. Can be used to reduce space on disk but introduces rounding errors. Can be combined with compress=True for further space reduction. Accepted values: "float32", "float16", or "float8". Default: "float32" (no rounding).
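Two of the options above lend themselves to a short illustration: the file naming used by split_sequences and the rounding error introduced by a lower scores_precision. Both helpers are hypothetical and use only the standard library; how inseq performs the rounding internally is not shown here, and "float8" is not covered.

```python
import struct
from pathlib import Path


def split_paths(path, num_sequences):
    """Append the sequence index to the file stem: out.json -> out_0.json, out_1.json."""
    p = Path(path)
    return [p.with_name(f"{p.stem}_{i}{p.suffix}") for i in range(num_sequences)]


def round_to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" struct format packs a float into 2 bytes (binary16).
    return struct.unpack("e", struct.pack("e", value))[0]


paths = split_paths("./out.json", 2)
rounded = round_to_float16(0.1)
rounding_error = abs(rounded - 0.1)
```

The rounding error for a single score is small, but it is non-zero, which is the trade-off the parameter description refers to.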

show(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | list | None[source]

Visualize the sequence attributions.

Parameters:
  • min_val (int, optional) – Minimum value for color scale.

  • max_val (int, optional) – Maximum value for color scale.

  • max_show_size (int, optional) – Maximum size of the dimension to show.

  • show_dim (int or str, optional) – Dimension to show.

  • slice_dims (dict[int or str, tuple[int, int]], optional) – Dimensions to slice.

  • display (bool, optional) – If True, display the attribution visualization.

  • return_html (bool, optional) – If True, return the attribution visualization as HTML.

  • return_figure (bool, optional) – If True, return the Treescope figure object for further manipulation.

  • aggregator (AggregatorPipeline or Type[Aggregator], optional) – Aggregator or pipeline to use. If not provided, the default aggregator for every sequence attribution is used.

  • do_aggregation (bool, optional, defaults to True) – Whether to aggregate the attributions before visualizing them. Set to False to skip aggregation if the attributions are already aggregated.

Returns:

The attribution visualization as HTML (str) if return_html=True, a list of Treescope figure objects if return_figure=True, or None if both return_html and return_figure are False.

Return type:

str, list or None

show_granular(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool = False, return_figure: bool = False) str | None[source]

Visualizes granular attribution heatmaps in HTML format.

Parameters:
  • min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.

  • max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.

  • max_show_size (int, optional, defaults to None) – Maximum dimension size for additional dimensions to be visualized. Default: 20.

  • show_dim (int or str, optional, defaults to None) – Dimension to be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.

  • slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – Dimensions to be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.

  • display (bool, optional, defaults to True) – Whether to show the output of the visualization function.

  • return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.

  • return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.

Returns:

Returns the HTML output if return_html=True

Return type:

str

show_tokens(min_val: int | None = None, max_val: int | None = None, display: bool = True, return_html: bool = False, return_figure: bool = False, replace_char: dict[str, str] | None = None, wrap_after: int | str | list[str] | tuple[str] | None = None, step_score_highlight: str | None = None, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]

Visualizes token-level attributions in HTML format.

Parameters:
  • min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.

  • max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.

  • display (bool, optional, defaults to True) – Whether to show the output of the visualization function.

  • return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.

  • return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.

  • replace_char (dict[str, str], optional, defaults to None) – Dictionary mapping strings to be replaced to replacement options, used for cleaning special characters. Default: {}.

  • wrap_after (int or str or list[str] or tuple[str], optional, defaults to None) – Token indices or tokens after which to wrap lines. E.g. 10 = wrap after every 10 tokens, "hi" = wrap after word hi occurs, [".", "!", "?"] or ".!?" = wrap after every sentence-ending punctuation.

  • step_score_highlight (str, optional, defaults to None) – Name of the step score to use to highlight generated tokens in the visualization. If None, no highlights are shown. Default: None.

class inseq.data.attribution.GranularFeatureAttributionSequenceOutput(source: list[TokenWithId], target: list[TokenWithId], source_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, target_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, step_scores: dict[str, Float32[Tensor, 'generated_seq_len']] | None = None, sequence_scores: dict[str, Float32[Tensor, 'attributed_seq_len generated_seq_len']] | None = None, attr_pos_start: int = 0, attr_pos_end: int | None = None, _aggregator: str | list[str] | None = None, _dict_aggregate_fn: dict[str, str] | None = None, _attribution_dim_names: dict[str, dict[int, str]] | None = None, _num_dimensions: int | None = None)[source]

Raw output of a single sequence of granular feature attribution.

Examples of granular feature attribution methods are gradient-based methods such as Integrated Gradients, which return one score per hidden dimension of the model for every generated token.

Adds the convergence delta and default L2 + normalization merging of attributions to the base class.
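The "L2 + normalization" merging can be sketched on nested lists. The helper `l2_merge_and_normalize` is hypothetical, and the normalization used here (scores for each generated token sum to 1) is an assumption made for illustration; the library's exact normalization may differ.

```python
import math


def l2_merge_and_normalize(granular):
    """Collapse [attributed][generated][hidden] scores to [attributed][generated]."""
    # The L2 norm over the hidden dimension gives one score per token pair.
    merged = [[math.sqrt(sum(h * h for h in cell)) for cell in row] for row in granular]
    # Assumed normalization: scores for each generated token sum to 1.
    num_steps = len(merged[0])
    totals = [sum(row[t] for row in merged) for t in range(num_steps)]
    return [
        [row[t] / totals[t] if totals[t] else 0.0 for t in range(num_steps)]
        for row in merged
    ]


# Two attributed tokens, one generation step, hidden size 2.
granular = [[[3.0, 4.0]], [[0.0, 5.0]]]
merged = l2_merge_and_normalize(granular)
```

Both tokens have an L2 norm of 5 in this example, so each receives half of the normalized attribution for the single generation step.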

class inseq.data.attribution.GranularFeatureAttributionStepOutput(source_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, step_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size']] | None = None, target_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, sequence_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len']] | None = None, source: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, prefix: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, target: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, _num_dimensions: int | None = None, _sequence_cls: type[~inseq.data.attribution.FeatureAttributionSequenceOutput] = <class 'inseq.data.attribution.GranularFeatureAttributionSequenceOutput'>)[source]

Raw output of a single step of gradient feature attribution.