Data Classes
- class inseq.data.data_utils.TensorWrapper[source]
Wrapper for tensors and lists of tensors to allow for easy access to their attributes.
- __getitem__(subscript) TensorClass[source]
By default, idiomatic slicing is used for the sequence dimension across batches. For batching use slice_batch instead.
Batching
- class inseq.data.batch.BatchEncoding(input_ids: Int64[Tensor, 'batch_size seq_len'], attention_mask: Int64[Tensor, 'batch_size seq_len'], input_tokens: Sequence[Sequence[str]] | None = None, baseline_ids: Int64[Tensor, 'batch_size seq_len'] | None = None)[source]
Output produced by the tokenization process using encode().
- input_ids
Batch of token ids with shape [batch_size, longest_seq_length]. Sequences are padded to the longest one in the batch, and truncation to max_seq_length is performed.
- Type:
torch.Tensor
- class inseq.data.batch.BatchEmbedding(input_embeds: Float[Tensor, 'batch_size seq_len embed_size'] | None = None, baseline_embeds: Float[Tensor, 'batch_size seq_len embed_size'] | None = None)[source]
Embeddings produced by the embedding process using embed().
- class inseq.data.batch.Batch(encoding: BatchEncoding, embedding: BatchEmbedding)[source]
Batch of input data for the attribution model.
- All attribute fields are accessible as properties (e.g. batch.input_ids corresponds to batch.encoding.input_ids)
- class inseq.data.batch.EncoderDecoderBatch(sources: Batch, targets: Batch)[source]
Batch of input data for the encoder-decoder attribution model, including information for the source text and the target prefix.
- class inseq.data.batch.DecoderOnlyBatch(encoding: BatchEncoding, embedding: BatchEmbedding)[source]
Input batch adapted for decoder-only attribution models, including information for the target prefix.
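The property forwarding described for Batch (batch.input_ids corresponding to batch.encoding.input_ids) can be pictured with a minimal sketch. The classes ToyEncoding, ToyEmbedding and ToyBatch below are hypothetical stand-ins written for illustration, using plain lists instead of tensors; they are not the inseq implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEncoding:
    # Hypothetical stand-in for BatchEncoding (lists instead of tensors).
    input_ids: list
    attention_mask: list


@dataclass
class ToyEmbedding:
    # Hypothetical stand-in for BatchEmbedding.
    input_embeds: Optional[list] = None


class ToyBatch:
    """Hypothetical container that forwards attribute fields as properties."""

    def __init__(self, encoding: ToyEncoding, embedding: ToyEmbedding):
        self.encoding = encoding
        self.embedding = embedding

    @property
    def input_ids(self):
        # Forwarded from the wrapped encoding.
        return self.encoding.input_ids

    @property
    def input_embeds(self):
        # Forwarded from the wrapped embedding.
        return self.embedding.input_embeds


batch = ToyBatch(
    ToyEncoding(input_ids=[[1, 2, 3]], attention_mask=[[1, 1, 1]]),
    ToyEmbedding(input_embeds=[[[0.1], [0.2], [0.3]]]),
)
```

Here batch.input_ids and batch.encoding.input_ids refer to the same object, which is the behavior the property description suggests.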
Attributions
- class inseq.data.attribution.FeatureAttributionSequenceOutput(source: list[TokenWithId], target: list[TokenWithId], source_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, target_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, step_scores: dict[str, Float32[Tensor, 'generated_seq_len']] | None = None, sequence_scores: dict[str, Float32[Tensor, 'attributed_seq_len generated_seq_len']] | None = None, attr_pos_start: int = 0, attr_pos_end: int | None = None, _aggregator: str | list[str] | None = None, _dict_aggregate_fn: dict[str, str] | None = None, _attribution_dim_names: dict[str, dict[int, str]] | None = None, _num_dimensions: int | None = None)[source]
Output produced by a standard attribution method.
- source_attributions
Tensor of shape (source_len, target_len) plus an optional third dimension if the attribution is granular (e.g. gradient attribution) containing the attribution scores produced at each generation step of the target for every source token.
- Type:
SequenceAttributionTensor
- target_attributions
Tensor of shape (target_len, target_len), plus an optional third dimension if the attribution is granular containing the attribution scores produced at each generation step of the target for every token in the target prefix.
- Type:
SequenceAttributionTensor, optional
- step_scores
Dictionary of step scores produced alongside attributions (one per generation step).
- Type:
dict[str, SingleScorePerStepTensor], optional
- sequence_scores
Dictionary of sequence scores produced alongside attributions (n per generation step, as for attributions).
- Type:
dict[str, MultipleScoresPerStepTensor], optional
- decode_tokens(tokenizer) FeatureAttributionSequenceOutput[source]
Decode tokens in place using the tokenizer for human-readable display.
This is especially useful for byte-level tokenizers (e.g., Qwen) where raw vocabulary tokens may be unreadable. Each token's string representation is replaced with the decoded version while preserving the token ID.
- Parameters:
tokenizer – The tokenizer to use for decoding. Should have a decode method that accepts a list of token IDs.
- Returns:
The modified attribution output (for method chaining).
- Return type:
self
Example
>>> out = model.attribute("你好世界")
>>> out.sequence_attributions[0].decode_tokens(model.tokenizer)
>>> print([t.token for t in out.sequence_attributions[0].source])
['你好', '世界']  # Instead of garbled bytes
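The in-place decoding described above can be sketched without loading a model. ToyToken, ToyTokenizer and decode_tokens_in_place are hypothetical names introduced for this illustration; the only property taken from the description is that the tokenizer exposes a decode method accepting a list of token IDs.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    # Hypothetical stand-in for a token paired with its vocabulary id.
    id: int
    token: str


class ToyTokenizer:
    """Hypothetical tokenizer: decode() maps a list of ids to readable text."""

    def __init__(self, id_to_text):
        self.id_to_text = id_to_text

    def decode(self, ids):
        return "".join(self.id_to_text[i] for i in ids)


def decode_tokens_in_place(tokens, tokenizer):
    # Replace each raw vocabulary string with its decoded form; ids are untouched.
    for tok in tokens:
        tok.token = tokenizer.decode([tok.id])
    return tokens  # returned to allow method-chaining style usage


# Illustrative byte-level vocabulary strings that are hard to read as-is.
source = [ToyToken(101, "ä½łå¥½"), ToyToken(102, "ä¸ĸçķĮ")]
tokenizer = ToyTokenizer({101: "你好", 102: "世界"})
decode_tokens_in_place(source, tokenizer)
```

After the call, the token strings are readable while each ToyToken keeps its original id.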
- classmethod from_step_attributions(attributions: list[FeatureAttributionStepOutput], tokenized_target_sentences: list[list[TokenWithId]], pad_token: Any | None = None, attr_pos_end: int | None = None) list[FeatureAttributionSequenceOutput][source]
Converts a list of FeatureAttributionStepOutput objects, each containing the outputs of multiple examples for one step, into a list of FeatureAttributionSequenceOutput, with every object containing all step outputs for an individual example.
- Raises:
ValueError – If the number of sequences in the attributions is not the same for all input sequences.
- Returns:
List of FeatureAttributionSequenceOutput objects.
- Return type:
List[FeatureAttributionSequenceOutput]
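The regrouping performed by this conversion, from one entry per step to one entry per example, can be sketched on plain lists. The function regroup_steps_by_sequence is a hypothetical illustration; padding handling, pad_token and attr_pos_end are deliberately left out.

```python
def regroup_steps_by_sequence(step_outputs):
    """Turn step_outputs[step][example] into per_sequence[example][step].

    Hypothetical sketch: every step is a list with one item per batch example.
    """
    if not step_outputs:
        return []
    batch_size = len(step_outputs[0])
    # Mirrors the documented ValueError: all steps must cover the same sequences.
    if any(len(step) != batch_size for step in step_outputs):
        raise ValueError("All steps must contain the same number of sequences.")
    return [[step[b] for step in step_outputs] for b in range(batch_size)]


# Three generation steps for a batch of two examples, "a" and "b".
steps = [["a0", "b0"], ["a1", "b1"], ["a2", "b2"]]
per_sequence = regroup_steps_by_sequence(steps)
```

Each inner list of per_sequence then holds every step output for a single example, in generation order.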
- show(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]
Visualize the attributions.
- Parameters:
min_val (int, optional, defaults to None) – Minimum value in the color range of the visualization. If None, the minimum value of the attributions across all visualized examples is used.
max_val (int, optional, defaults to None) – Maximum value in the color range of the visualization. If None, the maximum value of the attributions across all visualized examples is used.
max_show_size (int, optional, defaults to None) – For granular visualization, this parameter specifies the maximum dimension size for additional dimensions to be visualized. Default: 20.
show_dim (int or str, optional, defaults to None) – For granular visualization, this parameter specifies the dimension that should be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.
slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – For granular visualization, this parameter specifies the dimensions that should be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.
display (bool, optional, defaults to True) – Whether to display the visualization. Can be set to False if the visualization is produced and stored for later use.
return_html (bool, optional, defaults to False) – Whether to return the HTML code of the visualization.
return_figure (bool, optional, defaults to False) – For granular visualization, whether to return the Treescope figure object for further manipulation.
aggregator (AggregatorPipeline, optional, defaults to None) – Aggregates attributions before visualizing them. If not specified, the default aggregator for the class is used.
do_aggregation (bool, optional, defaults to True) – Whether to aggregate the attributions before visualizing them. Set to False to skip aggregation if the attributions are already aggregated.
- Returns:
The HTML code of the visualization if return_html is set to True, otherwise None.
- Return type:
str
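The fallback described for min_val and max_val (using the extremes of the attributions across all visualized examples when left as None) can be sketched as follows. resolve_color_range is a hypothetical helper operating on nested lists of scores, not the library's own routine.

```python
def resolve_color_range(examples, min_val=None, max_val=None):
    """Pick the color range bounds for a set of visualized examples.

    Hypothetical sketch: `examples` is a list of 2D score matrices
    (lists of rows). Explicit bounds win; None falls back to the data.
    """
    flat = [score for matrix in examples for row in matrix for score in row]
    low = min(flat) if min_val is None else min_val
    high = max(flat) if max_val is None else max_val
    return low, high


examples = [
    [[0.1, -0.4], [0.3, 0.2]],  # first visualized example
    [[0.9, 0.0]],               # second visualized example
]
auto_range = resolve_color_range(examples)
clipped_range = resolve_color_range(examples, min_val=-1, max_val=1)
```

Passing explicit bounds keeps the color scale comparable between separate calls, while the automatic range adapts to whatever is being shown.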
- show_granular(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False) str | None[source]
Visualizes granular attribution heatmaps in HTML format.
- Parameters:
min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.
max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.
max_show_size (int, optional, defaults to None) – Maximum dimension size for additional dimensions to be visualized. Default: 20.
show_dim (int or str, optional, defaults to None) – Dimension to be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.
slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – Dimensions to be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.
display (bool, optional, defaults to True) – Whether to show the output of the visualization function.
return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.
return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.
- Returns:
Returns the HTML output if return_html=True
- Return type:
str
- show_tokens(min_val: int | None = None, max_val: int | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, replace_char: dict[str, str] | None = None, wrap_after: int | str | list[str] | tuple[str] | None = None, step_score_highlight: str | None = None, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]
Visualizes token-level attributions in HTML format.
- Parameters:
attributions (FeatureAttributionSequenceOutput) – Sequence attributions to be visualized.
min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.
max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.
display (bool, optional, defaults to True) – Whether to show the output of the visualization function.
return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.
return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.
replace_char (dict[str, str], optional, defaults to None) – Dictionary mapping strings to be replaced to replacement options, used for cleaning special characters. Default: {}.
wrap_after (int or str or list[str] or tuple[str], optional, defaults to None) – Token indices or tokens after which to wrap lines. E.g. 10 = wrap after every 10 tokens, "hi" = wrap after word hi occurs, [".", "!", "?"] or ".!?" = wrap after every sentence-ending punctuation.
step_score_highlight (str, optional, defaults to None) – Name of the step score to use to highlight generated tokens in the visualization. If None, no highlights are shown.
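The three accepted forms of wrap_after can be illustrated with a small sketch. wrap_tokens is a hypothetical helper; in particular, treating a string value as a containment test (so that ".!?" matches any of the three punctuation tokens) is an assumption made to match the examples in the description, not a statement about the library's code.

```python
def wrap_tokens(tokens, wrap_after=None):
    """Split tokens into display lines according to wrap_after.

    Hypothetical sketch:
    - int: wrap after every `wrap_after` tokens
    - str, list or tuple: wrap after any token contained in `wrap_after`
      (for a string this is a substring test, which is an assumption)
    """
    lines, current = [], []
    for index, token in enumerate(tokens):
        current.append(token)
        if wrap_after is None:
            continue
        if isinstance(wrap_after, int):
            wrap = wrap_after > 0 and (index + 1) % wrap_after == 0
        else:
            wrap = bool(token) and token in wrap_after
        if wrap:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


tokens = ["Hi", "there", ".", "How", "are", "you", "?"]
by_count = wrap_tokens(tokens, 3)
by_punctuation = wrap_tokens(tokens, ".!?")
by_list = wrap_tokens(tokens, [".", "!", "?"])
```

With these inputs, the integer form breaks after every third token, while the string and list forms both break after each sentence-ending punctuation mark.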
- weight_attributions(step_fn_id: str)[source]
Weights attribution scores in place by the value of the selected step function for every generation step.
- Parameters:
step_fn_id (str) – The id of the step function to use for weighting the attributions (e.g. probability)
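Weighting by a step score amounts to scaling, for every generation step, all attribution scores of that step by the step's score. The sketch below works on nested lists and assumes a 2D layout of shape (attributed_len, generated_len); weight_by_step_scores is a hypothetical helper, not the method itself.

```python
def weight_by_step_scores(attributions, step_scores):
    """Scale attributions[i][t] by step_scores[t], modifying the input in place.

    Hypothetical sketch: attributions is a list of rows (one per attributed
    token), each with one score per generation step.
    """
    for row in attributions:
        if len(row) != len(step_scores):
            raise ValueError("Every row needs one score per generation step.")
    for row in attributions:
        for t, weight in enumerate(step_scores):
            row[t] *= weight
    return attributions


# Two attributed tokens, two generation steps.
attributions = [[1.0, 2.0], [3.0, 4.0]]
probabilities = [0.5, 0.25]  # e.g. one probability per generated token
weight_by_step_scores(attributions, probabilities)
```

Steps generated with low probability end up contributing less to the weighted attribution matrix.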
- class inseq.data.attribution.FeatureAttributionStepOutput(source_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, step_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size']] | None = None, target_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, sequence_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len']] | None = None, source: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, prefix: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, target: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, _num_dimensions: int | None = None, _sequence_cls: type[~inseq.data.attribution.FeatureAttributionSequenceOutput] = <class 'inseq.data.attribution.FeatureAttributionSequenceOutput'>)[source]
Output of a single step of feature attribution, plus extra information related to what was attributed.
- remap_from_filtered(target_attention_mask: Int64[Tensor, 'batch_size'], batch: DecoderOnlyBatch | EncoderDecoderBatch, is_final_step_method: bool = False) None[source]
Remaps the attributions to the original shape of the input sequence.
- class inseq.data.attribution.FeatureAttributionOutput(sequence_attributions: list[~inseq.data.attribution.FeatureAttributionSequenceOutput], step_attributions: list[~inseq.data.attribution.FeatureAttributionStepOutput] | None = None, info: dict[str, ~typing.Any] = <factory>)[source]
Output produced by the AttributionModel.attribute method.
- sequence_attributions
List containing all attributions performed on input sentences (one per input sentence, including source and optionally target-side attribution).
- Type:
list of FeatureAttributionSequenceOutput
- step_attributions
List containing all step attributions (one per generation step performed on the batch), returned if output_step_attributions=True.
- Type:
list of FeatureAttributionStepOutput, optional
- info
Dictionary including all available parameters used to perform the attribution.
- Type:
dict with str keys and any values
- aggregate(aggregator: AggregatorPipeline | type[Aggregator] = None, **kwargs) FeatureAttributionOutput[source]
Aggregate the sequence attributions using one or more aggregators.
- Parameters:
aggregator (AggregatorPipeline or Type[Aggregator], optional) – Aggregator or pipeline to use. If not provided, the default aggregator for every sequence attribution is used.
- Returns:
Aggregated attribution output
- Return type:
FeatureAttributionOutput
- decode_tokens(tokenizer) FeatureAttributionOutput[source]
Decode tokens in all sequence attributions for human-readable display.
This is especially useful for byte-level tokenizers (e.g., Qwen) where raw vocabulary tokens may be unreadable. Each token's string representation is replaced with the decoded version while preserving the token ID.
- Parameters:
tokenizer – The tokenizer to use for decoding. Should have a decode method that accepts a list of token IDs.
- Returns:
The modified attribution output (for method chaining).
- Return type:
self
Example
>>> out = model.attribute("你好世界")
>>> out.decode_tokens(model.tokenizer)
>>> print([t.token for t in out[0].source])
['你好', '世界']  # Instead of garbled bytes
- get_scores_dicts(aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) list[dict[str, dict[str, dict[str, float]]]][source]
Get all computed scores (attributions and step scores) for all sequences as a list of dictionaries.
- Returns:
List containing one dictionary per sequence. Every dictionary contains the keys "source_attributions", "target_attributions" and "step_scores". For each of these keys, the value is a dictionary with generated tokens as keys, and for values a final dictionary. For "step_scores", the keys of the final dictionary are the step score ids, and the values are the scores. For "source_attributions" and "target_attributions", the keys of the final dictionary are respectively source and target tokens, and the values are the attribution scores.
- Return type:
list(dict)
This output is intended to be easily converted to a pandas DataFrame. The following example produces a list of DataFrames, one for each sequence, matching the source attributions that would be visualized by out.show().
>>> import pandas as pd
>>> dfs = [pd.DataFrame(x["source_attributions"]) for x in out.get_scores_dicts()]
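The nested layout returned for one sequence can be sketched with plain dictionaries. scores_dict_for_sequence is a hypothetical helper that builds only the "source_attributions" and "step_scores" entries; "target_attributions" would follow the same pattern with target tokens as inner keys. Note that, in this sketch, repeated tokens would overwrite each other as dictionary keys.

```python
def scores_dict_for_sequence(source_tokens, generated_tokens, source_attr, step_scores):
    """Build {key: {generated_token: {inner_key: score}}} for one sequence.

    Hypothetical sketch: source_attr[i][t] is the score of source token i
    at generation step t; step_scores maps a score id to one value per step.
    """
    out = {"source_attributions": {}, "step_scores": {}}
    for t, generated in enumerate(generated_tokens):
        out["source_attributions"][generated] = {
            src: source_attr[i][t] for i, src in enumerate(source_tokens)
        }
        out["step_scores"][generated] = {
            name: values[t] for name, values in step_scores.items()
        }
    return out


scores = scores_dict_for_sequence(
    source_tokens=["Hello", "world"],
    generated_tokens=["Bonjour", "monde"],
    source_attr=[[0.8, 0.1], [0.2, 0.9]],
    step_scores={"probability": [0.7, 0.6]},
)
```

Because the outer keys are generated tokens and the inner keys are source tokens, converting scores["source_attributions"] to a DataFrame yields one column per generated token and one row per source token.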
- static load(path: PathLike, decompress: bool = False) FeatureAttributionOutput[source]
Load saved attribution output into a new FeatureAttributionOutput object.
- Parameters:
path (str) – Path to the JSON file containing the saved attribution output. Note that the file must have been saved with the save() method with use_primitives=False in order to be loaded correctly.
decompress (bool, optional, defaults to False) – If True, the input file is decompressed using gzip.
- Returns:
Loaded attribution output
- Return type:
FeatureAttributionOutput
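The pairing between a gzip-compressed save and a load with decompress=True can be illustrated with the standard library alone. save_json and load_json are hypothetical helpers for a generic JSON payload; they only mirror the compress and decompress flags and say nothing about the library's actual serialization format.

```python
import gzip
import json
import os
import tempfile


def save_json(obj, path, compress=False):
    # Hypothetical helper: write JSON, optionally through gzip.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)


def load_json(path, decompress=False):
    # Hypothetical helper: a gzip-compressed file must be read with decompress=True.
    opener = gzip.open if decompress else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


payload = {"source": ["Hello", "world"], "scores": [[0.8, 0.1], [0.2, 0.9]]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.json.gz")
    save_json(payload, path, compress=True)
    restored = load_json(path, decompress=True)
```

The flags have to match: a file written with compression is not valid plain-text JSON, so reading it without decompression fails.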
- save(path: PathLike, overwrite: bool = False, compress: bool = False, ndarray_compact: bool = True, use_primitives: bool = False, split_sequences: bool = False, scores_precision: Literal['float32', 'float16', 'float8'] = 'float32') None[source]
Save class contents to a JSON file.
- Parameters:
path (os.PathLike) – Path to the JSON file in which the attribution output will be stored (e.g. ./out.json).
overwrite (bool, optional, defaults to False) – If True, overwrite the file if it exists, raise error otherwise.
compress (bool, optional, defaults to False) – If True, the output file is compressed using gzip. Especially useful for large sequences and granular attributions with unmerged hidden dimensions.
ndarray_compact (bool, optional, defaults to True) – If True, the arrays for scores and attributions are stored in a compact b64 format. Otherwise, they are stored as plain lists of floats.
use_primitives (bool, optional, defaults to False) – If True, the output is stored as a list of dictionaries with primitive types (e.g. int, float, str). Note that an attribution saved with this option cannot be loaded with the load method.
split_sequences (bool, optional, defaults to False) – If True, the output is split into multiple files, one per sequence. The file names are generated by appending the sequence index to the given path (e.g. ./out.json with two sequences -> ./out_0.json, ./out_1.json)
scores_precision (str, optional, defaults to "float32") – Rounding precision for saved scores. Can be used to reduce space on disk but introduces rounding errors. Can be combined with compress=True for further space reduction. Accepted values: "float32", "float16", or "float8". Default: "float32" (no rounding).
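The file naming used with split_sequences (the sequence index appended before the extension) can be sketched with pathlib. split_paths is a hypothetical helper; it uses PurePosixPath, which drops a leading "./" when converted back to a string.

```python
from pathlib import PurePosixPath


def split_paths(path, num_sequences):
    """Derive one file name per sequence by appending the sequence index.

    Hypothetical sketch: "out.json" with two sequences gives
    "out_0.json" and "out_1.json" in the same directory.
    """
    p = PurePosixPath(path)
    return [str(p.with_name(f"{p.stem}_{i}{p.suffix}")) for i in range(num_sequences)]


names = split_paths("./out.json", 2)
nested = split_paths("results/run1/out.json", 3)
```

Keeping the index next to the stem leaves the extension intact, so each per-sequence file is still recognizable as JSON.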
- show(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool | None = False, return_figure: bool = False, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | list | None[source]
Visualize the sequence attributions.
- Parameters:
min_val (int, optional) – Minimum value for color scale.
max_val (int, optional) – Maximum value for color scale.
max_show_size (int, optional) – Maximum size of the dimension to show.
show_dim (int or str, optional) – Dimension to show.
slice_dims (dict[int or str, tuple[int, int]], optional) – Dimensions to slice.
display (bool, optional) – If True, display the attribution visualization.
return_html (bool, optional) – If True, return the attribution visualization as HTML.
return_figure (bool, optional) – If True, return the Treescope figure object for further manipulation.
aggregator (AggregatorPipeline or Type[Aggregator], optional) – Aggregator or pipeline to use. If not provided, the default aggregator for every sequence attribution is used.
do_aggregation (bool, optional, defaults to True) – Whether to aggregate the attributions before visualizing them. Set to False to skip aggregation if the attributions are already aggregated.
- Returns:
Attribution visualization as HTML if return_html=True; a list of Treescope figure objects if return_figure=True; None if return_html=False and return_figure=False.
- Return type:
str
- show_granular(min_val: int | None = None, max_val: int | None = None, max_show_size: int | None = None, show_dim: int | str | None = None, slice_dims: dict[int | str, tuple[int, int]] | None = None, display: bool = True, return_html: bool = False, return_figure: bool = False) str | None[source]
Visualizes granular attribution heatmaps in HTML format.
- Parameters:
min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.
max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.
max_show_size (int, optional, defaults to None) – Maximum dimension size for additional dimensions to be visualized. Default: 20.
show_dim (int or str, optional, defaults to None) – Dimension to be visualized along with the source and target tokens. Can be either the dimension index or the dimension name. Works only if the dimension size is less than or equal to max_show_size.
slice_dims (dict[int or str, tuple[int, int]], optional, defaults to None) – Dimensions to be sliced and visualized along with the source and target tokens. The dictionary should contain the dimension index or name as the key and the slice range as the value.
display (bool, optional, defaults to True) – Whether to show the output of the visualization function.
return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.
return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.
- Returns:
Returns the HTML output if return_html=True
- Return type:
str
- show_tokens(min_val: int | None = None, max_val: int | None = None, display: bool = True, return_html: bool = False, return_figure: bool = False, replace_char: dict[str, str] | None = None, wrap_after: int | str | list[str] | tuple[str] | None = None, step_score_highlight: str | None = None, aggregator: AggregatorPipeline | type[Aggregator] = None, do_aggregation: bool = True, **kwargs) str | None[source]
Visualizes token-level attributions in HTML format.
- Parameters:
min_val (int, optional, defaults to None) – Lower attribution score threshold for color map.
max_val (int, optional, defaults to None) – Upper attribution score threshold for color map.
display (bool, optional, defaults to True) – Whether to show the output of the visualization function.
return_html (bool, optional, defaults to False) – If true, returns the HTML corresponding to the notebook visualization of the attributions in string format, for saving purposes.
return_figure (bool, optional, defaults to False) – If true, returns the Treescope figure object for further manipulation.
replace_char (dict[str, str], optional, defaults to None) – Dictionary mapping strings to be replaced to replacement options, used for cleaning special characters. Default: {}.
wrap_after (int or str or list[str] or tuple[str], optional, defaults to None) – Token indices or tokens after which to wrap lines. E.g. 10 = wrap after every 10 tokens, "hi" = wrap after word hi occurs, [".", "!", "?"] or ".!?" = wrap after every sentence-ending punctuation.
step_score_highlight (str, optional, defaults to None) – Name of the step score to use to highlight generated tokens in the visualization. If None, no highlights are shown.
- class inseq.data.attribution.GranularFeatureAttributionSequenceOutput(source: list[TokenWithId], target: list[TokenWithId], source_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, target_attributions: Float32[Tensor, 'attributed_seq_len generated_seq_len embed_size'] | Float32[Tensor, 'attributed_seq_len generated_seq_len'] | None = None, step_scores: dict[str, Float32[Tensor, 'generated_seq_len']] | None = None, sequence_scores: dict[str, Float32[Tensor, 'attributed_seq_len generated_seq_len']] | None = None, attr_pos_start: int = 0, attr_pos_end: int | None = None, _aggregator: str | list[str] | None = None, _dict_aggregate_fn: dict[str, str] | None = None, _attribution_dim_names: dict[str, dict[int, str]] | None = None, _num_dimensions: int | None = None)[source]
Raw output of a single sequence of granular feature attribution.
Examples of granular feature attribution methods are gradient-based methods such as Integrated Gradients, which return one score per hidden dimension of the model for every generated token.
Adds the convergence delta and default L2 + normalization merging of attributions to the base class.
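The "L2 + normalization" merging mentioned above can be sketched on nested lists: the hidden dimension is collapsed with an L2 norm, then scores are rescaled. l2_merge_and_normalize is a hypothetical helper, and the choice to normalize so that each generation step sums to 1 over the attributed tokens is an assumption; the library's exact normalization may differ.

```python
import math


def l2_merge_and_normalize(attributions):
    """Collapse attributions[i][t][d] to merged[i][t] and normalize per step.

    Hypothetical sketch: i indexes attributed tokens, t generation steps,
    d hidden dimensions. Assumes each step is normalized to sum to 1.
    """
    # L2 norm over the hidden dimension gives one score per (token, step).
    merged = [
        [math.sqrt(sum(v * v for v in vector)) for vector in row]
        for row in attributions
    ]
    n_steps = len(merged[0]) if merged else 0
    for t in range(n_steps):
        total = sum(row[t] for row in merged)
        if total > 0:  # leave all-zero steps untouched
            for row in merged:
                row[t] /= total
    return merged


# Two attributed tokens, one generation step, hidden size 2.
granular = [[[3.0, 4.0]], [[0.0, 5.0]]]
merged = l2_merge_and_normalize(granular)
```

In this toy input both tokens have an L2 norm of 5, so after normalization each receives half of the attribution mass for the single step.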
- class inseq.data.attribution.GranularFeatureAttributionStepOutput(source_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, step_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size']] | None = None, target_attributions: ~jaxtyping.Float[Tensor, 'batch_size seq_len embed_size'] | ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len'] | None = None, sequence_scores: dict[str, ~jaxtyping.Float32[Tensor, 'batch_size attributed_seq_len']] | None = None, source: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, prefix: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, target: ~collections.abc.Sequence[~collections.abc.Sequence[~inseq.utils.typing.TokenWithId]] | None = None, _num_dimensions: int | None = None, _sequence_cls: type[~inseq.data.attribution.FeatureAttributionSequenceOutput] = <class 'inseq.data.attribution.GranularFeatureAttributionSequenceOutput'>)[source]
Raw output of a single step of gradient feature attribution.