API Reference
This document provides details on the main functions and classes available in the fast-seqfunc package.
Core Functions
train_model
from fast_seqfunc import train_model
model_info = train_model(
    train_data,
    val_data=None,
    test_data=None,
    sequence_col="sequence",
    target_col="function",
    embedding_method="one-hot",
    model_type="regression",
    optimization_metric=None,
    **kwargs
)
Trains a sequence-function model using PyCaret.
Parameters:
- train_data: DataFrame or path to CSV file with training data.
- val_data: Optional validation data (not directly used, reserved for future).
- test_data: Optional test data for final evaluation.
- sequence_col: Column name containing sequences.
- target_col: Column name containing target values.
- embedding_method: Method to use for embedding sequences. Currently only "one-hot" is supported.
- model_type: Type of modeling problem ("regression" or "classification").
- optimization_metric: Metric to optimize during model selection (e.g., "r2", "accuracy", "f1").
- **kwargs: Additional arguments passed to PyCaret setup.
Returns:
- Dictionary containing the trained model and related metadata.
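Since train_data may be a path to a CSV file, it can help to see what such a file might look like. The following is a minimal sketch using only the standard library; the helper name write_training_csv and the example sequences and values are hypothetical and not part of fast-seqfunc.

```python
import csv
import os
import tempfile


def write_training_csv(rows, path, sequence_col="sequence", target_col="function"):
    """Write (sequence, target) pairs to a CSV with the given column names."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([sequence_col, target_col])
        writer.writerows(rows)
    return path


# Made-up example data: three short sequences with numeric targets.
example_rows = [("ACGT", 0.12), ("ACGA", 0.48), ("TCGA", 0.91)]
csv_path = write_training_csv(
    example_rows, os.path.join(tempfile.mkdtemp(), "train.csv")
)

# Read the header back to confirm the column names match the defaults.
with open(csv_path, newline="") as handle:
    header = next(csv.reader(handle))
```

A file written this way uses the default column names, so its path could presumably be passed as train_data without overriding sequence_col or target_col.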
predict
from fast_seqfunc import predict
predictions = predict(
    model_info,
    sequences,
    sequence_col="sequence"
)
Generates predictions for new sequences using a trained model.
Parameters:
- model_info: Dictionary from train_model containing model and related information.
- sequences: Sequences to predict (list, Series, or DataFrame).
- sequence_col: Column name in DataFrame containing sequences.
Returns:
- Array of predictions.
save_model
from fast_seqfunc import save_model
save_model(model_info, path)
Saves the model to disk.
Parameters:
- model_info: Dictionary containing model and related information.
- path: Path to save the model.
Returns:
- None
load_model
from fast_seqfunc import load_model
model_info = load_model(path)
Loads a trained model from disk.
Parameters:
- path: Path to saved model file.
Returns:
- Dictionary containing the model and related information.
Embedder Classes
OneHotEmbedder
from fast_seqfunc.embedders import OneHotEmbedder
embedder = OneHotEmbedder(sequence_type="auto")
embeddings = embedder.fit_transform(sequences)
One-hot encoding for protein or nucleotide sequences.
Parameters:
- sequence_type: Type of sequences to encode ("protein", "dna", "rna", or "auto").
Methods:
- fit(sequences): Determine alphabet and set up the embedder.
- transform(sequences): Transform sequences to one-hot encodings.
- fit_transform(sequences): Fit and transform in one step.
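To make the idea of one-hot encoding concrete, here is a minimal pure-Python sketch. It is not the OneHotEmbedder implementation: the function name one_hot_encode is hypothetical, the alphabet is simply inferred from the input, and all sequences are assumed to have the same length.

```python
def one_hot_encode(sequences, alphabet=None):
    """Return (alphabet, encodings); each encoding is a flat list of 0/1 values."""
    if alphabet is None:
        # Infer the alphabet from the data, loosely mirroring a "fit" step.
        alphabet = sorted(set("".join(sequences)))
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    encodings = []
    for seq in sequences:
        row = []
        for symbol in seq:
            # One slot per alphabet symbol; only the matching slot is set to 1.
            one_hot = [0] * len(alphabet)
            one_hot[index[symbol]] = 1
            row.extend(one_hot)
        encodings.append(row)
    return alphabet, encodings


alphabet, encoded = one_hot_encode(["ACG", "GTA"])
```

Each sequence of length n over an alphabet of size k becomes a vector of n * k values. How the actual embedder handles variable-length sequences or unknown symbols is not covered by this sketch.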
Helper Functions
get_embedder
from fast_seqfunc.embedders import get_embedder
embedder = get_embedder(method="one-hot")
Get an embedder instance based on method name.
Parameters:
- method: Embedding method (currently only "one-hot" is supported).
Returns:
- Configured embedder instance.
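Looking up an embedder by method name is commonly done with a small registry. The sketch below illustrates that pattern only; SketchOneHotEmbedder and get_embedder_sketch are hypothetical stand-ins, and raising ValueError for an unsupported method is an assumption rather than documented behavior of get_embedder.

```python
class SketchOneHotEmbedder:
    """Stand-in embedder class used only for this illustration."""


def get_embedder_sketch(method="one-hot"):
    """Return an embedder instance for the given method name."""
    # Map each supported method name to the class that implements it.
    registry = {"one-hot": SketchOneHotEmbedder}
    if method not in registry:
        # Assumed error handling; the real function may behave differently.
        raise ValueError(f"Unsupported embedding method: {method!r}")
    return registry[method]()


embedder = get_embedder_sketch("one-hot")
```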
evaluate_model
from fast_seqfunc.core import evaluate_model
results = evaluate_model(
    model,
    X_test,
    y_test,
    embedder,
    model_type,
    embed_cols
)
Evaluate model performance on test data.
Parameters:
- model: Trained model.
- X_test: Test sequences.
- y_test: True target values.
- embedder: Embedder to transform sequences.
- model_type: Type of model (regression or classification).
- embed_cols: Column names for embedded features.
Returns:
- Dictionary containing metrics and prediction data with structure:
{
    "metrics": {metric_name: value, ...},
    "predictions_data": {
        "y_true": [...],
        "y_pred": [...]
    }
}
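As an illustration of the structure above, the following sketch assembles a dictionary of the same shape for a regression problem. The function name build_regression_results and the metric names "r2" and "rmse" are assumptions made for this example; the metrics that evaluate_model actually reports may differ.

```python
import math


def build_regression_results(y_true, y_pred):
    """Assemble a results dictionary in the documented shape."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares, used for R^2 and RMSE.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "metrics": {
            "r2": 1.0 - ss_res / ss_tot,
            "rmse": math.sqrt(ss_res / n),
        },
        "predictions_data": {"y_true": list(y_true), "y_pred": list(y_pred)},
    }


results = build_regression_results([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```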
save_detailed_metrics
from fast_seqfunc.core import save_detailed_metrics
save_detailed_metrics(
    metrics_data,
    output_dir,
    model_type,
    embedding_method="unknown"
)
Save detailed model metrics to files in the specified directory.
Parameters:
- metrics_data: Dictionary containing metrics and prediction data from evaluate_model.
- output_dir: Directory to save metrics files.
- model_type: Type of model (regression or classification).
- embedding_method: Embedding method used for this model.
Returns:
- None
Output Files:
- JSON file with detailed metrics
- CSV file with raw predictions and true values
- Visualization plots based on model type:
  - For regression: scatter plot, residual plot
  - For classification: confusion matrix
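The JSON and CSV outputs listed above could be produced along the following lines. This is a sketch under stated assumptions, not the save_detailed_metrics implementation: the function name write_metrics_files and the file names metrics.json and predictions.csv are hypothetical, and the plots are omitted.

```python
import csv
import json
import os
import tempfile


def write_metrics_files(metrics_data, output_dir):
    """Write a metrics JSON file and a predictions CSV file to output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    json_path = os.path.join(output_dir, "metrics.json")  # hypothetical name
    csv_path = os.path.join(output_dir, "predictions.csv")  # hypothetical name

    with open(json_path, "w") as handle:
        json.dump(metrics_data["metrics"], handle, indent=2)

    # One row per example: the true value next to the predicted value.
    preds = metrics_data["predictions_data"]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["y_true", "y_pred"])
        writer.writerows(zip(preds["y_true"], preds["y_pred"]))
    return json_path, csv_path


example = {
    "metrics": {"r2": 0.5},
    "predictions_data": {"y_true": [1.0, 2.0], "y_pred": [1.5, 2.5]},
}
json_path, csv_path = write_metrics_files(example, tempfile.mkdtemp())

# Reload the JSON to confirm it round-trips.
with open(json_path) as handle:
    reloaded = json.load(handle)
```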