API Reference
This document describes the main functions and classes in the fast-seqfunc package.
Core Functions
train_model
from fast_seqfunc import train_model
model_info = train_model(
    train_data,
    val_data=None,
    test_data=None,
    sequence_col="sequence",
    target_col="function",
    embedding_method="one-hot",
    model_type="regression",
    optimization_metric=None,
    **kwargs
)
Trains a sequence-function model using PyCaret.
Parameters:
- train_data: DataFrame or path to a CSV file with training data.
- val_data: Optional validation data (not used directly; reserved for future use).
- test_data: Optional test data for final evaluation.
- sequence_col: Column name containing sequences.
- target_col: Column name containing target values.
- embedding_method: Method to use for embedding sequences. Currently only "one-hot" is supported.
- model_type: Type of modeling problem ("regression" or "classification").
- optimization_metric: Metric to optimize during model selection (e.g., "r2", "accuracy", "f1").
- **kwargs: Additional arguments passed to PyCaret setup.
Returns:
- Dictionary containing the trained model and related metadata.
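As a rough illustration of the input layout described above, the sketch below writes a toy CSV with the default column names ("sequence" and "function") and wraps the documented train_model call in a helper. The sequences and values are invented, the helper names write_training_csv and train_from_csv are hypothetical, and a real dataset would need many more rows than this.

```python
import csv
import tempfile
from pathlib import Path

# Toy data: sequences and measured values are invented for illustration only.
ROWS = [
    ("ACGTAC", 0.12),
    ("ACGTTC", 0.48),
    ("TCGTAC", 0.33),
    ("ACGAAC", 0.91),
]


def write_training_csv(path, rows, sequence_col="sequence", target_col="function"):
    """Write (sequence, target) pairs to a CSV using the documented default column names."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([sequence_col, target_col])
        writer.writerows(rows)
    return path


def train_from_csv(path):
    """Call train_model on a CSV path (hypothetical helper; needs fast-seqfunc installed)."""
    # Imported lazily so the data-preparation part runs without the package.
    from fast_seqfunc import train_model

    return train_model(
        str(path),
        sequence_col="sequence",
        target_col="function",
        embedding_method="one-hot",
        model_type="regression",
        optimization_metric="r2",
    )


csv_path = write_training_csv(Path(tempfile.mkdtemp()) / "train.csv", ROWS)
```

Calling train_from_csv(csv_path) would then return the model_info dictionary described under Returns.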
predict
from fast_seqfunc import predict
predictions = predict(
    model_info,
    sequences,
    sequence_col="sequence"
)
Generates predictions for new sequences using a trained model.
Parameters:
- model_info: Dictionary from train_model containing the model and related information.
- sequences: Sequences to predict (list, Series, or DataFrame).
- sequence_col: Column name in the DataFrame containing sequences.
Returns:
- Array of predictions.
save_model
from fast_seqfunc import save_model
save_model(model_info, path)
Saves the model to disk.
Parameters:
- model_info: Dictionary containing the model and related information.
- path: Path to save the model.
Returns:
- None
load_model
from fast_seqfunc import load_model
model_info = load_model(path)
Loads a trained model from disk.
Parameters:
- path: Path to the saved model file.
Returns:
- Dictionary containing the model and related information.
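A saved model is typically reloaded later to score new sequences. The sketch below chains the documented load_model and predict calls and checks that the path exists first. The helper name reload_and_predict and its load_fn/predict_fn parameters are hypothetical; they only exist so the flow can be exercised without the package installed.

```python
from pathlib import Path


def reload_and_predict(model_path, sequences, load_fn=None, predict_fn=None):
    """Load a saved model and predict on new sequences (hypothetical helper)."""
    if load_fn is None or predict_fn is None:
        # Fall back to the documented functions; requires fast-seqfunc.
        from fast_seqfunc import load_model, predict

        load_fn = load_fn or load_model
        predict_fn = predict_fn or predict

    # Fail early with a clear message instead of relying on the loader's error.
    if not Path(model_path).exists():
        raise FileNotFoundError(f"No saved model at {model_path}")

    model_info = load_fn(model_path)
    return predict_fn(model_info, sequences)
```

With the package installed, reload_and_predict("model.pkl", ["ACGTAC"]) would use the real functions; the file name here is only a placeholder.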
Embedder Classes
OneHotEmbedder
from fast_seqfunc.embedders import OneHotEmbedder
embedder = OneHotEmbedder(sequence_type="auto")
embeddings = embedder.fit_transform(sequences)
One-hot encoding for protein or nucleotide sequences.
Parameters:
- sequence_type: Type of sequences to encode ("protein", "dna", "rna", or "auto").
Methods:
- fit(sequences): Determine the alphabet and set up the embedder.
- transform(sequences): Transform sequences to one-hot encodings.
- fit_transform(sequences): Fit and transform in one step.
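To make the idea concrete, here is a minimal pure-Python sketch of one-hot encoding and of how an "auto" sequence type could be guessed from the characters present. It is not the OneHotEmbedder implementation: the function names, the detection rule, the sorted alphabet, and the flattened output layout are all assumptions made for illustration.

```python
DNA = set("ACGT")
RNA = set("ACGU")


def infer_sequence_type(sequences):
    """Guess the sequence type from the characters present (assumed rule)."""
    chars = set("".join(sequences).upper())
    if chars <= DNA:
        return "dna"  # also chosen when only A, C, G occur, since DNA is checked first
    if chars <= RNA:
        return "rna"
    return "protein"


def fit_alphabet(sequences):
    """Collect the sorted set of characters seen in the sequences."""
    return sorted(set("".join(sequences).upper()))


def one_hot(sequence, alphabet):
    """Return a flattened one-hot vector: one block of len(alphabet) per position."""
    index = {char: i for i, char in enumerate(alphabet)}
    vector = []
    for char in sequence.upper():
        block = [0] * len(alphabet)
        block[index[char]] = 1  # raises KeyError for characters outside the alphabet
        vector.extend(block)
    return vector
```

In this sketch, fit_alphabet plays the role of fit and one_hot the role of transform for a single sequence. Sequences of different lengths would produce vectors of different lengths, so a real embedder needs some padding or length policy that this sketch leaves out.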
Helper Functions
get_embedder
from fast_seqfunc.embedders import get_embedder
embedder = get_embedder(method="one-hot")
Get an embedder instance based on method name.
Parameters:
- method: Embedding method (currently only "one-hot" is supported).
Returns:
- Configured embedder instance.
evaluate_model
from fast_seqfunc.core import evaluate_model
results = evaluate_model(
    model,
    X_test,
    y_test,
    embedder,
    model_type,
    embed_cols
)
Evaluate model performance on test data.
Parameters:
- model: Trained model.
- X_test: Test sequences.
- y_test: True target values.
- embedder: Embedder to transform sequences.
- model_type: Type of model (regression or classification).
- embed_cols: Column names for embedded features.
Returns:
- Dictionary containing metrics and prediction data with structure:
{
    "metrics": {metric_name: value, ...},
    "predictions_data": {
        "y_true": [...],
        "y_pred": [...]
    }
}
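Because the returned dictionary carries the raw y_true and y_pred values, further statistics can be derived from it after the fact. The sketch below recomputes RMSE and R² for a regression result from a hand-made dictionary with the documented structure. The numbers are invented, the helper regression_summary is hypothetical, and the metric names the package itself stores under "metrics" are not stated in this reference, so none are assumed here.

```python
import math

# Hand-made stand-in with the documented structure; values are illustrative.
results = {
    "metrics": {},
    "predictions_data": {
        "y_true": [1.0, 2.0, 3.0, 4.0],
        "y_pred": [1.1, 1.9, 3.2, 3.8],
    },
}


def regression_summary(results):
    """Compute RMSE and R^2 from the y_true / y_pred lists (hypothetical helper)."""
    y_true = results["predictions_data"]["y_true"]
    y_pred = results["predictions_data"]["y_pred"]
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when every true value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": math.sqrt(ss_res / n), "r2": r2}


summary = regression_summary(results)
```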
save_detailed_metrics
from fast_seqfunc.core import save_detailed_metrics
save_detailed_metrics(
    metrics_data,
    output_dir,
    model_type,
    embedding_method="unknown"
)
Save detailed model metrics to files in the specified directory.
Parameters:
- metrics_data: Dictionary containing metrics and prediction data from evaluate_model.
- output_dir: Directory to save metrics files.
- model_type: Type of model (regression or classification).
- embedding_method: Embedding method used for this model.
Returns:
- None
Output Files:
- JSON file with detailed metrics
- CSV file with raw predictions and true values
- Visualization plots based on model type:
  - For regression: scatter plot, residual plot
  - For classification: confusion matrix
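This reference does not list the exact file names written to output_dir, so one cautious way to check the result is to group whatever files appear there by extension. The helper below is a small stdlib sketch for that purpose; group_outputs_by_type is a hypothetical name and is not part of fast-seqfunc.

```python
from collections import defaultdict
from pathlib import Path


def group_outputs_by_type(output_dir):
    """Map each file extension in output_dir to the sorted file names that use it."""
    groups = defaultdict(list)
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file():
            groups[path.suffix.lower()].append(path.name)
    return dict(groups)
```

After a call to save_detailed_metrics, the returned mapping should contain at least a ".json" and a ".csv" entry, plus entries for whichever image format the plots are saved in.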