Core concepts
Ideology
This section describes the core ideas of Aligned and explains how it differs from other approaches.
Describe What not How
The core idea behind aligned is that we start with our model's high-level business goals, and do not care about how the model works.
The same applies to all of our data: we describe which data and technologies we have to work with, not how they will be glued together.
In other words, aligned is a declarative package for describing how the parts of your ML system behave and how they relate to each other.
An Example
To make this clearer, let's look at an example.
@model_contract(
    input_features=[
        review_embedding.embedding,
    ],
    exposed_model=mlflow_server(
        host="http://movie-review-is-negative:8080",
        model_name="movie_review_is_negative",
        model_alias="champion",
    ),
    output_source=FileSource.csv_at("preds.csv"),
)
class MovieReviewIsNegative:
    review_id = String().as_entity()
    predicted_sentiment = review.is_negative.as_classification_label()
The contract above describes a model that predicts whether a review is positive or negative (predicted_sentiment), and each prediction has a review_id associated with it.
Furthermore, we declare that review.is_negative is the ground truth that the model should learn from.
The prediction is based on the data in review_embedding.embedding.
We can use the model through an mlflow server at http://movie-review-is-negative:8080.
Lastly, we store our predictions in CSV format at preds.csv.
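One way to see why this style is useful: a declaration is just data, so tooling can inspect it without running the model. The sketch below is plain Python and does not use Aligned. `ContractSketch`, `prediction_columns`, and `lineage` are hypothetical names invented for illustration, and the field layout is an assumption, not Aligned's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContractSketch:
    """Hypothetical stand-in for the facts the contract above declares."""

    name: str
    entity: str                      # column identifying each prediction
    label: str                       # column holding the predicted value
    ground_truth: str                # feature the model should learn from
    input_features: tuple[str, ...]  # features the prediction is based on
    model_url: str                   # where the model is served
    output_path: str                 # where predictions are stored


def prediction_columns(contract: ContractSketch) -> list[str]:
    """Derive the columns of the prediction output from the declaration alone."""
    return [contract.entity, contract.label]


def lineage(contract: ContractSketch) -> dict[str, list[str]]:
    """Map the predicted column to the features it depends on."""
    return {contract.label: list(contract.input_features)}


# The same facts as in the example contract, written as plain data.
contract = ContractSketch(
    name="movie_review_is_negative",
    entity="review_id",
    label="predicted_sentiment",
    ground_truth="review.is_negative",
    input_features=("review_embedding.embedding",),
    model_url="http://movie-review-is-negative:8080",
    output_path="preds.csv",
)

columns = prediction_columns(contract)
dependencies = lineage(contract)
```

Neither helper needs to know how the model is trained or served, which is the point of describing what the system is rather than how it runs.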
Free features
Because we describe this declaratively, we get all of the following features without writing anything else.
- Automatic joining of ground truths to training datasets
- Use the model - store.model("...").predict_over(...)
- Automatic reads, writes, and upserts to the prediction source - store.model("...").all_predictions()
- Data validation of prediction formats
- Data type conversion - such as ISO or Unix timestamps
- Data lineage for models and features
- Online model performance monitoring
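To make a few of these features concrete, here is a minimal plain-Python sketch of what "joining ground truths", "upserts", and "timestamp conversion" mean in principle. It is not Aligned's implementation and does not call Aligned. The function names, the row layout (lists of dicts), and the choice to treat naive ISO strings as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Convert a Unix timestamp (int/float) or an ISO 8601 string to an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: strings without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def upsert(existing, new_rows, key="review_id"):
    """Insert new rows and overwrite existing rows that share the same key."""
    merged = {row[key]: row for row in existing}
    for row in new_rows:
        merged[row[key]] = row
    return list(merged.values())


def join_ground_truth(predictions, ground_truths, key="review_id", label="is_negative"):
    """Attach the ground truth to each prediction that has one, matched on the entity key."""
    truth_by_key = {row[key]: row[label] for row in ground_truths}
    return [
        {**pred, label: truth_by_key[pred[key]]}
        for pred in predictions
        if pred[key] in truth_by_key
    ]


stored = [
    {"review_id": "a", "predicted_sentiment": True},
    {"review_id": "b", "predicted_sentiment": False},
]
incoming = [
    {"review_id": "b", "predicted_sentiment": True},   # replaces the stored row for "b"
    {"review_id": "c", "predicted_sentiment": False},  # new row
]
predictions = upsert(stored, incoming)

truths = [{"review_id": "a", "is_negative": True}, {"review_id": "b", "is_negative": False}]
training_rows = join_ground_truth(predictions, truths)
```

In this sketch the entity (review_id) is what makes both the upsert and the join possible, which is why the contract has to declare it.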