Core concepts
Transformations
While a feature holds raw input data, a derived feature is one that depends on one or more other features through a transformation.
Examples include adding two integers together or extracting the host name from a URL. The following shows how transformations can be defined.
from aligned import String, Int64, EventTimestamp, FileSource, feature_view

@feature_view(
    name="zipcode_features",
    description="",
    batch_source=FileSource.parquet_at("zipcodes.parquet"),
)
class Zipcode:
    zipcode = Int64().as_entity()

    location_type = String()
    tax_returns_filed = Int64()
    population = Int64()
    total_wages = Int64()

    is_primary_location = location_type == "PRIMARY"
    contains_secondary = location_type.contains("SECONDARY")
    total_income = total_wages + tax_returns_filed
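To make the three derived features concrete, the sketch below computes the same values for a single row in plain Python. It is only an illustration of what the expressions mean: the `derive` helper and the sample values are made up for this example, and it says nothing about how Aligned evaluates features internally.

```python
# Plain-Python illustration only; this does not use Aligned and is not
# how Aligned evaluates features. `derive` is a hypothetical helper.
def derive(row: dict) -> dict:
    """Return a copy of `row` with the three derived features added."""
    out = dict(row)
    # location_type == "PRIMARY"
    out["is_primary_location"] = row["location_type"] == "PRIMARY"
    # location_type.contains("SECONDARY")
    out["contains_secondary"] = "SECONDARY" in row["location_type"]
    # total_wages + tax_returns_filed
    out["total_income"] = row["total_wages"] + row["tax_returns_filed"]
    return out


# Made-up sample values for a single zipcode row.
example_row = {
    "zipcode": 10001,
    "location_type": "PRIMARY",
    "tax_returns_filed": 120,
    "population": 300,
    "total_wages": 5000,
}

derived = derive(example_row)
```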
Because we define transformations, it is important to understand how they depend on each other. Otherwise, end users would need to know these dependencies themselves and fetch the dependent data themselves.
Aligned collects this dependency information for you. When you write total_income = total_wages + tax_returns_filed, Aligned knows that a request for total_income first requires loading total_wages and tax_returns_filed, and then summing the two.
Furthermore, this means we have data lineage information for other use-cases as well.
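The idea of resolving dependencies before loading can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aligned's implementation: the `DEPENDS_ON` table and the `required_raw_features` function are hypothetical names, and the table is written by hand here, whereas Aligned derives the equivalent information from the feature view definition.

```python
# Hypothetical dependency table: derived feature -> features it depends on.
# Written by hand for this sketch; not an Aligned data structure.
DEPENDS_ON = {
    "is_primary_location": ["location_type"],
    "contains_secondary": ["location_type"],
    "total_income": ["total_wages", "tax_returns_filed"],
}


def required_raw_features(requested: list[str]) -> list[str]:
    """Return the raw features that must be loaded, in first-seen order."""
    needed: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        deps = DEPENDS_ON.get(name)
        if deps is None:
            # No entry in the table: treat it as a raw feature to load.
            needed.append(name)
        else:
            # Derived feature: resolve everything it depends on first.
            for dep in deps:
                visit(dep)

    for name in requested:
        visit(name)
    return needed
```

Asking for total_income alone yields total_wages and tax_returns_filed, which matches the loading order described above. The same table doubles as simple lineage information, since it records which inputs each derived feature was built from.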
Encoding
This also means that every transformation between features can be represented as a JSON object. For instance, a subtraction transformation can be represented by the names of the two columns to subtract. Representing transformations in such a format is core to the solution, as it makes it possible to derive the whole data lineage without relying on any specific technology. As a result, computations can be optimised and developers gain flexibility, since they are not locked into any particular processing engine.
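As a rough sketch of what such an encoding could look like, the example below stores a subtraction as a JSON object holding two column names and then applies it to a row. The key names ("name", "front", "behind") and the `apply_transformation` function are assumptions made for illustration; the actual JSON layout Aligned produces may differ.

```python
import json

# Hypothetical schema: a transformation is a name plus the columns it reads.
# The key names are assumptions, not Aligned's real format.
encoded = json.dumps({
    "name": "subtraction",
    "front": "total_wages",
    "behind": "tax_returns_filed",
})


def apply_transformation(spec_json: str, row: dict) -> int:
    """Decode a JSON transformation spec and apply it to a single row."""
    spec = json.loads(spec_json)
    if spec["name"] == "subtraction":
        return row[spec["front"]] - row[spec["behind"]]
    if spec["name"] == "addition":
        return row[spec["front"]] + row[spec["behind"]]
    raise ValueError(f"Unknown transformation: {spec['name']}")


def columns_read(spec_json: str) -> list[str]:
    """Lineage for free: the spec itself lists the columns it depends on."""
    spec = json.loads(spec_json)
    return [spec["front"], spec["behind"]]
```

Because the spec is plain data, any engine that can read JSON can execute it or inspect which columns it reads, which is the engine independence the paragraph above describes.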