Extract Features with an LLM
This guide demonstrates how to use a language model to extract structured features from raw, unstructured text.
Define the Input Schema
We begin by defining the input schema for our text data. In this example, the input consists of a single string field. However, you can easily extend this schema to include additional information, such as image URLs, metadata, or any other relevant fields.
@feature_view()
class TextDocument:
    content = String()
Define the Expected Output
Now, let’s define the output structure we want the model to extract: in this case, the name and age of any person mentioned in the document.
@model_contract(...)
class Persons:
    name = String().is_optional()
    age = Int32().is_optional()
Define the Model
Now specify the model to use and the features to send to it. In this example, we use ollama_extraction, which automatically generates a prompt based on the input features and ensures that the output matches the expected schema.
@model_contract(
    input_features=[TextDocument],
    exposed_model=ollama_extraction(model="mistral:latest")
)
class Persons:
    name = String().is_optional()
    age = Int32().is_optional()
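To make the idea of schema-driven prompting more concrete, here is a minimal sketch in plain Python. It is not Aligned's actual implementation. It assumes the schema can be described as a mapping from field name to a type and an optional flag; PERSON_SCHEMA, build_prompt, and validate_output are hypothetical names introduced only for illustration.

```python
import json

# Hypothetical stand-in for the output schema: field name -> (type, is_optional).
PERSON_SCHEMA = {"name": (str, True), "age": (int, True)}


def build_prompt(text: str, schema: dict) -> str:
    """Describe the wanted JSON keys, then append the document text."""
    fields = ", ".join(f'"{name}" ({tp.__name__})' for name, (tp, _) in schema.items())
    return (
        "Extract the following fields and answer with a JSON object only: "
        f"{fields}.\n\nText: {text}"
    )


def validate_output(raw: str, schema: dict) -> dict:
    """Parse the model reply and check every field against the schema."""
    data = json.loads(raw)
    result = {}
    for name, (tp, optional) in schema.items():
        value = data.get(name)
        if value is None:
            # Missing or null values are only accepted for optional fields.
            if not optional:
                raise ValueError(f"missing required field: {name}")
            result[name] = None
        elif isinstance(value, tp) and not isinstance(value, bool):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            result[name] = value
        else:
            raise TypeError(f"field {name!r} should be {tp.__name__}")
    return result
```

In this sketch, a reply such as {"name": "Rick"} passes validation with age set to None, because both fields are declared optional, while a reply with a string in the age field is rejected.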
Use the Model
You can now run the extraction using the following code:
store = await ContractStore.from_dir(".")
extracts = await store.model(Persons).predict_over({
    "content": [
        "Donald Duck is almost 100 years old at this point",
        "Rick and Morty is only 13 years old"
    ]
}).to_polars()
print(extracts)
The output will look something like this:
shape: (2, 4)
┌───────────────────────────────────┬────────────────────┬──────────────────────────────────┬──────┐
│ content ┆ name ┆ prompt_output ┆ age │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 │
╞═══════════════════════════════════╪════════════════════╪══════════════════════════════════╪══════╡
│ Donald Duck is almost 100 year… ┆ Donald Duck ┆ {"name": "Donald Duck", "age"… ┆ 99 │
│ Rick and Morty is only 13 year… ┆ Rick and Morty ┆ {"age": 13, "name": "Rick and… ┆ 13 │
└───────────────────────────────────┴────────────────────┴──────────────────────────────────┴──────┘
Extracting Multiple Entities
In some cases, you may want the model to extract multiple persons from a single text — for example, extracting Rick and Morty as separate entities rather than a single entry.
To do this, we can modify the output schema to return a list of Person objects:
Schema Definitions
Aligned supports schema definitions using pydantic.BaseModel, dataclass, and feature_views. These can be nested using the Struct data type for complex extractions.
@feature_view()
class Person:
    name = String().is_optional()
    age = Int32().is_optional()

@model_contract(
    input_features=[TextDocument],
    exposed_model=ollama_extraction(model="mistral:latest")
)
class Persons:
    persons = List(Struct(Person))
With this updated schema, the output might look like this:
shape: (2, 3)
┌─────────────────────────────────┬─────────────────────────────┬─────────────────────────────────┐
│ content ┆ persons ┆ prompt_output │
│ --- ┆ --- ┆ --- │
│ str ┆ list[struct[2]] ┆ str │
╞═════════════════════════════════╪═════════════════════════════╪═════════════════════════════════╡
│ Donald Duck is almost 100 year… ┆ [{"Donald Duck",99}] ┆ {"persons": [{"name": "Donald … │
│ Rick and Morty is only 13 year… ┆ [{"Rick",13}, {"Morty",13}] ┆ {"persons": [{"name": "Rick", … │
└─────────────────────────────────┴─────────────────────────────┴─────────────────────────────────┘
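If you need one row per extracted person downstream, the raw JSON in the prompt_output column can be decoded and flattened. The sketch below uses only the standard library and assumes the JSON has the shape shown in the table above (a "persons" key holding objects with "name" and "age"). PersonRecord, parse_persons, and flatten are hypothetical helpers, not part of Aligned.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    # Hypothetical plain-Python mirror of the Person schema; both fields optional.
    name: Optional[str] = None
    age: Optional[int] = None


def parse_persons(prompt_output: str) -> list[PersonRecord]:
    """Decode one prompt_output string into a list of PersonRecord objects."""
    payload = json.loads(prompt_output)
    return [
        PersonRecord(name=item.get("name"), age=item.get("age"))
        for item in payload.get("persons", [])
    ]


def flatten(rows: list[tuple[str, str]]) -> list[dict]:
    """Turn (content, prompt_output) pairs into one dict per extracted person."""
    return [
        {"content": content, "name": person.name, "age": person.age}
        for content, raw in rows
        for person in parse_persons(raw)
    ]
```

Under these assumptions, the Rick and Morty row yields two dictionaries with the same content value, one per person, and a person without an age in the JSON ends up with age set to None.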
Full Code Example
Below you will find the full code example.
@feature_view()
class TextDocument:
    content = String()

@feature_view()
class Person:
    name = String().is_optional()
    age = Int32().is_optional()

@model_contract(
    input_features=[TextDocument],
    exposed_model=ollama_extraction(model="mistral:latest")
)
class Persons:
    persons = List(Struct(Person))

async def use_model():
    store = await ContractStore.from_dir(".")
    extracts = await store.model(Persons).predict_over({
        "content": [
            "Donald Duck is almost 100 years old at this point",
            "Rick and Morty is only 13 years old"
        ]
    }).to_polars()
    print(extracts)