EPD pipeline explained
This page explains what the materia_epd.epd.pipeline module does and
why its steps are organised the way they are. It is meant as an
explanation rather than a step‑by‑step guide or API reference.
High-level goals
The EPD pipeline turns:
a collection of ILCD XML files describing EPDs (environmental product declarations), and
a collection of ILCD XML files describing generic processes
into:
synthesized market‑representative EPDs per process and country,
aggregated impact indicators (e.g. GWP) per market,
and updated ILCD process/flow files written to an output directory.
Conceptually, the pipeline answers:
Which EPDs are relevant for this process?
How can we reconcile differences in units, locations and materials?
What is the “average” material and environmental impact for a market?
Conceptual overview of the module
The main concepts in pipeline.py are:
XML readers: turn ILCD XML files into Python objects.
filters: decide whether a given EPD is relevant for a process.
location escalation: progressively relax geographic constraints if no exact match is found.
averaging: compute representative material properties and impacts.
orchestration: connect all previous pieces over a folder tree.
The pipeline also supports an assembled-products recipe. In this recipe, a product's impacts are computed from precomputed component products as a quantity-weighted sum-product (e.g. cement + aggregates + water + additives for concrete).
The conceptual data‑flow looks like this:
```mermaid
flowchart TD
    A[EPD XML files folder] -->|parse| B[IlcdProcess EPDs]
    C[Generic process XML files folder] -->|parse & enrich| D[IlcdProcess processes]
    D -->|for each process with matches| E[epd_pipeline]
    B --> E
    E --> F[Avg. material properties]
    E --> G[Market-weighted impacts per country]
    F --> H[Write updated process XML]
    F --> I[Write updated flow XML]
    G --> H
```
XML object generation
Two small generators define how XML is brought into the pipeline:
- gen_xml_objects takes a file or folder path and yields (path, xml_root) pairs.
- gen_epds wraps gen_xml_objects and yields IlcdProcess instances representing individual EPDs.
These functions are intentionally low‑level: they abstract file iteration and parsing but do not decide anything about relevance or aggregation.
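The file-iteration half of this contract can be sketched with the standard library alone. The sketch below is a minimal illustration, and iter_xml_roots is a hypothetical name, not the module's gen_xml_objects. It assumes that a folder means "every *.xml file directly inside it". In keeping with the low-level role described above, it only parses files and never decides relevance.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, Tuple, Union


def iter_xml_roots(path: Union[str, Path]) -> Iterator[Tuple[Path, ET.Element]]:
    """Yield (path, xml_root) pairs for one XML file or each *.xml file in a folder.

    Hypothetical stand-in for gen_xml_objects: it parses, but makes no
    decision about relevance or aggregation.
    """
    path = Path(path)
    # A single file is used as-is; a folder is scanned non-recursively, in sorted order.
    files = [path] if path.is_file() else sorted(path.glob("*.xml"))
    for file in files:
        yield file, ET.parse(file).getroot()
```

A wrapper in the spirit of gen_epds would consume these pairs and build one EPD object per root. How IlcdProcess is constructed is not described on this page, so the sketch stops at the parsed roots.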
Matches file format
Each process may have a matches file at:
<dataset>/matches/<process_uuid>.json
The type field decides which recipe is selected.
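Selecting the recipe from the type field can be sketched as a small dispatch over the parsed JSON. This is a hedged sketch: load_recipe is a hypothetical helper. The field names ("type", "uuids", "components") come from the examples on this page. Rejecting unknown types with a ValueError is an assumption, not documented behaviour of the module.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_recipe(dataset: Path, process_uuid: str) -> Dict[str, Any]:
    """Read <dataset>/matches/<process_uuid>.json and normalise it by recipe type.

    Hypothetical helper; the validation rules are assumptions.
    """
    match_file = Path(dataset) / "matches" / f"{process_uuid}.json"
    data = json.loads(match_file.read_text(encoding="utf-8"))
    recipe = data.get("type")
    if recipe in ("average", "market-average"):
        # EPD-based aggregation: a list of source EPD UUIDs is required.
        return {"type": recipe, "uuids": list(data["uuids"])}
    if recipe == "assembled":
        # Assembled products: components point at other processes, not at EPDs.
        return {"type": recipe, "components": list(data["components"])}
    raise ValueError(f"unknown recipe type {recipe!r} in {match_file}")
```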
1) Average / market-average recipes
For EPD-based aggregation, provide a list of source EPD UUIDs in uuids.
```json
{
  "type": "average",
  "uuids": [
    "epd-uuid-1",
    "epd-uuid-2",
    "epd-uuid-3"
  ]
}
```

```json
{
  "type": "market-average",
  "uuids": [
    "epd-uuid-1",
    "epd-uuid-2"
  ]
}
```
2) Assembled recipe
For assembled products, provide components instead of uuids. Each
component references a process UUID that must already have computed outputs
available in the current run.
```json
{
  "type": "assembled",
  "components": [
    {
      "process_uuid": "generic-cement-process-uuid",
      "quantity": 300.0,
      "unit": "kg"
    },
    {
      "process_uuid": "generic-aggregate-process-uuid",
      "quantity": 1800.0,
      "unit": "kg"
    },
    {
      "process_uuid": "generic-water-process-uuid",
      "quantity": 180.0,
      "unit": "kg"
    }
  ]
}
```
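The quantity-weighted sum-product behind the assembled recipe can be sketched as follows. This is a minimal illustration under two assumptions. First, each component's precomputed outputs are available as a mapping from indicator name to impact per unit of that component. Second, the component's "unit" already matches that reference unit, so no conversion is attempted. assembled_impacts is a hypothetical name.

```python
from typing import Dict, List, Mapping


def assembled_impacts(
    components: List[Mapping[str, object]],
    unit_impacts: Mapping[str, Mapping[str, float]],
) -> Dict[str, float]:
    """Sum over components of quantity x per-unit impact, per indicator.

    Hypothetical sketch: `unit_impacts` maps a component process UUID to its
    impacts per unit, and units are assumed to agree (no conversion is done).
    """
    totals: Dict[str, float] = {}
    for comp in components:
        uuid = comp["process_uuid"]
        if uuid not in unit_impacts:
            # Components must already have computed outputs in the current run.
            raise KeyError(f"component {uuid} has no computed outputs yet")
        quantity = float(comp["quantity"])
        for indicator, per_unit in unit_impacts[uuid].items():
            totals[indicator] = totals.get(indicator, 0.0) + quantity * per_unit
    return totals
```

The KeyError mirrors the requirement above that every referenced process must already have been computed. In practice, this means component processes have to be handled before the assembled products that use them.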
Filtering and location escalation
Filtering logic is split into composable parts:
gen_filtered_epds(epds, filters) yields only the EPDs for which all filters match (logical AND). Filters are instances such as:
- UUIDFilter: selects only EPDs that are explicitly matched to a process (via UUIDs from process.matches).
- UnitConformityFilter: ensures the EPD's declared unit is compatible with the process' material quantity description (process.material_kwargs).
- LocationFilter: constrains the EPD to certain geographic locations/countries.
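The logical-AND composition can be sketched by modelling each filter as a plain predicate. This is an assumption made for illustration; the real filter classes may expose a different interface. The EPD dataclass and the *_filter factories are hypothetical. Unit conformity is reduced here to string equality, whereas the real filter checks compatibility with process.material_kwargs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Set


@dataclass
class EPD:
    """Hypothetical minimal stand-in for an IlcdProcess representing an EPD."""
    uuid: str
    unit: str
    location: str


# In this sketch a filter is just a predicate over an EPD.
EPDFilter = Callable[[EPD], bool]


def uuid_filter(uuids: Set[str]) -> EPDFilter:
    # Keep only EPDs explicitly linked to the process.
    return lambda epd: epd.uuid in uuids


def unit_filter(unit: str) -> EPDFilter:
    # Simplification: plain string equality instead of real unit compatibility.
    return lambda epd: epd.unit == unit


def location_filter(locations: Set[str]) -> EPDFilter:
    return lambda epd: epd.location in locations


def filtered(epds: Iterable[EPD], filters: Sequence[EPDFilter]) -> Iterator[EPD]:
    # Logical AND: an EPD is kept only if every filter accepts it.
    # With an empty filter list, all() is True and every EPD passes.
    return (epd for epd in epds if all(f(epd) for f in filters))
```

Because filters are independent predicates, they can be added, removed or swapped (for example, replacing the unit filter) without touching the filtering loop.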
gen_locfiltered_epds builds on top of this to implement location
escalation: if no EPD is found for the requested locations, it repeatedly
relaxes the location set using escalate_location_set until either:
at least one EPD matches, or
a maximum number of attempts is reached, in which case a
NoMatchingEPDError is raised.
This design separates “what we want” (filters) from “how hard we try to get it” (escalation strategy).
The escalation behaviour can be seen conceptually as:
```mermaid
flowchart TD
    S[Requested locations] --> L1[Try exact matches]
    L1 -->|no EPDs| L2[Escalate to broader regions]
    L2 -->|no EPDs| L3[Escalate again, e.g. EU, global]
    L3 -->|no EPDs after N attempts| E[NoMatchingEPDError]
    L1 -->|EPDs found| R[Use matching EPDs]
    L2 -->|EPDs found| R
    L3 -->|EPDs found| R
```
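The escalation loop can be sketched independently of any real geography. This is a hedged sketch, not the module's gen_locfiltered_epds. The escalate callable stands in for escalate_location_set, and the parent-region table in the usage below is invented for illustration. Only the exception name comes from this page.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


class NoMatchingEPDError(Exception):
    """Raised when escalation runs out of attempts (name taken from this page)."""


def locfiltered(
    epds: Sequence[T],
    locations: Set[str],
    location_of: Callable[[T], str],
    escalate: Callable[[Set[str]], Set[str]],
    max_attempts: int = 3,
) -> List[T]:
    """Try the requested locations, relaxing them until something matches."""
    current = set(locations)
    for _ in range(max_attempts):
        matches = [e for e in epds if location_of(e) in current]
        if matches:
            return matches
        # Nothing found: relax the location set and try again.
        current = escalate(current)
    raise NoMatchingEPDError(f"no EPD for {sorted(locations)} after {max_attempts} attempts")
```

With a hypothetical hierarchy such as {"LU": "EU", "DE": "EU", "EU": "GLO"} and an escalate function that adds each location's parent, a request for {"LU"} first tries Luxembourg, then EU, then global data. Keeping the strategy in a separate callable mirrors the split between "what we want" and "how hard we try".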
The epd_pipeline function
epd_pipeline(process, path_to_epd_folder) is the core conceptual unit
of the module. For a single generic process, it:
1. Collects candidate EPDs
   - Parses the EPD XML files in path_to_epd_folder.
   - Builds an initial filter list based on:
     - process.matches (linked EPD UUIDs) → UUIDFilter,
     - process.material_kwargs (functional unit description) → UnitConformityFilter.
2. Attempts matching in the process' declared unit
   - Applies the filters using gen_filtered_epds.
   - If no EPD matches, the pipeline concludes that the process' declared unit is too specific.
3. Falls back to a mass-based functional unit
   - Logs a warning that the functional unit is being switched to a mass-based one (using MASS_KWARGS).
   - Replaces the UnitConformityFilter accordingly.
   - Re-evaluates the EPDs under the new unit assumptions.
   - If there are still no EPDs, returns (None, None) to signal that this process cannot be handled.

   This step encodes a design decision: mass is the ultimate fallback quantity when other, more specific functional units cannot be matched.
4. Computes LCIA results for each selected EPD
   - For every filtered EPD, requests its life-cycle impact assessment (LCIA) results via epd.get_lcia_results().
   - At this stage, the focus is on per-EPD impacts, not yet on markets.
5. Averages material properties across EPDs
   - average_material_properties(filtered_epds) computes an average material description (e.g. density, composition).
   - This is wrapped in a Material object, which is then rescaled to the process' functional unit (mat.rescale(process.material_kwargs)).
   - The result is a single, representative average material for the process.
6. Builds markets and aggregates impacts
   - For each country in process.market, selects location-appropriate EPDs using gen_locfiltered_epds and LocationFilter.
   - For each country, average_impacts computes an average LCIA result from the selected EPDs.
   - weighted_averages(process.market, market_impacts) then combines the per-country impacts into market-weighted global warming potentials (GWPs), or other indicators depending on configuration.
7. Returns conceptual outputs
   - avg_properties: a dictionary of average material properties,
   - avg_gwps: weighted average impacts for the market.
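The unit-matching part of steps 1 to 3 can be sketched as follows. This is a simplified illustration under stated assumptions. MASS_UNIT is a hypothetical stand-in for MASS_KWARGS, and unit conformity is reduced to string equality. The EPD dataclass, select_epds and epd_pipeline_sketch are hypothetical names. Only the (None, None) return convention comes from this page.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple

log = logging.getLogger("epd_pipeline_sketch")

# Hypothetical stand-in for MASS_KWARGS: the mass-based fallback unit.
MASS_UNIT = "kg"


@dataclass
class EPD:
    """Hypothetical minimal EPD record: identifier and declared unit only."""
    uuid: str
    unit: str


def select_epds(
    epds: Sequence[EPD], uuids: Set[str], declared_unit: str
) -> Tuple[List[EPD], Optional[str]]:
    """UUID filtering, then unit matching with a mass-based fallback."""
    linked = [e for e in epds if e.uuid in uuids]
    # Step 2: try the process' declared unit first.
    matches = [e for e in linked if e.unit == declared_unit]
    if matches:
        return matches, declared_unit
    # Step 3: treat the declared unit as too specific and fall back to mass.
    log.warning("no EPD in %r; switching to mass-based unit %r", declared_unit, MASS_UNIT)
    matches = [e for e in linked if e.unit == MASS_UNIT]
    return (matches, MASS_UNIT) if matches else ([], None)


def epd_pipeline_sketch(epds: Sequence[EPD], uuids: Set[str], declared_unit: str):
    selected, unit = select_epds(epds, uuids, declared_unit)
    if not selected:
        return None, None  # signal: this process cannot be handled
    # Steps 4-6 (LCIA results, material averaging, market weighting) would follow;
    # this sketch only returns the selection and the unit that was used.
    return selected, unit
```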
Conceptually, epd_pipeline moves from raw EPDs to a
market‑representative material and impact profile for a single process.
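The numeric core of step 6 (per-country averaging, then market weighting) can be sketched with two small functions. Several assumptions apply. The per-country average is taken to be an unweighted arithmetic mean. process.market is taken to map country codes to market shares. Shares are normalised so they need not sum to 1. mean_impacts and market_weighted are hypothetical names, not the module's average_impacts and weighted_averages.

```python
from typing import Dict, Mapping, Sequence


def mean_impacts(results: Sequence[Mapping[str, float]]) -> Dict[str, float]:
    """Arithmetic mean of each indicator over one country's selected EPDs.

    Assumes every result carries the same indicator keys.
    """
    if not results:
        raise ValueError("need at least one LCIA result")
    return {k: sum(r[k] for r in results) / len(results) for k in results[0]}


def market_weighted(
    market: Mapping[str, float],
    per_country: Mapping[str, Mapping[str, float]],
) -> Dict[str, float]:
    """Combine per-country averages using market shares, normalised to sum to 1."""
    total = sum(market.values())
    combined: Dict[str, float] = {}
    for country, share in market.items():
        for indicator, value in per_country[country].items():
            combined[indicator] = combined.get(indicator, 0.0) + value * share / total
    return combined
```

For example, with per-country GWP averages of 3.0 (DE) and 1.0 (FR) and shares of 0.75 and 0.25, the market-weighted GWP is 0.75 × 3.0 + 0.25 × 1.0 = 2.5.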
Orchestration via run_materia
While epd_pipeline encapsulates the logic for one process,
run_materia(path_to_gen_folder, path_to_epd_folder, output_path) defines
how the whole folder tree is traversed and updated:
1. It first copies the generic ILCD structure from path_to_gen_folder to output_path, excluding folders that will be regenerated or are not required ("processes", "processes_old", "flows").
2. It then iterates over each generic process XML in path_to_gen_folder / "processes":
   - builds an IlcdProcess instance,
   - enriches it with reference flow, declared unit, HS class, market and EPD matches.
3. For each process that has at least one match:
   - it calls epd_pipeline to obtain avg_properties and avg_gwps;
   - if those are None, it logs that the process cannot be completed;
   - otherwise, it:
     - constructs a Material from avg_properties,
     - writes an updated process file (embedding the aggregated impacts),
     - writes a flow file describing the averaged material,
     - logs successful completion.
run_materia is responsible for:
- scaling up the per-process logic of epd_pipeline to an entire dataset,
- keeping the file-system structure consistent between input and output,
- and providing progress feedback to users.
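The file-system side of run_materia (copying the tree while skipping regenerated folders, then walking the process files) can be sketched with shutil. This is one possible approach, not the module's code. prepare_output and iter_process_files are hypothetical names. Note that shutil.ignore_patterns matches names at every depth, so nested folders with these names would also be skipped.

```python
import shutil
from pathlib import Path
from typing import Iterator

# Folder names taken from this page: regenerated or not required in the output.
SKIPPED = ("processes", "processes_old", "flows")


def prepare_output(gen_folder: Path, output_path: Path) -> None:
    """Copy the generic ILCD tree to output_path, leaving out SKIPPED folders."""
    shutil.copytree(
        gen_folder,
        output_path,
        ignore=shutil.ignore_patterns(*SKIPPED),
        dirs_exist_ok=True,  # allow re-running into an existing output folder
    )


def iter_process_files(gen_folder: Path) -> Iterator[Path]:
    """Yield each generic process XML, in a stable (sorted) order."""
    yield from sorted((Path(gen_folder) / "processes").glob("*.xml"))
```

Each yielded path would then be turned into an enriched IlcdProcess and passed to epd_pipeline, as described in the steps above.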
How the pieces fit together
Putting everything together, the conceptual control‑flow looks like:
```mermaid
flowchart TD
    subgraph Input
        G[Generic processes XML in gen/processes]
        E[EPDs XML in epd/processes]
    end
    subgraph Pipeline
        R[run_materia]
        P[epd_pipeline per process]
        F1[Filtering & unit conformity]
        F2[Location escalation]
        A1[Avg. material properties]
        A2[Market-weighted impacts]
    end
    subgraph Output
        O1[Updated process XML]
        O2[Updated flow XML]
    end
    G --> R
    R -->|for each matched process| P
    E --> P
    P --> F1 --> F2 --> A1 --> A2
    A1 --> O2
    A2 --> O1
```
TL;DR
The pipeline treats EPDs as evidence that is filtered and aggregated to construct a representative, market‑specific view of a material.
Unit conformity and location escalation are complementary strategies to make heterogeneous datasets usable without silently discarding too much information.
run_materia provides the bridge between these abstract ideas and a concrete ILCD folder structure, but the conceptual heart of the system is the combination of filters, escalation, and averaging in epd_pipeline.