Methodology
At Electricity Maps, we’re data scientists, first and foremost.
Data comes in from many sources, and in many formats. We ingest and harmonize it, apply our models to it, and make it available to the world. This is the place to learn more about our data: read the FAQs, or take a deep dive into our methodology.
Frequently Asked Questions
How good is your data?
What Emission Factors do you use?
How do you provide live data?
What time horizons do you cover?
Are your numbers different from other sources of electricity data?
How do you calculate the grid carbon intensity? What emission factors do you use?
Do you provide electricity price as a signal?
Is your data auditable?
How are you forecasting your signals?
How often does your data get updated?
How long until the data is final?
Historical & Real Time
First, let's have a look at the Historical time frame.
I. Foundational Methodological Choices
For accurate and verifiable data that most closely represents the physical reality of electricity grids, Electricity Maps rests on three foundational methodological choices: our data is attributional, location-based, and consumption-based.
Attributional Accounting Approach
We align with international standards, such as the GHG Protocol Scope 2 Guidance, which track GHG emissions and removals within a defined organizational and operational boundary over time. This is the primary method, required by regulation and standards, for reporting companies' and individuals' emissions.
Location-Based Method
Our data reflects the physical reality of the grid. A location-based method considers the electricity available on grids where energy consumption occurs and does not include contracts or certificates traded.
Consumption-based Calculation
We provide grid signals (electricity mix, carbon intensity, ...) for the electricity available (or consumed) in a grid, rather than merely what was produced locally. This crucial distinction mandates accounting for electricity flows across grids, which is achieved through our flow-tracing algorithm.
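As a minimal illustration of the idea behind flow tracing (the zones, mixes, and flows below are made up, and the real algorithm solves the whole interconnected network at once), the consumption mix of a zone blends its local production with the traced mix of its imports, weighted by power:

```python
def consumption_mix(local_prod, imports):
    """Blend a zone's local production with the traced mix of its imports.

    local_prod: dict mode -> MW produced locally
    imports: list of (flow_mw, exporter_mix) pairs, where exporter_mix is a
             dict mode -> share (fractions summing to 1)
    Returns a dict mode -> share of the electricity consumed in the zone.
    """
    total = sum(local_prod.values()) + sum(flow for flow, _ in imports)
    mix = {mode: mw / total for mode, mw in local_prod.items()}
    for flow, exporter_mix in imports:
        for mode, share in exporter_mix.items():
            mix[mode] = mix.get(mode, 0.0) + flow * share / total
    return mix

# Illustrative example: a zone producing 800 MW of gas and 200 MW of wind,
# importing 500 MW from a neighbour whose traced mix is 60% hydro, 40% nuclear.
mix = consumption_mix(
    {"gas": 800.0, "wind": 200.0},
    [(500.0, {"hydro": 0.6, "nuclear": 0.4})],
)
```

In practice this becomes a linear system over all interconnected zones solved simultaneously, since each exporter's mix itself depends on that exporter's own imports.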
II. Defining Granularity - Spatial and Temporal
To offer actionable data, we support different spatial and temporal aggregations on top of the most granular data available.
Spatial Granularity: Our spatial units represent a physical network that connects generators to consumers. They typically correspond to an electricity grid controlled by a single responsible operator. We aim to display the smallest subdivision of electricity grids for which reliable data is available, ensuring the highest accuracy. We also provide data aggregated at a country level.
Temporal Granularity: All our data can be delivered with a 5-minute, 15-minute, and hourly granularity to ensure the highest temporal fidelity and accuracy. We also provide data aggregated daily, monthly, quarterly, and yearly.
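As a small sketch of temporal aggregation (the function and data are illustrative, and a simple unweighted mean is shown, whereas a production pipeline would typically weight by consumption or generation), hourly values can be rolled up into daily ones like this:

```python
from collections import defaultdict
from datetime import datetime

def aggregate_daily(hourly):
    """Average hourly carbon-intensity values into daily means.

    hourly: dict datetime -> gCO2eq/kWh
    Returns a dict date -> unweighted mean gCO2eq/kWh for that day.
    """
    buckets = defaultdict(list)
    for ts, value in hourly.items():
        buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}

hourly = {
    datetime(2024, 1, 1, 0): 100.0,
    datetime(2024, 1, 1, 1): 140.0,
    datetime(2024, 1, 2, 0): 80.0,
}
daily = aggregate_daily(hourly)  # one mean value per calendar day
```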
III. Ingestion: Parser System
High-quality data starts with reliable sourcing and consistent standardization.
Trusted Data Acquisition: We prioritize obtaining data from the highest-quality, most credible organizations globally, including government agencies (like the EIA in the US), Technical Bodies (like ENTSO-E in Europe), Nominated Electricity Market Operators (like Nordpool), Transmission System Operators (TSOs like RTE in France and Energinet in Denmark), and large utility companies. Currently, we have 75 active parsers for real-time electricity mix data and 38 active parsers for exchange data.
Multiple Time Frequencies: We integrate with data sources that support different time granularities. Some parsers run with high frequency to ingest hourly or more granular data, while others run less frequently and ingest monthly or yearly data.
The Parser System: We use an open-source parser system to ingest raw data and transform it into a standardized format. This critical step maps disparate raw data inputs (e.g., ENTSO-E's 21 specific modes) into our fixed, harmonized set of 12 distinct production modes. This standardization ensures consistency and comparability across all global zones.
ENTSO-E example

| Disparate raw data | Harmonized set |
| --- | --- |
| Fossil Brown Coal / Lignite, Fossil Hard Coal, Fossil Oil Shale, Fossil Peat | Coal |
| Fossil Oil | Oil |
| Fossil Coal-derived Gas, Fossil Gas | Gas |
| Geothermal | Geothermal |
| Solar | Solar |
| Hydro run-of-river & poundage, Hydro water reservoir | Hydro |
| Hydro Pumped Storage | Hydro Storage |
| Wind Offshore, Wind Onshore | Wind |
| Biomass, Waste | Biomass |
| Energy Storage | Battery Storage |
| Nuclear | Nuclear |
| Marine, Other, Other renewable | Unknown |
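The harmonization step can be sketched as a lookup table plus a fold (the dictionary below covers only part of the ENTSO-E mapping, and the exact raw labels are illustrative, not a copy of our production code):

```python
# Illustrative subset of the ENTSO-E -> harmonized mode mapping.
ENTSOE_TO_HARMONIZED = {
    "Fossil Brown Coal / Lignite": "coal",
    "Fossil Hard Coal": "coal",
    "Fossil Oil Shale": "coal",
    "Fossil Peat": "coal",
    "Fossil Oil": "oil",
    "Fossil Coal-derived Gas": "gas",
    "Fossil Gas": "gas",
    "Wind Offshore": "wind",
    "Wind Onshore": "wind",
    "Hydro run-of-river & poundage": "hydro",
    "Hydro water reservoir": "hydro",
    "Hydro Pumped Storage": "hydro storage",
    "Biomass": "biomass",
    "Waste": "biomass",
    "Nuclear": "nuclear",
}

def harmonize(raw_breakdown):
    """Sum raw per-mode MW values into the harmonized mode set."""
    out = {}
    for raw_mode, mw in raw_breakdown.items():
        # Anything unmapped falls back to the harmonized "unknown" mode.
        mode = ENTSOE_TO_HARMONIZED.get(raw_mode, "unknown")
        out[mode] = out.get(mode, 0.0) + mw
    return out

totals = harmonize({
    "Fossil Hard Coal": 300.0,
    "Fossil Brown Coal / Lignite": 200.0,
    "Wind Onshore": 150.0,
})
```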
[Diagram: validation checks. Production events (e.g. gas, wind, hydro) are checked against capacity, expected modes, per-mode range, zero production, and total range; exchange events (net flow) are checked against capacity, range, and IQR. Each check marks a datapoint as valid (1) or invalid (0).]
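One such exchange check, flagging a net-flow datapoint that falls outside an interquartile-range band, might look roughly like this (the quartile computation and multiplier are illustrative simplifications, not our production logic):

```python
def iqr_valid(history, value, k=1.5):
    """Flag a net-flow datapoint as valid (1) or invalid (0) using an
    interquartile-range band built from recent history."""
    xs = sorted(history)
    n = len(xs)
    # Simple quartile positions; a real system would use a proper
    # quantile estimator with interpolation.
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return 1 if lo <= value <= hi else 0

history = [100, 110, 95, 105, 120, 90, 115, 108]
iqr_valid(history, 112)   # within the band -> 1
iqr_valid(history, 1000)  # far outside -> 0
```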
[Diagram: Electricity Maps Emission Factors (EF) — default, US regional, and European regional EFs.]
VII. The Refetching Policy for Definitive Accuracy
Real-time data sources often consolidate, adjust, or finalize their initial readings over time, meaning the instant real-time value is often preliminary. To ensure that Electricity Maps provides the most accurate primary data possible, we implement an automatic refetching policy.
Refetch Schedule: Once per day, we refetch data covering a backwards-looking 48-hour period anchored at the current day, one week, one month, and three months in the past.
Impact of Refetching: This systematic process ensures that we capture source updates. While significant changes can occur immediately, data stabilizes rapidly. For most zones, the magnitude of updates to the Renewable Energy Percentage (RE%) considerably decreases after six hours. For 50% of zones, the RE% value can be considered definitive (updates of less than 0.5 percentage points) after 72 hours. For 90% of zones, updates after 72 hours differ by less than 2 percentage points from the real-time values.
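Under the stated schedule, the daily refetch windows could be computed as follows (a sketch; the one-month and three-month anchors are approximated here as 30 and 90 days, which is an assumption, not our exact calendar logic):

```python
from datetime import datetime, timedelta

def refetch_windows(now):
    """Return (start, end) pairs for the daily refetch job: a 48-hour
    backwards-looking window anchored at now, one week, ~one month,
    and ~three months in the past."""
    anchors = [
        timedelta(0),
        timedelta(days=7),
        timedelta(days=30),   # approximation of one month
        timedelta(days=90),   # approximation of three months
    ]
    return [(now - a - timedelta(hours=48), now - a) for a in anchors]

windows = refetch_windows(datetime(2024, 6, 1, 12, 0))
```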
VIII. Strategic Estimation to Guarantee Global Coverage
Data gaps—due to invalid points, delays, or sparse reporting—must be filled to provide complete, continuous, granular data. We manage this through a tiered system based on data availability.
Tier A Zones
High granularity: hourly or better measured data
Original data source
Gaps filled using Time Slicer Average or Forecast Hierarchy
These zones have measured hourly or better data available for the full electricity mix from the original source. Any potential gaps here are filled using Time Slicer Average (TSA) or Forecast Hierarchy estimation models. TSA is efficient for immediate gap filling, as it operates without a dedicated training phase while maintaining continuity. Forecast Hierarchy builds on top of TSA and uses external forecasts to improve the accuracy of the estimation.
Tier B
Original data source
Missing info estimated using zone-specific est. models
These zones have partial measured hourly or better data available from the original source. Since the full production mix breakdown may be missing, we develop zone-specific estimation models to fill these gaps. These custom models are designed to use all measured hourly data available and estimate the missing parts using weather parameters.
Tier C
Limited granularity: monthly or yearly totals
Hourly values modelled with General Purpose Zone Development model
Ensure reconciliation with original data source on monthly and yearly totals
These zones do not have measured hourly data available, only aggregate monthly or yearly totals. For these regions, hourly values are estimated using the General Purpose Zone Development (GPZD) model. GPZD was specifically developed to provide hourly estimated grid data by breaking down yearly or monthly production figures into plausible hourly estimates, using weather data and geographic information.
IX. Operational Excellence and Incident Management
The continuous delivery of trusted real-time data requires highly structured monitoring and incident response.
Observability Stack: Our alerting is supported by a robust tooling set, including Grafana, Prometheus, BigQuery, and Google Cloud Logging and Cloud Monitoring. This setup provides constant visibility into product-wide Service Level Objectives (SLOs).
Incident Response: We use a formal incident management playbook for responding to and resolving incidents in real time. This playbook ensures a fast, structured, and coordinated response when issues arise, supported by an on-call system (via Grafana OnCall and Slack).
Traceability: Every data point is stored with its full history, including the original source or estimation model, ensuring complete data traceability should an auditor need to retrace results independently.
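A minimal sketch of what such a traceable history could look like (the record shape and field names are hypothetical illustrations, not our actual schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class DataPointRevision:
    """One revision in a data point's history: its value plus where it
    came from (an original source or an estimation model)."""
    zone: str
    timestamp: datetime    # the grid timestamp the value describes
    recorded_at: datetime  # when this revision was ingested
    value: float           # e.g. carbon intensity in gCO2eq/kWh
    origin: str            # source or estimation model, e.g. "ENTSO-E" or "TSA"

# An estimated value later superseded by the measured source value;
# both revisions are kept, so any past result can be retraced.
history = [
    DataPointRevision("DK-DK1", datetime(2024, 1, 1, 10),
                      datetime(2024, 1, 1, 11), 120.0, "TSA"),
    DataPointRevision("DK-DK1", datetime(2024, 1, 1, 10),
                      datetime(2024, 1, 2, 11), 112.5, "ENTSO-E"),
]
latest = max(history, key=lambda r: r.recorded_at)
```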
X. Data Publication and Versioning for Audit Readiness
Ensuring data traceability is a key objective. We guarantee that users can access and verify the exact data used for their calculations.
New historical datasets are typically published every January for the previous calendar year. Should major updates or data source improvements occur throughout the year, the data is updated, and these changes are fully versioned. This policy allows users to access and reference previous snapshots, ensuring complete audit readiness and traceability. We maintain a complete data history, tracking the value and origin (source or estimation model) of every data point over time.
[Table: published dataset versions per year with their version dates; the most recent release is marked LATEST.]
XI. Validation Against Global Data Sources
To continually reinforce the trustworthiness of our methodology, our production-based historical data is rigorously validated against highly regarded external sources.
Global Comparison (IEA and Ember): When comparing our production-based Renewable Energy Percentage (RE%) and Carbon-Free Energy Percentage (CFE%) data against worldwide sources like the International Energy Agency (IEA) and Ember (which do not include electricity flows), we find a strong correlation (0.99 for RE%). Across 59 countries, the median absolute difference for RE% data against both Ember and IEA remained below 3.2 percentage points for 2023. Similarly, the CFE% comparison shows consistency, with the median absolute differences remaining low.
Regional Comparison (Eurostat): Validation against regional authoritative sources, such as Eurostat (the statistical office of the European Union), also confirms consistency. For the 33 countries compared in 2022, the median absolute difference for RE% was 2.4 percentage points, demonstrating that our data is consistent with Eurostat's authoritative figures.
[Figure: Electricity Maps data for the CFE% over 2023 is consistent with values provided by Ember and the IEA; median differences lie between -1.1 pp and 2.9 pp.]
XII. How Electricity Maps provides Real-Time data
In reality, even the best public sources (TSOs in Europe, for example) only provide data with a slight delay, because they report what happened in the last reporting interval; on top of that, there are technical delays in delivering the data.
So how can Electricity Maps provide real-time data for decision-making? If you’ve been looking carefully through our map, you will have noticed that the real-time view uses two labels: Preliminary, and Synthetic.
"Preliminary" is used when our models estimate the values, but they will be replaced with actual values from the source.
"Synthetic" is used when the sources are not granular enough, or updated often enough, or simply don't exist. In those cases, our models estimate the values, and they will not be replaced with actual values.
XIII. Zone Tiers
Tiers help us categorize zones depending on data quality and availability.
Tier A
We have briefly introduced our tiering system on this page, which categorizes zones into Tier A, B, and C.
Tier A zones are zones with measured hourly or sub-hourly data available. For these zones, we complement our data sources with the Time Slicer Average or Forecast Hierarchy models to guarantee timeliness and completeness.
Time Slicer Average (TSA) is an estimation method for Tier A zones that fills short gaps or delays in otherwise reliable hourly production data. For every missing timestamp, TSA takes the average of available observations at the same time of day across other days within the same month, then aligns the filled values to ensure continuity with the data immediately before and after the gap.
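A simplified sketch of the time-slice averaging described above (it omits the final continuity alignment with the data around the gap, and the function name is illustrative):

```python
from datetime import datetime

def tsa_fill(observations, missing_ts):
    """Estimate a missing hourly value as the average of observations at
    the same hour of day on other days within the same month.
    (The real method additionally aligns filled values with the data
    immediately before and after the gap.)"""
    same_slice = [
        value for ts, value in observations.items()
        if ts.hour == missing_ts.hour
        and ts.month == missing_ts.month
        and ts.date() != missing_ts.date()
    ]
    return sum(same_slice) / len(same_slice)

obs = {
    datetime(2024, 3, 1, 14): 900.0,   # MW of solar at 14:00 on other days
    datetime(2024, 3, 2, 14): 1100.0,
    datetime(2024, 3, 3, 9): 400.0,    # different hour of day, ignored
}
tsa_fill(obs, datetime(2024, 3, 4, 14))  # -> 1000.0
```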
Forecast Hierarchy is an estimation method for Tier A zones where external forecasts are available to enhance the accuracy of the estimation. If no external forecast is available, TSA is used instead for the production mode in question.
This is the data our map labels as “Preliminary”, and our API responds with “Estimated”, along with the Time Slicer Average or Forecast Hierarchy as the estimation method.
Tier B and C zones
Tier B and C zones are zones where we get partial hourly data, and zones with no hourly data at all. Data in these zones will always be estimated to some extent, and will always be labelled as such in the App and in the API.
For Tier B zones, we leverage all hourly data available and use zone-specific models to estimate, on an hourly granularity, the data that is only available at a daily or lower granularity. These models usually leverage time and weather parameters to break down original values into hourly granularity.
For Tier C zones, we have developed a model called General Purpose Zone Development (GPZD) that estimates hourly electricity production by mode in zones where only low-frequency data exists, such as yearly or monthly aggregates. It aims for plausible, smooth hourly profiles that exactly reconcile to reported monthly or yearly totals per mode, prioritizing global coverage and stability over perfect accuracy. The model is trained on zones with both hourly and yearly data to learn realistic patterns and then applied where high-frequency data is missing.
It works in two stages: first, it derives monthly production per mode either by using existing monthly data or by disaggregating yearly totals into months using monthly weather signals and capacity bounds. Second, it converts monthly to hourly using hourly weather, geographic cues like sunrise and sunset, capacity limits, and an optimization step that enforces ramping and non-negativity constraints.
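The core of the second stage, reconciling hourly values exactly to a reported total while following a weather-driven profile, can be sketched as follows (the weights are illustrative, and the real model adds an optimization step with capacity, ramping, and non-negativity constraints):

```python
def disaggregate(monthly_total_mwh, hourly_weights):
    """Split a monthly production total into hourly values proportional
    to a weather-driven profile (e.g. solar irradiance), reconciling
    exactly to the reported total."""
    weight_sum = sum(hourly_weights)
    return [monthly_total_mwh * w / weight_sum for w in hourly_weights]

# Illustrative: 1000 MWh of monthly solar split across four hours using
# irradiance-like weights (zero at night, peaking at midday).
hourly = disaggregate(1000.0, [0.0, 1.0, 3.0, 1.0])
```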
This is the data our map labels as “Synthetic”, and our API responds with “Estimated”, along with the corresponding estimation method.
Forecasting
This documentation addresses the methodology ensuring the quality of our forecasts.
I. The Strategic Imperative for Grid Forecasting
As the global energy landscape rapidly electrifies and relies more heavily on highly variable low-carbon sources like solar and wind, anticipating the grid's future state is essential.
Our forecasting engine provides a comprehensive prediction for the future state of grids worldwide, typically spanning up to 72 hours. The goal is to produce accurate and actionable forecasts for all of our signals, including all carbon and pricing signals supported, enabling users to optimize consumption patterns for lower carbon emissions or lower costs.
[Chart: forecasted production by mode, e.g. biomass, solar, wind, and coal.]
II. Ensuring Coherency through Flow-Tracing Predictions
Forecasting individual grid components (like solar production or net flows) is complex enough, but the greatest challenge is ensuring these thousands of individual predictions result in a physically coherent network state. Since all models are intertwined through flow tracing, changing one forecast (e.g., geothermal production in California) can affect the forecasted grid state in distant, interconnected zones.
We build a physically coherent prediction of the future state of all interconnected grids by applying our fundamental flow-tracing algorithm to the individual production and power-flow forecasts (e.g., solar production, nuclear baseline, and net flow predictions). This results in the most accurate prediction of the electricity mix and, subsequently, the future Carbon Intensity (CI) in each grid globally.
[Diagram: per-zone forecasts for US-CAL-CISO and US-SW-WALC are combined through flow-tracing into a coherent prediction for both zones.]
III. Scalable Architecture and the General-Purpose Model
The dimensionality of the predictions we make is relatively large: we have to predict about 20 signals, across more than 200 zones, over horizons of more than 72 hours. This forces us to avoid hand-crafting models for a particular zone, signal, or horizon, as it hinders our scalability.
Instead, we prefer to iterate on a single general-purpose model that can cope with the varying degrees of availability and robustness of the features it ingests while being robust to many error sources, including those we don’t yet know about.
Depending on the type of forecasts we want to generate, different sets of features will be most relevant. For example, features describing weather patterns are essential to forecast solar power production, while features engineered to provide useful information about the expected future make-up of the power grid are relevant to forecast net flows between regions.
These features can further be pre-processed in a multitude of fashions. Choosing to standardize them or imputing missing values can have a significant impact on the behavior of the predictions.
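For instance, one minimal (and purely illustrative) preprocessing choice is mean imputation followed by standardization:

```python
def preprocess(column):
    """Impute missing values (None) with the column mean, then
    standardize to zero mean and unit variance — one of many possible
    preprocessing choices, shown here for illustration only."""
    known = [x for x in column if x is not None]
    mean = sum(known) / len(known)
    filled = [mean if x is None else x for x in column]
    variance = sum((x - mean) ** 2 for x in filled) / len(filled)
    std = variance ** 0.5
    return [(x - mean) / std for x in filled]

features = preprocess([10.0, None, 14.0, 12.0])
```

Because the imputed value sits at the column mean, it maps to exactly zero after standardization, which is one reason this combination behaves predictably with gappy inputs.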
IV. Automation, Traceability, and Version Control
To manage the complexity of thousands of interconnected predictions globally, we rely on core engineering principles: automation and guaranteed traceability.
Automated Lifecycle: We automate all operations within our model’s lifecycle, including training, testing, and deployment, to ensure high speed and reliability while avoiding reliance on manual intervention.
Guaranteed Traceability: We have the ability to trace exactly which features, training data, preprocessors, and model class were used for a release.
Version Control Environments: We maintain distinct environments to manage model deployment risk:
[Diagram: the Nightly and Latest environments each pin their own features, models, and trainers.]
Nightly: Used for high-risk experimentation and testing new model classes, mainly for internal use.
Latest: The current production version used by commercial services.
When a major model release occurs, a dedicated service promotes the model configurations from Nightly to Latest, and from Latest to Support in a version-controlled system.
V. Monitoring and Scalable Analytics Setup
Trust in forecast data is maintained through dedicated, scalable analytics that continuously monitor model performance.
Scalable Analytics Setup: Our system utilizes BigQuery and Dataform to define and compute complex system metrics. Dataform is crucial as it implements software-engineering best practices (version control and testing) within our analytics engine, ensuring the metrics we report are inherently trustworthy.
Monitoring & Transparency: Forecast metrics and key observability data are exposed via Looker Studio dashboards, which allows internal teams to build confidence in the forecast quality and ensures that the grid forecasts team is not distracted by recurring inquiries. Furthermore, completeness metrics are ingested and integrated into tools like Prometheus and Grafana for continuous health monitoring.
[Diagram: release configurations are recorded by a config tracker and exported to BigQuery; metrics from the operational database are exported to Prometheus and visualized in Grafana.]
