3 Data delivery for Climate TRACE and OceanMind

3.0.1 AIS-based emissions model estimates

3.0.1.1 Monthly asset-level data

Using the Climate TRACE and OceanMind asset-level (i.e., vessel level) data schema, we currently produce monthly datasets of emissions for both domestic and international activity including voyages and port visits. Each dataset contains rows for voyage-level emissions and port-stay emissions for voyages and port-stays that ended during that month. We currently produce monthly datasets from ‘2015-01-01’ through ‘2026-05-31’.

We save monthly CSVs of domestic activity in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/domestic-shipping-ship/YYYYMM/GFW/asset-climate-trace_domestic-shipping-ship_GFW_MMDDYY_*.csv, where the folder name YYYYMM represents the delivery month of the data, and the MMDDYY in the file name represents the month of the data included in each CSV. Each month of data is split into multiple files that each have a corresponding number which replaces the * wildcard operator. This approach is used because there are hard limits on how large files can be when they are exported from Google BigQuery to Google Cloud Storage; large files must be split into smaller files. From Google’s website, “The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the filename would create files with 000000000000 appended to the first file, 000000000001 appended to the second file, and so on.”

We save monthly CSVs of international activity in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/GFW/asset-climate-trace_international-shipping-ship_GFW_MMDDYY_*.csv, where the folder name YYYYMM represents the delivery month of the data, and the MMDDYY in the file name represents the month of the data included in each CSV. Each month of data is split into multiple files that each have a corresponding number which replaces the * wildcard operator.
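The naming convention above can be sketched in Python as follows. This is a minimal illustration, not the production export code: the function name `asset_file_path` and its parameters are hypothetical, and because the text does not say which day is used in the MMDDYY token, the caller supplies a full date.

```python
from datetime import date

BUCKET_PREFIX = "climate_trace_internal/transportation"


def asset_file_path(scope: str, delivery: date, data_date: date, shard: int) -> str:
    """Build the object path for one exported shard of monthly asset-level data.

    Hypothetical helper: `scope` is "domestic" or "international".
    """
    if scope not in ("domestic", "international"):
        raise ValueError(f"unknown scope: {scope}")
    sector = f"{scope}-shipping-ship"
    folder = delivery.strftime("%Y%m")          # YYYYMM: delivery month
    data_token = data_date.strftime("%m%d%y")   # MMDDYY: month of the data in the file
    shard_token = f"{shard:012d}"               # wildcard replacement, left-padded to 12 digits
    return (
        f"{BUCKET_PREFIX}/{sector}/{folder}/GFW/"
        f"asset-climate-trace_{sector}_GFW_{data_token}_{shard_token}.csv"
    )
```

For example, `asset_file_path("domestic", date(2024, 6, 15), date(2024, 5, 1), 0)` yields a path ending in `_GFW_050124_000000000000.csv` inside the `202406` delivery folder.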

3.0.1.2 Confidence and uncertainty

For all data entering the Climate TRACE platform, it is important to document both the confidence and the uncertainty associated with each individual data attribute. According to the Climate TRACE definitions, confidence is a qualitative measure of how well the data are understood and how trustworthy they are (i.e., very high, high, medium, low, very low). To assign these qualitative categories, we follow the general recommendations proposed by Climate TRACE, which take into account multiple considerations, including whether the data are directly observed or generated by a model, whether the data are self-reported, whether self-reported data are incentive compatible (i.e., the asset owner has little reason to provide false information), whether the data can be validated, and whether the data can be corroborated across multiple sources, among others.

Confidence is assigned differently for two sub-groups of vessels: 1) vessels that have known vessel characteristics from official vessel registries (which must include all of vessel type, main engine power, gross tonnage, and length); and 2) vessels that do not have all of these known characteristics from official vessel registries, and for which we use the GFW vessel characteristics machine learning algorithm to generate any missing characteristics. We call these sub-groups “high-information vessels” and “low-information vessels.” For high-information vessels, where vessel characteristics are obtained from a registry, we assign a confidence value of “very high” to the vessel characteristic attributes (vessel type, capacity, and capacity factor). For low-information vessels, where vessel characteristics are obtained through an algorithm, we assign a value of “low” to the vessel characteristic attributes (vessel type, capacity, and capacity factor). Since emissions estimates depend on certain vessel characteristic metadata, vessels with higher-confidence vessel characteristics will naturally have higher-confidence emissions estimates.

Generally speaking, we have very high confidence in the AIS spatiotemporal data itself, since it is in essence directly observed, so we assign a value of “very high” to the activity attribute. While the bottom-up engineering emissions estimation methodology uses these directly observed AIS data, and while the approach is based on the published ICCT and IMO methodology, it is still a model and does not represent directly observed emissions data. We therefore generally classify emissions estimates from high-information vessels as “medium”, and emissions estimates from low-information vessels as “very low”. We provide confidence estimates for each vessel by month.
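The confidence rules described above can be summarized in a short sketch. The field names in `REQUIRED_REGISTRY_FIELDS` and the function names are assumptions made for illustration; they are not taken from the actual pipeline or data schema.

```python
# Hypothetical field names for the four registry characteristics that must all be known.
REQUIRED_REGISTRY_FIELDS = ("vessel_type", "main_engine_power", "gross_tonnage", "length")


def is_high_information(registry_record: dict) -> bool:
    """A vessel is high-information only if every required registry field is present."""
    return all(registry_record.get(field) is not None for field in REQUIRED_REGISTRY_FIELDS)


def assign_confidence(registry_record: dict) -> dict:
    """Map a vessel's registry record to confidence labels per attribute group."""
    high = is_high_information(registry_record)
    return {
        # vessel type, capacity, capacity factor
        "vessel_characteristics": "very high" if high else "low",
        # AIS activity is treated as directly observed for all vessels
        "activity": "very high",
        # emissions factors, emissions, and CO2e totals are modeled
        "emissions": "medium" if high else "very low",
    }
```

A record missing any one of the four fields (for example, `length`) falls into the low-information group and receives “low” for vessel characteristics and “very low” for emissions.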

For the uncertainty column, following the recommendations by Climate TRACE and to be consistent with the OceanMind methodology, for now we assign “Standard deviation” to all numeric attributes. The standard deviation of each attribute is calculated by vessel, across both port visits and voyages, for port visits and voyages that end in each year of the dataset. In cases where there is only one observation, the standard deviation is set to 0. However, we believe an interesting and important area of future work is to determine the most appropriate uncertainty measure for each attribute. While standard deviation provides information on the overall distribution of any given data attribute across the entire dataset, it does not provide information on the uncertainty of that attribute for any given vessel, voyage, or port visit. Ultimately, the most appropriate measure of uncertainty, and the scale at which to measure it, will depend on how the emissions estimates will be used by various actors and for different interventions.
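The per-vessel, per-year calculation described above can be sketched as follows. The function name and the input layout are hypothetical, and the sketch assumes the sample standard deviation (n − 1 denominator); the text does not state whether the sample or population form is used.

```python
from collections import defaultdict
from statistics import stdev


def yearly_std_by_vessel(events):
    """Standard deviation of one numeric attribute per (vessel, year).

    `events` is an iterable of (vessel_id, end_year, value) tuples covering both
    voyages and port visits, keyed by the year in which each one ended.
    """
    groups = defaultdict(list)
    for vessel_id, end_year, value in events:
        groups[(vessel_id, end_year)].append(value)
    # A single observation has no spread to measure, so it is set to 0.
    return {
        key: (stdev(values) if len(values) > 1 else 0.0)
        for key, values in groups.items()
    }
```

With two observations of 10.0 and 14.0 for one vessel in a year, the sample standard deviation is the square root of 8, about 2.83; a vessel with a single observation in that year gets 0.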

We save the domestic confidence dataset in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/domestic-shipping-ship/YYYYMM/confidence-climate-trace_international-shipping-ship_mmyy_GFW_MMDDYY.csv, where the folder YYYYMM represents the delivery date of the data, mmyy represents the time period corresponding to the data (each confidence CSV represents a single month of activities that ended in that month, so these numbers will be the month and year of the data), and the suffix on the file name _MMDDYY.csv represents the delivery date of each CSV to GCS. We follow the Climate TRACE and OceanMind confidence data schema.

We save the international confidence dataset in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/confidence-climate-trace_international-shipping-ship_mmyy_GFW_MMDDYY.csv, where the folder YYYYMM represents the delivery date of the data, mmyy represents the time period corresponding to the data (each confidence CSV represents a single month of activities that ended in that month, so these numbers will be the month and year of the data), and the suffix on the file name _MMDDYY.csv represents the delivery date of each CSV to GCS. We follow the Climate TRACE and OceanMind confidence data schema.

We save the domestic uncertainty dataset in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/domestic-shipping-ship/YYYYMM/uncertainty-climate-trace_international-shipping-ship_mmyy_GFW_MMDDYY.csv, where the folder YYYYMM represents the delivery date of the data, mmyy represents the time period corresponding to the data (each uncertainty CSV represents all activities that ended across an entire year, so these numbers will show 01/January as the month along with the appropriate year), and the suffix on the file name _MMDDYY.csv represents the delivery date of each CSV to GCS. We follow the Climate TRACE and OceanMind uncertainty data schema.

We save the international uncertainty dataset in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/uncertainty-climate-trace_international-shipping-ship_mmyy_GFW_MMDDYY.csv, where the folder YYYYMM represents the delivery date of the data, mmyy represents the time period corresponding to the data (each uncertainty CSV represents all activities that ended across an entire year, so these numbers will show 01/January as the month along with the appropriate year), and the suffix on the file name _MMDDYY.csv represents the delivery date of each CSV to GCS. We follow the Climate TRACE and OceanMind uncertainty data schema.
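The two date tokens in the confidence and uncertainty file names follow different rules, which the sketch below makes explicit. The helper names are hypothetical and only the tokens are built here, not the full object paths.

```python
from datetime import date


def period_token(year: int, month: int, annual: bool) -> str:
    """Build the mmyy token for the period covered by a file.

    Monthly confidence files use the actual month; annual uncertainty files
    always use 01 (January) together with the relevant year.
    """
    mm = 1 if annual else month
    return f"{mm:02d}{year % 100:02d}"


def delivery_suffix(delivered: date) -> str:
    """Build the MMDDYY suffix for the date the CSV was delivered to GCS."""
    return delivered.strftime("%m%d%y")
```

For July 2023 data, a confidence file would carry the token `0723`, while the uncertainty file covering all of 2023 would carry `0123`.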

Confidence and uncertainty for each data attribute for the AIS-based emissions model
Data attribute	Confidence	Uncertainty
type	Very high (high-information vessels) Low (low-information vessels)	NULL
capacity	Very high (high-information vessels) Low (low-information vessels)	Standard deviation
capacity_factor	Very high (high-information vessels) Low (low-information vessels)	Standard deviation
activity	Very high	Standard deviation
CO2_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
CH4_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
N2O_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
SOX_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
NOX_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
VOCS_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
PM2_5_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
PM10_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
CO_emissions_factor	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
CO2_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
CH4_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
N2O_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
SOX_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
NOX_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
VOCS_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
PM2_5_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
PM10_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
CO_emissions	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
total_CO2e_100yrGWP	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation
total_CO2e_20yrGWP	Medium (high-information vessels) Very low (low-information vessels)	Standard deviation

3.0.2 Dark fleet emissions model estimates

Using the Climate TRACE country-level data schema, we currently produce a single dark fleet dataset where each row is a year, and there are columns for emissions from the dark fleet for each of the pollutants. All values in this table are in metric tonnes. We currently produce data for 2016 through 2023. The dataset currently uses Sentinel-1 satellite data, as described in ?sec-s1. These data provide insight into previously undisclosed and unmeasured emissions, and can therefore directly inform Climate TRACE’s known-gaps estimates for the shipping sector. Note that since these estimates are for the entire globe, and not differentiated by country, we put GLOBAL in the iso3_country column.

We save this CSV in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/GFW/global-climate-trace_dark-fleet-shipping-ship_GFW_MMDDYY.csv, where the folder name YYYYMM represents the delivery date of the data, and the suffix on the file name _MMDDYY.csv also represents the delivery date of the data.

3.0.2.1 Confidence and uncertainty

For the dark fleet emissions confidence estimates, we again follow the general recommendations proposed by Climate TRACE for assigning qualitative values for each numeric value (i.e., very high, high, medium, low, very low). We assign a value of “very low” to all numeric values, given the technical difficulty of determining emissions for vessels that do not broadcast AIS. We save this CSV in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/GFW/confidence-global-climate-trace_dark-fleet-shipping-ship_GFW_MMDDYY.csv, where the folder name YYYYMM represents the delivery date of the data, and the suffix on the file name _MMDDYY.csv also represents the delivery date of the data.

For the dark fleet emissions uncertainty estimates, there is not currently a standard Climate TRACE methodology for assigning these values, since this is a brand new data format. Therefore, we adopt an approach analogous to the one currently used for the vessel-level emissions data. For each year and each pollutant, we calculate the standard deviation of emissions across the twelve monthly estimates. This gives us an uncertainty estimate for each year of data and each pollutant. We save this CSV in Climate TRACE’s Google Cloud Storage bucket as climate_trace_internal/transportation/international-shipping-ship/YYYYMM/GFW/uncertainty-global-climate-trace_dark-fleet-shipping-ship_GFW_MMDDYY.csv, where the folder name YYYYMM represents the delivery date of the data, and the suffix on the file name _MMDDYY.csv also represents the delivery date of the data.
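The yearly dark fleet uncertainty calculation can be sketched as below. The function name and input structure are hypothetical, and the sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
from statistics import stdev


def dark_fleet_uncertainty(monthly):
    """Standard deviation of monthly dark fleet emissions per (year, pollutant).

    `monthly` maps (year, pollutant) to a list of twelve monthly emissions
    estimates in metric tonnes.
    """
    result = {}
    for (year, pollutant), values in monthly.items():
        if len(values) != 12:
            # Guard added for this sketch: the method expects one estimate per month.
            raise ValueError(f"expected 12 monthly values for {year} {pollutant}, got {len(values)}")
        result[(year, pollutant)] = stdev(values)
    return result
```

For instance, a year with six months at 100 tonnes and six months at 110 tonnes gives a standard deviation of about 5.22 tonnes for that pollutant.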