1 AIS-based emissions model

1.1 Methods

1.1.1 Model Overview

Several alternative models were tested, each using a bottom-up engineering approach based on AIS data, vessel characteristic information, and emission conversion factors from the literature. Here we describe the best and most comprehensive model, which closely follows the methodology of the 2020 IMO “Fourth Greenhouse Gas Study” (Faber and Xing (2020)) and the 2017 ICCT “Greenhouse Gas Emissions From Global Shipping” study (Olmer et al. (2017)). We explain how we apply it to the GFW data and note any deviations from the published methodologies. The validation section below describes several alternative model specifications.

At a high level, we calculate emissions as follows:

For each individual AIS message (ping), we calculate the main engine use, auxiliary engine use, and boiler use, each of which is a function of vessel characteristics, speed, and the time since the previous ping.
Using emissions factors (EFs) for the main engine, auxiliary engine, and boiler for seven pollutants (CO2, CH4, NOX, SOX, CO, N2O, and PM), we calculate the ping-level emissions of each pollutant from each of the main engine, auxiliary engine, and boiler.
For each pollutant and each AIS ping, we sum the emissions across the main, auxiliary, and boiler engines to get the ping-level emissions for each of these seven pollutants.
For three additional pollutants (PM2.5, PM10, and VOCs), we multiply the CO2 emissions by a conversion factor to get the ping-level emissions for each of these three pollutants.
With ping-level emissions, we are then able to aggregate emissions by vessel, by voyage, by port stay, by time, by space, etc.

1.1.1.1 Main engine

1.1.1.1.1 Main engine energy use

Based on Faber and Xing (2020) (page 64), main engine energy use (in kilowatt-hours) is calculated as follows:

\[ \text{Main Engine Energy Use}_{kWh} = \text{Hours} \times \text{Load Factor} \times \text{Main Engine Power}_{kW} \]

Where hours comes from each individual AIS message, main engine power (kW) comes from the vessel characteristics dataset, and the load factor comes from the product of several correction factors (CFs):

\[ \text{Load Factor} = \text{Speed-power CF} \times \text{Hull Fouling CF} \times \text{Weather CF} \times \text{Draft CF} \]

Where:

- hull fouling CF is 1.07, reflecting a 7% increase in resistance as described in Olmer et al. (2017) and Faber and Xing (2020) (see page 17 and Annexes page 270, respectively).
- weather CF is a correction factor based on weather conditions, varying with the distance to shore. This factor is set at 1.1 for nearshore activity (≤5 nm from shore) to account for a 10% increase in resistance, and 1.15 for offshore activity (>5 nm), reflecting a 15% increase in resistance, as described in Olmer et al. (2017) and Faber and Xing (2020) (see pages 18 and 270, respectively).
- draft CF is derived from the average draught by sector as reported in Olmer et al. (2017) (see Table 13 on page 20). Weights were applied by vessel type based on fuel consumption data from Faber and Xing (2020) (see Annex 1, Figure 4), since fuel consumption values by type are proportionally related to emissions. This weighted average provided a final estimate of 0.85. Note: this factor could be refined using vessel class-specific averages.
- speed-power CF is defined as \((\text{speed}_{knots} / {\text{design\_speed}_{knots}}) ^3\), with the additional stipulation that the speed ratio should not exceed 1:

\[ \text{Speed-power CF} = \begin{cases} 1 & \text{if } \frac{\text{speed}_{knots}}{\text{design\_speed}_{knots}} > 1, \\ \left(\frac{\text{speed}_{knots}}{\text{design\_speed}_{knots}}\right)^3 & \text{otherwise} \end{cases} \]

In this equation, speed is taken from the AIS-broadcast speed when no more than 1 hour has passed since the previous AIS message (we therefore assume that the vessel traveled at a similar speed throughout the past hour). When more than 1 hour has passed since the previous message, the implied speed is used as a more accurate measurement of speed for that time period. This is calculated as distance from last position divided by hours since last position. Design speed for each vessel was estimated using a random forest regression trained on known registry design speeds for a subset of vessels alongside other vessel characteristics, including main engine power (\(ME\), kW) and gross tonnage (\(GT\)). This approach was applied to the entire GFW dataset of vessels.

\[ \text{design\_speed}_{knots} = (3.390 \times 10^{-4}) \cdot ME + (2.151 \times 10^{-5}) \cdot GT - (2.742 \times 10^{-9}) \cdot ME \cdot GT + 12.93 \]
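The speed selection rule and the design speed expression above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the second function simply evaluates the closed-form expression exactly as printed above.

```python
def effective_speed_knots(speed_knots, implied_speed_knots, hours):
    """Pick the speed used for the speed-power correction factor.

    Uses the AIS-broadcast speed when the gap since the previous message
    is at most 1 hour, and the implied speed otherwise.
    """
    return speed_knots if hours <= 1.0 else implied_speed_knots


def design_speed_knots(main_engine_power_kw, tonnage_gt):
    """Evaluate the design speed expression as written in the text."""
    me, gt = main_engine_power_kw, tonnage_gt
    return 3.390e-4 * me + 2.151e-5 * gt - 2.742e-9 * me * gt + 12.93


# Hypothetical vessel: 10,000 kW main engine, 30,000 GT
print(round(design_speed_knots(10_000, 30_000), 2))
```

For the hypothetical vessel in the usage line, the expression evaluates to roughly 16.14 knots.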

As a last step, we ensure that the final load factor (the product of the above correction factors) does not exceed a value of 0.98, as recommended by Faber and Xing (2020) (see page 272).

\[ \text{Load Factor} = \begin{cases} 0.98 & \text{if (Load Factor)} > 0.98, \\ \text{Load Factor} & \text{otherwise} \end{cases} \]

Adjustments for fishing vessels

For fishing vessels, we adjust the main engine load factor based on the relationships published by Coello et al. (2015) (and later used by Sala et al. (2018)).

For fishing vessels of class trawlers and dredge_fishing, we assign a main engine load factor of 0.75 when they are actively fishing. The intuition is that, even when moving slowly, these vessels' engines can exert tremendous power while fishing with deployed gear.
For all fishing vessels, we limit the main engine load factor so that it falls between 0.2 and 0.9.
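The main engine load factor and energy use described in this section can be sketched as follows. The function and argument names are hypothetical, and the order in which the 0.98 cap and the fishing vessel adjustments are applied is an assumption, since the text does not specify it.

```python
HULL_FOULING_CF = 1.07   # 7% added resistance
DRAFT_CF = 0.85          # weighted average draft correction
MAX_LOAD_FACTOR = 0.98   # cap on the final load factor
TOWED_GEAR_CLASSES = {"trawlers", "dredge_fishing"}


def weather_cf(distance_from_shore_nm):
    """1.10 nearshore (<= 5 nm), 1.15 offshore (> 5 nm)."""
    return 1.10 if distance_from_shore_nm <= 5.0 else 1.15


def speed_power_cf(speed_knots, design_speed_knots):
    """Cube of the speed ratio, with the ratio capped at 1."""
    ratio = min(speed_knots / design_speed_knots, 1.0)
    return ratio ** 3


def main_engine_load_factor(speed_knots, design_speed_knots,
                            distance_from_shore_nm, vessel_class=None,
                            is_fishing_vessel=False, is_fishing=False):
    load = (speed_power_cf(speed_knots, design_speed_knots)
            * HULL_FOULING_CF
            * weather_cf(distance_from_shore_nm)
            * DRAFT_CF)
    load = min(load, MAX_LOAD_FACTOR)
    # Assumption: fishing adjustments are applied after the general cap.
    if is_fishing_vessel:
        if is_fishing and vessel_class in TOWED_GEAR_CLASSES:
            load = 0.75
        load = min(max(load, 0.2), 0.9)
    return load


def main_engine_energy_kwh(hours, load_factor, main_engine_power_kw):
    return hours * load_factor * main_engine_power_kw


# Hypothetical cargo vessel at half its design speed, 50 nm offshore
lf = main_engine_load_factor(10.0, 20.0, 50.0)
print(lf, main_engine_energy_kwh(0.5, lf, 8_000))
```

At half of design speed the cubic term dominates, so the load factor in the usage line is only about 0.13 even after the fouling and weather corrections.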

1.1.1.1.2 Main engine emissions

Main engine emissions for each pollutant (Table 1.1) are determined by multiplying each pollutant’s emissions factor (EF) by the main engine energy use. Main engine pollutant emission factors are derived from Appendix E in Olmer et al. (2017). For each pollutant, we use the average emissions factor for slow-speed, medium-speed, and high-speed diesel engines (SSD/MSD/HSD). SSD/MSD/HSD engines represent ~98% of vessels (Table 10 of Faber and Xing (2020)). The main engine emissions factors used for each pollutant are as follows:

Table 1.1: Main engine emissions factors, by pollutant

pollutant	main_ef_g_per_kwh
CH4	0.010
CO	0.540
CO2	629.833
N2O	0.030
NOX	12.960
PM	0.605
SOX	3.917

Additionally, we apply a low-load correction factor based on Table 20 of the IMO Fourth GHG Report (Faber and Xing (2020)). Engines operating at very low loads (below 20%) run inefficiently and emit more of certain pollutants. Table 20 provides low-load correction factors that vary with the exact engine load and with the pollutant. For each AIS ping, we therefore multiply the main engine emissions by the correction factor for that engine load and pollutant. Main engine loads >20% do not get any low-load correction factor applied.
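As a rough sketch of this step, the snippet below multiplies main engine energy use by the factors in Table 1.1 and applies an optional low-load correction. The Table 20 values are not reproduced in this document, so the correction is passed in as a hypothetical callable; treating a load of exactly 20% as uncorrected is also an assumption.

```python
MAIN_EF_G_PER_KWH = {
    "CH4": 0.010, "CO": 0.540, "CO2": 629.833, "N2O": 0.030,
    "NOX": 12.960, "PM": 0.605, "SOX": 3.917,
}
LOW_LOAD_THRESHOLD = 0.20


def main_engine_emissions_g(energy_kwh, load_factor, low_load_cf=None):
    """Grams of each pollutant emitted by the main engine for one ping.

    low_load_cf is a hypothetical callable (pollutant, load_factor) -> float
    standing in for a lookup of the IMO Table 20 correction factors.
    """
    emissions = {}
    for pollutant, ef in MAIN_EF_G_PER_KWH.items():
        cf = 1.0
        if low_load_cf is not None and load_factor < LOW_LOAD_THRESHOLD:
            cf = low_load_cf(pollutant, load_factor)
        emissions[pollutant] = energy_kwh * ef * cf
    return emissions


# 1,000 kWh at 50% load: no low-load correction applies
print(main_engine_emissions_g(1_000, 0.5)["CO2"])
```

Passing the correction in as a function keeps the sketch independent of how the Table 20 lookup is stored.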

1.1.1.2 Auxiliary engine

1.1.1.2.1 Auxiliary engine energy use

Our initial modeling approach simplified the Faber and Xing (2020) recommendations by capturing the differences in auxiliary engine power consumption based on the vessel’s status, distinguishing between stationary and at-sea conditions. Speeds at or below 0.5 knots were considered stationary.

\[ \text{Aux engine Energy Use}_{kWh} = \text{hours} \times \begin{cases} \text{aux\_0sp}_{kW} & \text{if } \text{speed}_{knots} \leq 0.5, \\ \text{aux\_atsea}_{kW} & \text{otherwise.} \end{cases} \]

Here, the auxiliary engine power terms (\(\text{aux\_0sp}_{kW}\) and \(\text{aux\_atsea}_{kW}\)) were simplified from the ICCT and IMO models, which use four operational phases (cruising, maneuvering, anchor, and berth; see Table 17 of Faber and Xing (2020)). We did so by averaging auxiliary engine and boiler power for cruising and maneuvering into atsea, and averaging auxiliary engine and boiler power for anchor and berth into 0sp. The rationale was that most vessels spend most of their time either cruising or at ~0 speed.

While this simplification helped streamline the modeling process, it assumed equal auxiliary emissions across vessel types and collapsed distinct operational phases, so it did not fully capture how vessel behavior affects emissions. Therefore, after obtaining good validation results in the initial phase of model testing, we refined this approach to follow Faber and Xing (2020) more closely.

Below, we describe the methods used to disaggregate auxiliary engine and boiler power demands across the four operational phases.

1.1.1.2.2 Auxiliary engine and boiler energy use (4 phases)

The model described in Faber and Xing (2020) assumes that, while in service, a ship is operating in one of four defined phases: at berth, at anchor, maneuvering, or at sea.

For small vessels, we follow the recommendations from the 4th IMO study (page 68, Faber and Xing (2020)), where auxiliary engine power and boiler power are relative to main engine power. For larger vessels, aux_engine_power_kw and boiler_power_kw are defined based on vessel class and operational phase.

\[ \text{Aux engine Energy Use}_{kWh} = \text{hours} \times \begin{cases} 0 & \text{if } \text{main\_engine\_power\_kw} \leq 150 \\ 0.05 \times \text{main\_engine\_power\_kw} & \text{if } \text{main\_engine\_power\_kw} \leq 500 \\ \text{aux\_engine\_power\_kw} & \text{otherwise} \end{cases} \]

\[ \text{Boiler Energy Use}_{kWh} = \text{hours} \times \begin{cases} 0 & \text{if } \text{main\_engine\_power\_kw} \leq 150 \\ \text{boiler\_power\_kw} & \text{otherwise} \end{cases} \]
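The two piecewise definitions above translate directly into code. In this sketch the function names are hypothetical, and aux_engine_power_kw and boiler_power_kw are assumed to be the class- and phase-specific values already looked up for the ping.

```python
def aux_engine_energy_kwh(hours, main_engine_power_kw, aux_engine_power_kw):
    """Auxiliary engine energy use for one ping, following the size thresholds."""
    if main_engine_power_kw <= 150:
        power_kw = 0.0
    elif main_engine_power_kw <= 500:
        power_kw = 0.05 * main_engine_power_kw
    else:
        power_kw = aux_engine_power_kw
    return hours * power_kw


def boiler_energy_kwh(hours, main_engine_power_kw, boiler_power_kw):
    """Boiler energy use for one ping; vessels at or below 150 kW get none."""
    power_kw = 0.0 if main_engine_power_kw <= 150 else boiler_power_kw
    return hours * power_kw


# Hypothetical 400 kW vessel over a 2-hour gap
print(aux_engine_energy_kwh(2.0, 400.0, 300.0), boiler_energy_kwh(2.0, 400.0, 50.0))
```

Note that the 150 to 500 kW band changes only the auxiliary engine term; the boiler uses the class-based value for every vessel above 150 kW.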

Including the four phases for larger vessels requires Table 17 from Faber and Xing (2020), which gives the power demand of the auxiliary engine and the boiler. However, this table defines its vessel size bins in different units depending on the ship type. Since GFW records vessel size in GT, we needed to convert the bins expressed in DWT, TEU, and CBM to GT. Here, we present the approach used to establish a direct relationship between size units for each vessel category.

1.1.1.2.2.1 DWT conversion

To establish the GT-DWT relationship, we used data containing both GT and DWT for each vessel. By assessing the relationship between these units, which mostly present linear relationships by vessel type, we defined a simple regression allowing us to derive conversion expressions with sufficient confidence from one unit to the other.

These data were obtained through web scraping from open online sources and contain information for 464,799 vessels on variables such as type, gt, dwt, length_m, and beam_m, from which we could derive the relationship between size units by vessel class.

Out of all vessel types, we only need to evaluate the tonnage relationship for a few vessel types, the ones included in Table 17 from Faber and Xing (2020).

$ship_type
[1] "Bulk carrier"         "Chemical tanker"      "General cargo"       
[4] "Oil tanker"           "Other liquids tanker" "Refrigerated bulk"   
[7] "Ro-Ro"

In order to properly establish the size relationship, we need to group the categories from our dataset so they match the categories from Table 17. By doing so, we can evaluate each category’s relationship.

For instance, for Bulk carriers we have 7 categories which, according to Figure 1.1, present a linear relationship.

Figure 1.1: GT vs DWT relationship for Bulk Carriers.

The same occurs for chemical tankers with 5 categories, as shown in Figure 1.2.

Figure 1.2: GT vs DWT relationship for Chemical Tankers.

For general cargo, we have 3 categories. One of them, Passenger/General Cargo Ship, deviates from linearity (Figure 1.3) and may fall within the Ferry-pax only category from Table 17, so we discard it.

Figure 1.3: GT vs DWT relationship for General Cargo vessels.

For oil tankers, several vessel categories contain the label oil. However, most of them actually belong to chemical or bulk carriers. In this grouping, we include only crude oil tankers and bitumen tankers.

Figure 1.4: GT vs DWT relationship for Oil Tankers.

For the remaining liquid carriers, we will assign “Water Tanker”, “Wine Tanker” and “Molasses Tanker” from our table to the same category (Figure 1.5).

Figure 1.5: GT vs DWT relationship for Other Liquids Tankers category.

For refrigerated bulk, we have 2 categories, as shown in Figure 1.6.

Figure 1.6: GT vs DWT relationship for Refrigerated Bulk.

Lastly, for Ro-Ro ships, we have 3 categories, which follow distinct relationships as shown in Figure 1.7. We are only interested in Ro-Ro Cargo ships, as the other two fall within the Ferry-RoPax category from Table 17.

Figure 1.7: GT vs DWT relationship for Ro-Ro.

With the equivalences defined above between the groups from Faber and Xing (2020) and the groups from our dataset, we update the original dataframe and fit the regressions. By fitting gross tonnage (GT) as a function of deadweight tonnage (DWT) and grouped type, we obtain the expressions describing the relationship between both size units per vessel type, along with the performance metrics summarized in Table 1.2. We save this model in an lm object and use it later to update Table 17.

Table 1.2: Model summary of gross tonnage (GT) as a function of deadweight tonnage (DWT) and ship type.

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.9935724	0.9935717	2496.954	1325496	0	7	-554797.7	1109613	1109694	374236342508	60024	60032

1.1.1.2.2.2 TEU conversion

TEU stands for Twenty-foot Equivalent Unit, a standard unit of measure used in the shipping industry to describe the capacity of container ships and terminals. One TEU represents the dimensions of a standard 20-foot long container. TEU therefore quantifies cargo capacity in terms of the number of 20-foot containers a vessel can carry. For example, a ship with a capacity of 10,000 TEU can carry 10,000 standard 20-foot containers.

For TEU, we will obtain GT equivalents based on the design formulas for the calculation of key design vessel characteristics from Abramowski, Cepowski, and Zvolenskỳ (2018) as detailed below:

\[ GT = -1097.4 + 11.049 \cdot TEU \]
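As a quick worked example of this conversion, the sketch below evaluates the expression for the 10,000 TEU ship mentioned above. The function name is hypothetical.

```python
def teu_to_gt(teu):
    """Convert container capacity (TEU) to gross tonnage using the linear expression above."""
    return -1097.4 + 11.049 * teu


# A 10,000 TEU container ship
print(round(teu_to_gt(10_000), 1))
```

This gives roughly 109,393 GT. Because of the negative intercept, the expression returns negative values below about 100 TEU, so it is only meaningful for larger capacities.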

1.1.1.2.2.3 CBM conversion

The size units of liquefied tankers represented as “CBM” refer to cubic meters (m³). This measurement indicates the volume capacity of the tankers, specifically how much liquefied gas (such as liquefied natural gas, LNG, or liquefied petroleum gas, LPG) they can carry.

For this unit conversion, we have not been able to find any large dataset to establish linear relationships, nor any publication defining expressions for unit conversion. The only available resource is information on 23 vessels with both GT and CBM values, which allows us to define a basic regression. This establishes the GT equivalence, with intermediate to low confidence, needed to update Table 17 for gas tankers. This is a point for improvement, but it suffices for now.

Figure 1.8: GT vs CBM relationship for Gas Tankers.

Testing showed that we can obtain a slightly better fit if we distinguish between LPG and LNG. However, Table 17 groups them together under the same category, so we establish GT exclusively as a function of CBM (Table 1.3).

Table 1.3: Model summary of gross tonnage (GT) as a function of cubic meters (CBM).

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.982426	0.9815891	7253.543	1173.945	0	1	-236.0421	478.0841	481.4906	1104891522	21	23

1.1.1.2.2.4 Updating table 17

Once we have defined the relationship between GT and the other size units for each vessel class, we update Table 17 from Faber and Xing (2020) (Table 1.4). This allows us to incorporate the auxiliary engine and boiler power outputs per ship class and operational mode into our AIS-based model.

Table 1.4: Sample of Units Transformation from Table 17 of the IMO Report.

ship_type	size_lower	size_upper	size_units	size_lower_gt	size_upper_gt
Bulk carrier	0	9999	dwt	0.000	7738.351
Bulk carrier	10000	34999	dwt	7738.864	20567.005
Bulk carrier	35000	59999	dwt	20567.518	33395.658
Bulk carrier	60000	99999	dwt	33396.171	53921.504
Bulk carrier	100000	199999	dwt	53922.017	105236.119
Bulk carrier	200000	NA	dwt	105236.632	NA

1.1.1.2.2.5 Updating ship types to GFW classes

Our model runs using GFW vessel information tables. With the updated Table 17 including size ranges in GT, we now need to update the ship types in that table and group them according to the vessel classes available in GFW datasets. To do this, we conducted a preliminary analysis of the IHS vessel types found in the GFW classes. While some could be directly assigned based on GFW classification criteria, others, such as some cargo vessels and tankers, required a more thorough process. By evaluating the relationship between size and auxiliary power for three IHS types, we fitted a regression to each type. This helped us disaggregate vessel class composition in the GFW dataset and obtain the approximate percentage composition of IHS vessel types represented within GFW vessel groups (Table 1.5).

Table 1.5: IHS ship types and their equivalent GFW vessel classes. Weights have been modified based on Mark Powell’s preliminary analysis.

gfw_vessel_class	tb17_ship_type	weights
tug	Service - tug	1.0000000
cargo	General cargo	0.3200000
cargo	Bulk carrier	0.4700000
cargo	Container	0.1100000
cargo	Ro-Ro	0.0500000
cargo	Vehicle	0.0500000
cargo.bulk_carrier	Bulk carrier	1.0000000
cargo.container	Container	1.0000000
cargo.general	General cargo	1.0000000
cargo.refrigerated	Refrigerated bulk	1.0000000
cargo.ro_ro	Ro-Ro	1.0000000
bunker	Chemical tanker	1.0000000
reefer	Refrigerated bulk	1.0000000
tanker	Oil tanker	0.4500000
tanker	Chemical tanker	0.3400000
tanker	Other liquids tanker	0.0500000
tanker	Liquefied gas tanker	0.1600000
tanker.chemical_oil	Oil tanker	0.5696203
tanker.chemical_oil	Chemical tanker	0.4303797
tanker.liquefied_gas	Liquefied gas tanker	1.0000000
tanker.other	Other liquids tanker	1.0000000
fishing	Miscellaneous - fishing	1.0000000
seiners	Miscellaneous - fishing	1.0000000
research	Service - other	0.5000000
research	Offshore	0.5000000
trawlers	Miscellaneous - fishing	1.0000000
trollers	Miscellaneous - fishing	1.0000000
passenger	Ferry-RoPax	0.3300000
passenger	Ferry-pax only	0.3300000
passenger	Cruise	0.3300000
well_boat	Miscellaneous - fishing	1.0000000
fixed_gear	Miscellaneous - fishing	1.0000000
dive_vessel	Miscellaneous - fishing	1.0000000
non_fishing	Miscellaneous - other	0.5000000
non_fishing	Yacht	0.5000000
fish_factory	Miscellaneous - other	1.0000000
other_seines	Miscellaneous - fishing	1.0000000
purse_seines	Miscellaneous - fishing	1.0000000
set_gillnets	Miscellaneous - fishing	1.0000000
squid_jigger	Miscellaneous - fishing	1.0000000
patrol_vessel	Service - other	0.5000000
patrol_vessel	Offshore	0.5000000
pole_and_line	Miscellaneous - fishing	1.0000000
set_longlines	Miscellaneous - fishing	1.0000000
supply_vessel	Service - other	0.5000000
supply_vessel	Offshore	0.5000000
dredge_fishing	Miscellaneous - fishing	1.0000000
pots_and_traps	Miscellaneous - fishing	1.0000000
seismic_vessel	Service - other	0.5000000
seismic_vessel	Offshore	0.5000000
cargo_or_reefer	General cargo	1.0000000
cargo_or_tanker	General cargo	1.0000000
bunker_or_tanker	Chemical tanker	1.0000000
container_reefer	Refrigerated bulk	1.0000000
other_not_fishing	Miscellaneous - other	1.0000000
tuna_purse_seines	Miscellaneous - fishing	1.0000000
dredge_non_fishing	Miscellaneous - fishing	1.0000000
drifting_longlines	Miscellaneous - fishing	1.0000000
other_purse_seines	Miscellaneous - fishing	1.0000000
specialized_reefer	Refrigerated bulk	1.0000000
fish_tender	Miscellaneous - fishing	1.0000000
driftnets	Miscellaneous - fishing	1.0000000
other_fishing	Miscellaneous - fishing	1.0000000

Using the resulting weights, we can more accurately combine IHS groups into GFW classes and obtain auxiliary and boiler power estimates for each operational phase, based on weighted means for each overlapping size range. The resulting table is stored in BQ under world-fishing-827.proj_ocean_ghg.aux_and_boil_power_by_operational_mode.

With it, we updated world-fishing-827.proj_ocean_ghg.vessel_info by including the corresponding boiler and auxiliary engine power demand considering vessel size and class for each of the four operational phases. This can then be fed into the model to provide more accurate emission estimates.
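The weighted mean used to combine IHS types into a GFW class can be illustrated as follows. The weights are the tanker rows of Table 1.5, but the power values are placeholders chosen for illustration only (they are not Table 17 values), and dividing by the sum of the weights is an assumption that matters only where weights do not sum exactly to 1.

```python
TANKER_WEIGHTS = {
    "Oil tanker": 0.45,
    "Chemical tanker": 0.34,
    "Other liquids tanker": 0.05,
    "Liquefied gas tanker": 0.16,
}


def weighted_power_kw(power_by_type_kw, weights):
    """Weighted mean of per-type power demand for one size range and phase."""
    total_weight = sum(weights[ship_type] for ship_type in power_by_type_kw)
    weighted_sum = sum(power_kw * weights[ship_type]
                       for ship_type, power_kw in power_by_type_kw.items())
    return weighted_sum / total_weight


# Placeholder auxiliary power demands (kW) for one size range and phase
placeholder_power_kw = {
    "Oil tanker": 1000.0,
    "Chemical tanker": 500.0,
    "Other liquids tanker": 200.0,
    "Liquefied gas tanker": 800.0,
}
print(weighted_power_kw(placeholder_power_kw, TANKER_WEIGHTS))
```

With these placeholder inputs the combined demand is about 758 kW, dominated by the two most heavily weighted types.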

1.1.1.2.3 Auxiliary engine and boiler emissions

Auxiliary engine and boiler emissions (Tables 1.6 and 1.7) are determined by multiplying each pollutant’s emissions factor by the auxiliary engine or boiler energy use. We calculate emissions for each pollutant using emissions factors derived from Appendices G and H of Olmer et al. (2017). For each pollutant, we use the average emissions factor for slow-speed, medium-speed, and high-speed diesel engines (SSD/MSD/HSD). SSD/MSD/HSD engines represent ~98% of vessels (Table 10 of Faber and Xing (2020)). The auxiliary engine and boiler emissions factors used for each pollutant are as follows:

Table 1.6: Auxiliary engine emissions factors, by pollutant

pollutant	aux_ef_g_per_kwh
CH4	0.010
CO	0.540
CO2	699.667
N2O	0.033
NOX	12.116
PM	0.610
SOX	4.337

\[ \text{Aux engine emissions}_{g} = \text{hours} \times \begin{cases} \text{aux\_at\_berth}_{kW} & \text{if at berth} \\ \text{aux\_at\_anchor}_{kW} & \text{if at anchor} \\ \text{aux\_maneuvering}_{kW} & \text{if maneuvering} \\ \text{aux\_at\_sea}_{kW} & \text{if at sea} \end{cases} \times \text{Aux emissions factor}_{g/kWh} \]

Table 1.7: Boiler emissions factors, by pollutant

pollutant	boiler_ef_g_per_kwh
CH4	0.002
CO	0.200
CO2	958.000
N2O	0.043
NOX	2.033
PM	0.380
SOX	5.827

\[ \text{Boiler emissions}_{g} = \text{hours} \times \begin{cases} \text{boiler\_at\_berth}_{kW} & \text{if at berth} \\ \text{boiler\_at\_anchor}_{kW} & \text{if at anchor} \\ \text{boiler\_maneuvering}_{kW} & \text{if maneuvering} \\ \text{boiler\_at\_sea}_{kW} & \text{if at sea} \end{cases} \times \text{Boiler emissions factor}_{g/kWh} \]
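A minimal sketch of the auxiliary engine and boiler emissions calculation, using the factors from Tables 1.6 and 1.7. The operational phase is taken as an input because its assignment rules are not spelled out here, and the per-phase power values in the usage example are hypothetical.

```python
AUX_EF_G_PER_KWH = {
    "CH4": 0.010, "CO": 0.540, "CO2": 699.667, "N2O": 0.033,
    "NOX": 12.116, "PM": 0.610, "SOX": 4.337,
}
BOILER_EF_G_PER_KWH = {
    "CH4": 0.002, "CO": 0.200, "CO2": 958.000, "N2O": 0.043,
    "NOX": 2.033, "PM": 0.380, "SOX": 5.827,
}
PHASES = ("at_berth", "at_anchor", "maneuvering", "at_sea")


def phase_emissions_g(hours, phase, power_by_phase_kw, ef_g_per_kwh):
    """Grams of each pollutant for one ping, given the operational phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown operational phase: {phase}")
    energy_kwh = hours * power_by_phase_kw[phase]
    return {pollutant: energy_kwh * ef for pollutant, ef in ef_g_per_kwh.items()}


# Hypothetical per-phase power demands (kW) for one vessel
aux_power_kw = {"at_berth": 100.0, "at_anchor": 100.0, "maneuvering": 200.0, "at_sea": 150.0}
boiler_power_kw = {"at_berth": 80.0, "at_anchor": 80.0, "maneuvering": 60.0, "at_sea": 50.0}

aux = phase_emissions_g(2.0, "at_sea", aux_power_kw, AUX_EF_G_PER_KWH)
boiler = phase_emissions_g(2.0, "at_sea", boiler_power_kw, BOILER_EF_G_PER_KWH)
print(aux["CO2"], boiler["CO2"])
```

The same function serves both components because they differ only in the power table and the emissions factor table supplied.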

1.1.1.3 Total emissions

Finally, using the factors above, we estimate total emissions by multiplying Main Engine Energy Use, Aux Engine Energy Use and Boiler Energy Use by their respective emissions factors. We then sum the three values to get the total emission estimate.

\[ \text{Total Emissions}_{ CO_2, NO_X, ...} =\text{Main Engine Emissions}_{ CO_2, NO_X, ...} + \text{Aux Engine Emissions}_{ CO_2, NO_X, ...} + \text{Boiler Emissions}_{ CO_2, NO_X, ...} \]

1.1.1.4 Additional pollutants

For three additional pollutants (PM2.5, PM10, and VOCs), we multiply the CO2 emissions by the conversion factors in the following table. These conversion factors were provided by OceanMind, so our methodology is consistent with their approach for these pollutants (which is detailed in Mayes et al. (2024)). To summarize, they are based on Table 27 from the 4th IMO report (Faber and Xing (2020)), which shows the emissions factors for each pollutant and fuel type (in kg of pollutant per tonne of fuel consumed). The fuel types included are Heavy Fuel Oil (HFO), Liquefied Natural Gas (LNG), Marine Diesel Oil (MDO), and methanol. Using the 2018 factors, the emissions factors for PM2.5, PM10, and VOCs were each divided by the corresponding CO2 emissions factor, by fuel type, to obtain a ratio that converts CO2 emissions into emissions of each of these pollutants. These ratios were then used to calculate a weighted average conversion factor for each pollutant, weighting by the average percentage of vessels that use each fuel type (taken as the 2018 values from Table 34 of the IMO report). This results in the following factors, which we use directly:

gPM2.5/gCO2	gPM10/gCO2	gVOCS/gCO2
0.001598	0.001738	0.000933
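Applying these ratios is a single multiplication per pollutant, sketched below with a hypothetical function name.

```python
# Grams of pollutant per gram of CO2, from the table above
CO2_RATIOS = {"PM2.5": 0.001598, "PM10": 0.001738, "VOCS": 0.000933}


def additional_pollutants_g(co2_g):
    """Derive PM2.5, PM10, and VOC emissions (g) from ping-level CO2 emissions (g)."""
    return {pollutant: co2_g * ratio for pollutant, ratio in CO2_RATIOS.items()}


# One tonne of CO2 (1,000,000 g)
print(additional_pollutants_g(1_000_000))
```

One tonne of CO2 therefore corresponds to roughly 1.6 kg of PM2.5, 1.7 kg of PM10, and 0.9 kg of VOCs.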

1.1.2 Low sulfur fuel emissions correction factors for post-2020 data

Starting on January 1, 2020, the IMO required lower sulfur content fuel (0.5%, instead of the higher 2.5% that was typical for HFO prior to 2020). For data >= 2020-01-01, we therefore apply a correction factor for SOX and PM to account for this lower sulfur content fuel.

For SOX: based on Equation 15 (p. 74) of the 4th IMO report, the SOX emissions factor scales linearly with fuel sulfur content. For the pre-2020 SOX EF, we use Table E from the 2017 ICCT study and take the average of the HFO values, which are based on a 2.5% sulfur content (consistent with the HFO row in Table 22 of the 4th IMO report). Assuming the sulfur content drops from 2.5% to 0.5% starting in 2020, this means an 80% drop in sulfur content and therefore an 80% drop in the SOX EF. This 80% is consistent with several studies that looked at the impact this new requirement had on global sulfur emissions (@yuan2024abrupt, @yoshioka2024warming). So for the >= 2020-01-01 SOX EF, we simply multiply our < 2020-01-01 EF by 0.2. We do this for all three EFs (main engine, auxiliary engine, and boiler).

For PM, PM2.5, and PM10: Equation 16 (p. 74) of the 4th IMO report defines the relationship between the PM10 EF and sulfur content for HFO. Unlike the SOX relationship, this equation is not a pure scalar: it depends on SFCi and includes a constant. Based on Table 19 (p. 70) of the 4th IMO report, the average SFCi of HFO is 185 g/kWh (175 for SSD, 185 for MSD, and 195 for HSD). Plugging this into Equation 16 gives a < 2020-01-01 EF of 73.38, assuming a sulfur content of 2.5% (1.35+185*7*.02247*(2.5-.0246)). Assuming a post-2020 sulfur content of 0.5%, we get a >= 2020-01-01 EF of 15.18 (1.35+185*7*.02247*(0.5-.0246)). The ratio of these two is 15.18/73.38 = 0.206, which is almost exactly the ratio for sulfur. So for the >= 2020-01-01 EFs for PM, PM10, and PM2.5, we simply multiply our < 2020-01-01 EF by 0.206. We do this for the main engine, auxiliary engine, and boiler EFs of each pollutant.
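The arithmetic above, and the way the resulting multipliers would be applied by date, can be sketched as follows. The function names are hypothetical; pm10_ef_hfo evaluates the expression exactly as quoted in the text, and the multipliers are the 0.2 and 0.206 values stated above.

```python
from datetime import date

LOW_SULFUR_START = date(2020, 1, 1)
SOX_CF = 0.2      # 0.5% / 2.5% sulfur content
PM_CF = 0.206     # ratio of post-2020 to pre-2020 PM10 EFs, as used in the text
PM_POLLUTANTS = {"PM", "PM2.5", "PM10"}


def pm10_ef_hfo(sfc_g_per_kwh, sulfur_pct):
    """PM10 EF for HFO as a function of SFC and sulfur content (expression quoted above)."""
    return 1.35 + sfc_g_per_kwh * 7 * 0.02247 * (sulfur_pct - 0.0246)


def low_sulfur_cf(pollutant, ping_date):
    """Multiplier applied to the pre-2020 EF for a ping on the given date."""
    if ping_date < LOW_SULFUR_START:
        return 1.0
    if pollutant == "SOX":
        return SOX_CF
    if pollutant in PM_POLLUTANTS:
        return PM_CF
    return 1.0


pre_2020 = pm10_ef_hfo(185, 2.5)
post_2020 = pm10_ef_hfo(185, 0.5)
print(round(pre_2020, 2), round(post_2020, 2), post_2020 / pre_2020)
```

The printed EFs reproduce the 73.38 and 15.18 values quoted above, and their ratio sits just above 0.206.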

1.1.3 Data

1.1.3.1 Individual AIS messages

For our individual AIS messages (pings) dataset, we use the latest version of GFW’s AIS pipeline, Version 3, which is one of GFW’s core internal datasets. The pipeline automates the parsing, cleaning, augmenting, and publishing of raw AIS data (Kroodsma et al. (2018)). This table provides data from 2012 to present. Using this table as our starting point, we are able to estimate emissions of all analyzed pollutants for every single AIS message. These ping-level emissions can later be aggregated however desired (e.g., by vessel, by voyage, by destination or arrival port, by time, by space, etc.).

Variables of interest within this table include the following:

ssvid: source specific vessel id; MMSI for AIS
hours: time since the previous position in the segment
speed_knots: speed (knots) from AIS message
implied_speed_knots: distance from last position divided by hours since last position
meters_to_prev: distance (meters) to the previous point in the segment
distance_from_shore_m: distance from shore (meters)
distance_from_port_m: distance from port (meters)
neural net score: The score is 1 if the neural net thinks this is a fishing position.
night_loitering: 1 if the message comes from a squid_jigger that is at night and not moving, 0 if not.

To minimize noisy data, we only include AIS messages that occur within valid segments (i.e., select seg_id from pipe_ais_v3_published.segs_activity where good_seg), and only within daily segments that do not overlap with each other (i.e., those that do not occur in overlap_segs_daily_v20241202).

1.1.3.2 Vessel characteristics

Vessel characteristics are another of the core GFW datasets. These tables provide metadata for all vessels contained within GFW, organized by MMSI. The information for each vessel includes: 1) official registry information, when available (Park et al. (2023)); or 2) algorithm-derived vessel characteristics such as vessel class, engine power, and gross tonnage, when registry data are not available (Kroodsma et al. (2018)). The GFW vessel characteristics database leverages extensive work that has been done to scrape and aggregate many publicly available vessel registries (Park et al. (2023)). Note that we are currently using a cutting-edge version of this database, which differs from Version 3 of the pipeline in that it uses a new experimental random forest algorithm for inferring certain vessel characteristics when they are not available in official vessel registries (vessel type, main engine power, length, gross tonnage, and max speed).

We are also leveraging two new cargo and tanker vessel type classification sub-models developed by GFW that build on the general vessel classification algorithm for low-information vessels (i.e., those that don’t have known registry information). We can now differentiate many vessels that were previously lumped together as an undifferentiated cargo vessel type into the specific categories of bulk_carrier, container, general, refrigerated, and ro_ro. We can also now differentiate many vessels that were previously lumped together as an undifferentiated tanker vessel type into the specific categories of oil or chemical, liquefied gas, and other liquids. These classes now align with the IMO cargo vessel class types, allowing us to more accurately assign auxiliary engine power and boiler power for these vessel classes using the IMO methodology. Each of these new models uses a random forest trained on the port visit patterns of vessels with known IMO cargo or tanker vessel class types. Sequences of port visits are converted into usable model features by a Word2Vec model that transforms them into numeric arrays called embeddings. Each model also uses features based on vessel activity, including port hours, average distance from shore, and average speed. This allows us to more accurately classify cargo or tanker vessel types by looking at the most common known IMO vessel types that use the same ports. The cargo sub-model achieves an F1 weighted average score of 91%, while the tanker sub-model achieves an F1 weighted average score of 95%.

Variables of interest from pipe_ais_v3_published.vi_ssvid_v20250201 (the core GFW pipeline 3 vessel characteristics table) include the following:

ssvid: source specific vessel id; MMSI for AIS
best.flag: best flag state (ISO3) for the vessel
activity.active_hours: hours the vessel was broadcasting AIS and moving more than 0.1 knots. If desired, we can use this as a filter; vessels with < 24 active hours have very limited data from which to calculate emissions.
registry_info.registries_listed: vessel registries the vessel is listed on
registry_info.best_known_shipname: best known shipname for the vessel from registries
ais_identity.n_shipname_mostcommon.value: the most common normalized shipname broadcasted by this vessel
registry_info.best_known_callsign: best known callsign for vessel from registries
ais_identity.n_callsign_mostcommon.value: the most common normalized callsign broadcasted by this vessel
registry_info.best_known_imo imo_registry: best known IMO number for the vessel from registries
ais_identity.n_imo_mostcommon.value imo_ais: the most common normalized IMO number broadcasted by this vessel
offsetting: true if this vessel has been seen with an offset position at some point between 2012 and 2019 (this should be FALSE; if it is TRUE, it can be used as a filter to remove potentially erroneous/noisy vessels)
overlap_hours_multinames: the total number of hours of overlap between two segments where, over the time period of the two segments that overlap (including the non-overlapping time of the segments), the vessel broadcast two or more normalized names, each of them at least 10 times. The goal is to identify overlapping segments where there was likely more than one identity. (this should be 0; if it is > 0, it can be used as a filter to remove potentially erroneous/noisy vessels)

Variables of interest from proj_ocean_ghg.rf_predictions_v20250613 (the new experimental random forest vessel characteristics table developed for this project) include the following:

ssvid: source specific vessel id; MMSI for AIS
rf_best_vessel_class: best vessel class for the vessel (using official registry information where available; or the random forest vessel characteristics algorithm where registry information is not available. For cargo or tanker vessels where registry information is not available, we use the new cargo and tanker vessel specific classification sub-model.)
rf_best_engine_power_kw: best engine power (kilowatts) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
rf_best_tonnage_gt: best tonnage (gross tons) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
rf_best_length_m: best length (meters) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
rf_best_max_speed_knot: best maximum speed (i.e., design speed) (knots) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
on_fishing_list_rf_best: GFW determination of whether the vessel is a fishing vessel, using information from vessel registries and the random forest model

Variables of interest from proj_ocean_ghg.cargo_subtypes_v20250613 (the new experimental cargo and tanker sub-model) include the following:

ssvid: source specific vessel id; MMSI for AIS
vessel_subclass: Cargo or tanker sub-class, where available

There are currently 1,067,985 AIS-broadcasting vessels in the GFW dataset (this number excludes any AIS transponders that are labeled as fishing gear, helicopters, or submarines). Of these, we are able to estimate emissions for 925,682 unique active vessels over our entire time period. Of these, 901,214 (97%) are ‘low-information’ vessels without an IMO number.

1.1.3.3 Voyages

Again leveraging Version 3 of the pipeline, GFW’s voyages table contains information for port-to-port voyages made by vessels. This table leverages extensive work done by the GFW team to: 1) define ports, 2) determine when vessels arrive at or depart from a port, and 3) determine voyages that are defined by a port departure and a port arrival (Watch (2021)).

To define ports, specific anchorages are first identified by using the AIS data to find S2 cell locations where at least 20 unique vessels remained stationary at some point since 2012 (where ‘stationary’ is defined as moving less than 0.5 km within a 12-hour period). Once these initial anchorage locations have been identified, anchorages within 4 km of each other are grouped into ports. In this way, a single port may contain multiple anchorages. Port names are then assigned to each of these locations spatially according to the following hierarchy:

World Port Index
GeoNames 1000 database that describes all settlements globally that have a population of at least 1,000 people
The top destination reported in AIS messages of stationary vessels that defined that anchorage
Contributed names and regional port databases

Once ports have been identified, heuristics are used to identify port entries and exits:

A vessel enters port when it comes within 3 kilometers of an anchorage point and exits port when it is outside 4 kilometers of the anchorage point. We use different threshold distances to avoid situations where a vessel continuously enters and exits port. This situation is still common, however, as vessels travel along coastlines and repeatedly come within close proximity to numerous anchorages. To distinguish actual port visits from coastal transits, we further identify when a vessel appears to stop at a given port. The vessel is considered to have “stopped” at port if its speed drops below 0.2 knots, and this port stop ends when the speed rises above 0.5 knots. AIS is often switched off when a vessel enters port, and it is turned back on when it leaves. As a result, we track port “gaps,” where a vessel that has entered port does not broadcast on AIS for at least four hours. Port stops and port gaps are behaviors indicating that a vessel visited a port for a specific reason and/or engaged in some activity while at the port, such as landing catch or exchanging supplies and crew. We can then allocate the at-sea activity of vessels to individual voyages between port visits.
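
The two-threshold logic described above (enter within 3 km, exit beyond 4 km; stop below 0.2 knots, resume above 0.5 knots) is a form of hysteresis. Below is a minimal sketch of that idea, not the GFW pipeline's implementation: the function name `hysteresis_events`, the strict inequalities at the boundaries, and the sample tracks are all assumptions made for illustration.

```python
def hysteresis_events(values, on_below, off_above):
    """Return (index, "start"/"end") pairs using two thresholds.

    A state starts when a value falls below `on_below` and only ends once
    a value rises above `off_above`, so small fluctuations between the two
    thresholds do not toggle the state.
    """
    events = []
    active = False
    for i, value in enumerate(values):
        if not active and value < on_below:
            active = True
            events.append((i, "start"))
        elif active and value > off_above:
            active = False
            events.append((i, "end"))
    return events


# Hypothetical distances (km) from an anchorage point along a track.
distances_km = [5.0, 3.5, 2.9, 3.4, 2.8, 3.9, 4.1, 5.0]
port_events = hysteresis_events(distances_km, on_below=3.0, off_above=4.0)

# Same track with a single 3 km threshold, for comparison.
single_threshold_events = hysteresis_events(distances_km, on_below=3.0, off_above=3.0)

# Hypothetical speeds (knots) while in port.
speeds_knots = [1.0, 0.1, 0.4, 0.3, 0.6]
stop_events = hysteresis_events(speeds_knots, on_below=0.2, off_above=0.5)
```

With the 3 km and 4 km thresholds the sample track yields one port entry and one exit, whereas a single 3 km threshold splits the same track into two short visits, which is the flickering behavior the separate thresholds are meant to avoid.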

Once these port entries and exits have been identified, voyages can simply be defined as all activity between those port events. In this way, each individual AIS message (and its associated emissions) can be assigned to a voyage.
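
As an illustration of assigning a ping to a voyage, the sketch below looks up the voyage whose time window contains a ping timestamp. It is a hedged example rather than the production join: the function `assign_trip_id`, the tuple layout, and the sample trips are hypothetical, and it assumes a single vessel's voyages are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from datetime import datetime


def assign_trip_id(ping_time, trips):
    """Return the trip_id whose [trip_start, trip_end] window contains ping_time.

    `trips` is a list of (trip_start, trip_end, trip_id) tuples for one vessel,
    sorted by trip_start and non-overlapping. Returns None when the ping falls
    outside every voyage (for example, during a port visit).
    """
    starts = [start for start, _end, _trip_id in trips]
    i = bisect_right(starts, ping_time) - 1  # last trip starting at or before the ping
    if i >= 0:
        _start, end, trip_id = trips[i]
        if ping_time <= end:
            return trip_id
    return None


# Made-up voyages for one vessel.
trips = [
    (datetime(2024, 1, 1, 6), datetime(2024, 1, 3, 18), "trip-a"),
    (datetime(2024, 1, 5, 0), datetime(2024, 1, 9, 12), "trip-b"),
]
at_sea = assign_trip_id(datetime(2024, 1, 2, 12), trips)
in_port = assign_trip_id(datetime(2024, 1, 4, 0), trips)
```

Ping-level emissions can then be summed per returned trip identifier to obtain voyage-level totals.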

Variables of interest within this table include the following:

ssvid: source specific vessel id; MMSI for AIS
trip_id: A unique identifier for the trip generated by the ssvid, vessel-id and the exit time of starting visit
trip_start: The initial timestamp of the voyage, when the vessel leaves port
trip_end: The final timestamp of the voyage, when the vessel reaches port
trip_start_anchorage_id: The id of the anchorage where the voyage starts
trip_end_anchorage_id: The id of the anchorage where the voyage ends

Further information on the starting and ending anchorages can be obtained by joining this table to GFW’s anchorage table, which includes the following:

anchorage_id: The id of the anchorage
label: Port name
iso3: Port ISO3 code
lat: latitude of the anchorage
lon: longitude of the anchorage

In summary, from 2015-01-01 to 2025-06-30, we have 145,747,116 unique voyages across 805,021 unique vessels. These trips visited 14,682 unique ports across 209 unique countries.

1.1.3.4 Port visits

Again leveraging Version 3 of the pipeline, we use GFW’s port visits table. Port visits are determined using the same methods as described above for assigning voyages. Variables of interest within this table include the following:

ssvid: source specific vessel id; MMSI for AIS
visit_id: Unique ID for this visit
start_timestamp: timestamp at which the vessel crossed into the anchorage
end_timestamp: timestamp at which the vessel crossed out of the anchorage
start_anchorage_id: anchorage_id of anchorage where vessel entered port
end_anchorage_id: anchorage_id of anchorage where vessel exited port
confidence: how confident we are that this is a real visit, based on the components of the visit:
1 -> no stop or gap; only an entry and/or exit
2 -> only stop and/or gap; no entry or exit
3 -> port entry or exit with stop and/or gap
4 -> port entry and exit with stop and/or gap

For quality control purposes, we filter this dataset to just those port visits with the highest confidence level, 4. We also keep only those port visits where the starting and ending port labels are the same.

As with the voyages dataset, information on the starting and ending anchorages can be obtained by joining this table to GFW’s anchorage table. Variables of interest within this table include the following:

anchorage_id: The id of the anchorage
label: Port name
iso3: Port ISO3 code
lat: latitude of the anchorage
lon: longitude of the anchorage

Note again that since a single port can have multiple anchorages, it is possible that a single port visit has different starting and ending anchorages, and therefore lat/long locations.
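
The port visit quality-control filter described above (confidence level 4, same port label at start and end) can be sketched as below. The dictionary layout, the `keep_visit` function, and the sample anchorage ids and labels are hypothetical; the actual filtering is presumably done in SQL against the tables named above.

```python
def keep_visit(visit, anchorage_labels):
    """Keep a visit only if confidence is 4 and both anchorages share a port label."""
    start_label = anchorage_labels.get(visit["start_anchorage_id"])
    end_label = anchorage_labels.get(visit["end_anchorage_id"])
    return (
        visit["confidence"] == 4
        and start_label is not None
        and start_label == end_label
    )


# Made-up anchorage_id -> label lookup (two anchorages belong to the same port).
anchorage_labels = {"a1": "ROTTERDAM", "a2": "ROTTERDAM", "b1": "ANTWERP"}

visits = [
    {"visit_id": "v1", "start_anchorage_id": "a1", "end_anchorage_id": "a2", "confidence": 4},
    {"visit_id": "v2", "start_anchorage_id": "a1", "end_anchorage_id": "b1", "confidence": 4},
    {"visit_id": "v3", "start_anchorage_id": "b1", "end_anchorage_id": "b1", "confidence": 3},
]
kept_visits = [v["visit_id"] for v in visits if keep_visit(v, anchorage_labels)]
```

Visit `v1` is kept even though its starting and ending anchorages differ, because both anchorages carry the same port label.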

In summary, from 2015-01-01 to 2025-06-30, we have 109,617,280 unique port visits across 835,196 unique vessels. These port visits occurred in 14,431 unique ports across 209 unique countries.

1.2 Results

In this section, we provide some high-level results from our emissions model.

1.2.1 Time series trends

1.2.1.1 Number of vessels

First, we look at total global number of active vessels per year from 2015-2024 (Figure 1.9, Table 1.8).

Figure 1.9: Summary of total global number of active AIS-broadcasting vessels per year (2015-2024)

Table 1.8: Summary of total global number of active AIS-broadcasting vessels per year (2015-2024)

vessel_class	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
passenger	73,606	88,362	105,234	123,572	140,766	148,975	173,947	194,797	212,636	234,353
cargo.general	52,021	58,273	64,290	66,685	73,370	76,995	73,831	71,097	72,187	76,303
trawlers	39,025	44,761	49,719	54,491	56,954	58,413	60,890	62,518	66,732	66,157
tug	23,694	26,156	29,186	30,835	33,958	36,355	37,889	38,554	41,398	44,972
other_fishing	14,321	16,525	18,358	20,547	21,997	22,827	25,345	28,047	32,005	32,359
tanker.chemical_oil	16,291	16,650	17,699	20,205	21,302	19,726	20,335	21,430	21,341	23,950
cargo.bulk_carrier	13,832	14,571	15,410	15,514	16,695	16,822	16,778	17,248	17,628	18,750
drifting_longlines	2,343	3,008	3,693	4,240	4,795	5,543	7,143	8,731	9,328	9,685
bunker	3,804	4,100	4,269	4,704	5,221	5,735	5,467	5,486	7,696	8,400
cargo.container	4,981	4,946	5,279	5,474	5,726	5,829	6,162	6,650	6,907	7,462
set_gillnets	3,116	3,785	4,413	4,998	5,408	5,672	6,111	6,108	6,676	6,801
patrol_vessel	3,266	3,582	4,064	4,397	4,597	4,917	5,234	5,707	6,366	6,671
cargo.ro_ro	3,080	3,220	3,584	3,885	4,376	4,620	4,867	5,290	5,499	5,989
pots_and_traps	3,025	3,655	4,317	4,908	5,214	5,543	5,815	5,858	5,932	5,916
supply_vessel	3,822	3,661	3,534	3,567	3,738	3,766	3,819	4,089	4,303	4,438
dredge_non_fishing	2,054	2,263	2,564	2,640	2,723	2,860	3,007	3,070	3,288	3,411
squid_jigger	712	812	915	1,049	1,216	2,051	2,326	2,486	2,631	2,677
tanker.liquefied_gas	1,109	1,206	1,537	1,702	2,067	1,639	1,762	1,963	2,163	2,636
set_longlines	1,238	1,355	1,488	1,574	1,691	1,809	1,953	2,012	2,014	2,033
other_not_fishing	981	995	1,033	1,075	1,125	1,173	1,241	1,317	1,411	1,493
seismic_vessel	1,142	1,139	1,131	1,159	1,148	1,153	1,177	1,208	1,269	1,284
specialized_reefer	743	786	822	850	841	819	875	904	945	939
tuna_purse_seines	484	571	608	639	699	728	746	761	756	770
dredge_fishing	342	378	386	403	428	462	479	492	521	570
pole_and_line	367	402	428	458	488	531	536	541	549	547
dive_vessel	403	439	429	455	464	450	447	466	473	461
research	197	226	238	253	281	294	307	302	292	289
trollers	149	180	209	241	263	268	275	283	276	278
well_boat	132	151	171	172	179	191	196	196	199	197
other_seines	155	166	172	178	186	187	191	192	193	192
reefer	100	101	99	104	101	98	110	84	104	101
container_reefer	130	126	128	122	113	109	106	104	100	99
tanker.other	53	58	55	57	78	64	48	56	78	96
cargo.refrigerated	61	66	74	78	77	77	66	56	58	65
other_purse_seines	39	38	39	38	40	40	39	39	39	41
fish_factory	22	25	25	24	26	26	21	25	22	25
bunker_or_tanker	15	15	10	6	11	8	9	9	10	12
driftnets	3	3	1	4	4	4	4	4	3	3

1.2.1.2 Voyages

Next, we summarize the number of voyages, globally, for vessels included in our analysis from 2015-2024 (Figure 1.10, Table 1.9).

Figure 1.10: Summary of total global annual number of voyages from 2015-2024 for vessels included in our analysis

Table 1.9: Summary of total global annual number of voyages from 2015-2024 for vessels included in our analysis

year	n_unique_events
2015	8,437,555
2016	9,663,540
2017	11,593,371
2018	12,391,954
2019	13,101,881
2020	12,244,422
2021	13,556,057
2022	14,544,323
2023	15,269,286
2024	16,711,258

1.2.1.3 Port visits

Here we summarize the time series trend of the number of port visits, globally, for the vessels included in our analysis from 2015-2024 (Figure 1.11, Table 1.10).

Figure 1.11: Summary of total global annual number of port visits from 2015-2024 for the vessels included in our analysis

Table 1.10: Summary of total global annual number of port visits from 2015-2024 included in our analysis

year	n_unique_events
2015	6,177,728
2016	7,161,050
2017	8,720,020
2018	9,271,336
2019	9,849,398
2020	9,304,535
2021	10,301,900
2022	11,176,546
2023	11,810,193
2024	13,021,187

1.2.1.4 Emissions

Next, we look at total annual global emissions (metric tonnes, MT) for each pollutant from 2015-2024 (Figure 1.12, Table 1.11).

Figure 1.12: Summary of total global annual emissions (MT) from 2015-2024, by pollutant.

Table 1.11: Summary of total global annual emissions (MT) from 2015-2024, by pollutant.

year	CO2	NOX	SOX	CH4	CO	N2O	PM	PM10	PM2_5	VOCS
2015	718,722,836	12,625,854	4,449,919	11,869	591,766	35,229	653,820	1,330,165	1,223,017	809,033
2016	780,889,103	13,573,763	4,833,909	12,720	635,914	38,215	704,985	1,442,282	1,326,103	874,026
2017	846,501,499	14,529,395	5,238,889	13,572	680,694	41,352	757,432	1,560,024	1,434,361	941,607
2018	876,809,809	14,897,047	5,425,214	14,011	701,399	42,854	780,849	1,618,553	1,488,175	979,887
2019	899,209,924	15,139,576	5,562,484	14,423	718,367	44,028	798,821	1,666,298	1,532,073	1,015,842
2020	889,302,629	14,816,298	1,099,983	14,223	706,586	43,569	161,981	340,080	312,686	1,009,692
2021	965,566,238	15,989,880	1,194,266	15,130	757,245	47,143	174,550	367,212	337,632	1,079,482
2022	1,038,855,368	17,138,169	1,284,752	16,388	817,252	50,812	187,859	396,442	364,507	1,172,586
2023	1,085,212,897	18,017,603	1,342,122	17,532	866,996	53,299	197,934	416,908	383,325	1,247,819
2024	1,167,605,666	19,156,424	1,443,761	18,503	920,275	57,199	210,961	446,935	410,933	1,329,062

1.2.2 Emissions by vessel class

Next, for 2024, we summarize the global annual emissions by vessel class (Figure 1.13). These are summarized using the GFW vessel class categories. We first plot simply CO2 emissions, then look at a table of all pollutants.

Figure 1.13: Summary of global 2024 CO2 emissions by vessel class

We next can look at a table of all pollutants for 2024 (Table 1.12):

Table 1.12: Summary of global 2024 emissions by vessel class and pollutant

vessel_class	CO2	NOX	SOX	VOCS	CO	PM10	PM2_5	PM	N2O	CH4
tanker.chemical_oil	240,047,644	3,377,729	296,161	242,988	158,184	88,232	81,125	38,552	11,415	2,948
passenger	228,838,747	2,036,916	280,603	230,835	109,218	84,011	77,243	29,319	10,730	1,933
cargo.container	196,436,153	4,137,712	243,953	272,227	201,667	81,037	74,509	43,233	10,161	4,469
cargo.bulk_carrier	169,353,456	3,366,512	210,302	186,232	151,505	64,038	58,880	33,987	8,308	3,015
cargo.general	73,742,032	1,356,534	91,364	91,964	65,538	29,184	26,833	14,610	3,707	1,404
tanker.liquefied_gas	73,270,269	1,388,193	90,924	74,948	61,229	27,026	24,849	14,055	3,539	1,180
trawlers	48,463,405	989,333	60,218	54,239	47,185	18,466	16,979	9,954	2,394	901
cargo.ro_ro	42,925,936	807,143	53,251	45,564	36,209	16,032	14,741	8,268	2,088	712
tug	18,769,929	417,420	23,297	34,802	22,384	8,793	8,085	4,649	1,053	574
bunker	15,275,039	85,166	18,658	14,743	5,262	5,528	5,083	1,606	703	80
other_fishing	12,951,008	272,055	16,082	18,464	15,381	5,425	4,988	2,883	678	304
supply_vessel	7,578,618	156,834	9,405	11,480	7,971	3,243	2,981	1,699	402	187
specialized_reefer	7,313,700	119,952	9,052	7,314	5,496	2,678	2,462	1,286	351	104
dredge_non_fishing	4,124,723	82,553	5,120	5,179	3,908	1,636	1,505	862	209	84
patrol_vessel	3,827,480	75,893	4,748	5,274	3,750	1,575	1,448	816	198	85
drifting_longlines	3,820,624	79,854	4,745	5,200	4,367	1,570	1,443	837	198	86
container_reefer	3,698,132	70,144	4,591	3,632	3,066	1,346	1,237	705	178	58
seismic_vessel	2,776,295	60,538	3,447	4,644	3,126	1,240	1,140	659	151	77
other_not_fishing	2,580,742	35,281	3,178	3,384	1,900	1,041	957	441	130	42
squid_jigger	2,382,356	50,498	2,959	3,363	2,820	994	914	531	124	56
tuna_purse_seines	1,817,801	39,707	2,259	2,576	2,189	760	698	412	95	43
cargo.refrigerated	1,467,227	26,757	1,820	1,413	1,175	531	488	272	70	22
set_gillnets	1,145,545	24,347	1,423	1,671	1,393	485	446	258	60	28
dive_vessel	931,186	20,111	1,156	1,505	1,039	410	377	218	50	25
set_longlines	917,084	19,652	1,139	1,331	1,115	387	356	207	48	22
pots_and_traps	850,512	18,611	1,057	1,308	1,091	368	338	198	45	22
pole_and_line	444,109	9,336	551	630	526	186	171	99	23	10
well_boat	411,784	8,011	511	434	358	153	141	81	20	7
research	306,239	5,905	380	364	278	119	109	62	15	6
reefer	289,423	3,601	357	284	175	105	97	43	14	3
tanker.other	286,701	2,681	352	293	143	106	97	38	13	3
dredge_fishing	249,167	5,146	309	319	270	100	92	53	13	5
other_seines	142,725	3,041	177	208	174	60	55	32	8	3
other_purse_seines	55,792	1,219	69	82	69	24	22	13	3	1
fish_factory	49,560	860	61	73	46	21	19	10	3	1
trollers	48,789	1,090	61	79	66	22	20	12	3	1
bunker_or_tanker	14,965	74	18	14	5	5	5	2	1	0
driftnets	769	16	1	1	1	0	0	0	0	0

1.2.3 Spatial maps of emissions

Next we can look at spatial maps of emissions by pollutant, aggregated across all vessel types. These maps are shown at a spatial resolution of 0.1x0.1 degrees (Figure 1.14 - Figure 1.23).
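
To make the 0.1x0.1 degree aggregation concrete, the sketch below bins ping-level emissions into grid cells and sums them. The source does not state how pings are assigned to cells, so the floor-based binning, the function names `grid_index` and `grid_emissions`, and the sample pings are assumptions for illustration only.

```python
import math
from collections import defaultdict


def grid_index(lat, lon, res_deg=0.1):
    """Return integer (lat_bin, lon_bin) indices of the cell containing a point.

    The quotient is rounded to 9 decimals before flooring so that values such
    as 0.3 / 0.1 (which evaluates to 2.9999999999999996) land in the intended bin.
    """
    lat_bin = math.floor(round(lat / res_deg, 9))
    lon_bin = math.floor(round(lon / res_deg, 9))
    return lat_bin, lon_bin


def grid_emissions(pings, res_deg=0.1):
    """Sum emissions per grid cell; `pings` holds (lat, lon, emissions) tuples."""
    totals = defaultdict(float)
    for lat, lon, emissions in pings:
        totals[grid_index(lat, lon, res_deg)] += emissions
    return dict(totals)


# Made-up pings: (latitude, longitude, CO2 in metric tonnes).
pings = [
    (10.04, -20.06, 1.5),
    (10.09, -20.01, 2.5),
    (10.11, -20.06, 2.0),
]
cell_totals = grid_emissions(pings)
```

Multiplying a bin index by the resolution recovers the south-west corner of the cell, for example index (100, -201) corresponds to roughly 10.0 degrees latitude and -20.1 degrees longitude.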

Figure 1.14: Map of 2024 CO2 emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.15: Map of 2024 NOX emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.16: Map of 2024 SOX emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.17: Map of 2024 CH4 emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.18: Map of 2024 CO emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.19: Map of 2024 N2O emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.20: Map of 2024 PM emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.21: Map of 2024 PM2.5 emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.22: Map of 2024 PM10 emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

Figure 1.23: Map of 2024 VOCS emissions, aggregated across all vessel classes, at 0.1x0.1 degree spatial resolution.

1.3 Model Validation

The initial stages of model development involved multiple rounds of preliminary validation across candidate model versions to identify the best-performing one among those evaluated. Following this testing phase, and with the AIS-based model built as described above, we validated it against measured absolute emission values from a dataset of a European Union (EU) monitoring program.

1.3.1 EU emissions data

We used the \(CO_2\) emissions data from maritime transport provided by the European Maritime Safety Agency, which is part of the monitoring, reporting, and verification program for carbon emissions from maritime transport established by Regulation (EU) 2015/757. As detailed below, emissions must be reported annually for vessels with certain characteristics and for certain trips made by those vessels when operating in EEA seaports.

1.3.1.1 What’s in and out of EU 2015/757

Here, we summarize the relevant provisions of this legislation that define the vessels and trip characteristics that need to be filtered from our data for validation purposes.

Vessel characteristics:

Ships above 5000 gross tonnage.
Excludes warships, naval auxiliaries, fish-catching or fish-processing ships, wooden ships of a primitive build, ships not propelled by mechanical means, or government ships used for non-commercial purposes.

Trip characteristics:

Trips from their last port of call to a port of call under the jurisdiction of a Member State and from a port of call under the jurisdiction of a Member State to their next port of call, as well as within ports of call under the jurisdiction of a Member State.
Vessel activities covered by the law are those when the ships are at sea as well as at berth.
Ship at berth includes any ship which is securely moored or anchored in a port falling under the jurisdiction of a Member State while it is loading, unloading or hotelling, including the time spent when not engaged in cargo operations.
Port of call means the port where a ship stops to load or unload cargo or to embark or disembark passengers; consequently, stops for the sole purposes of refueling, obtaining supplies, relieving the crew, going into dry-dock or making repairs to the ship and/or its equipment, stops in port because the ship is in need of assistance or in distress, ship-to-ship transfers carried out outside ports, and stops for the sole purpose of taking shelter from adverse weather or rendered necessary by search and rescue activities are excluded.
Ship to ship transfers carried out outside ports are covered by the Regulation when these transfers take place as part of a voyage starting and/or ending with a port of call under the jurisdiction of a Member State.
A ship to ship transfer carried out outside ports is not considered a port of call. As a consequence, if, for example, a vessel leaves an EEA port, arrives at a US harbor, performs a ship to ship operation outside the port limits, and then goes to South Korea, the emissions falling within the MRV scope will be those released during the whole voyage from the EEA port of call to the port of call in South Korea. However, if the ship to ship transfer is carried out within the port limits, that operation would constitute a port of call. The voyage covered by the MRV Maritime Regulation would then be from an EEA port of call to a US port of call.
If a ship performs more than 300 voyages during the reporting period and all of these voyages either start from or end in a port within a Member State, the company can be exempted from monitoring the detailed parameters for each voyage.

Time ranges:

The reporting period covers one calendar year during which CO2 emissions have to be monitored. For voyages starting and ending in two different calendar years, the monitoring and reporting data shall be accounted under the first calendar year concerned.

1.3.2 Data filtering

Given the law’s exceptions, in order to validate emission estimates, we need to filter our results to match the EU data contents and aggregation. In this regard, the vessel characteristics in terms of type and gross tonnage can be filtered by simply assessing which vessel IDs are included in the EU dataset. As for the time range, since the data is aggregated yearly by the starting date of a trip, the filtering is straightforward.

The most challenging aspect of this data filtering corresponds to the trip characteristics, specifically when defining the “ports of call.” The main obstacle is the inability to determine the type of activity a vessel is engaged in while in port, which prevents us from identifying which trips are included within those aggregated yearly emissions. To work around this, and establish which vessels’ emissions can be included in the validation, we have compared the total distance and total hours at sea reported in the EU data to the total distance traveled and hours at sea across all trips in our trip-level emissions for that vessel and year. If these numbers were within a ±5%, ±10%, or ±15% difference, then we could reasonably assume that the aggregated trips in the GFW dataset correspond to trips included in the EU dataset.
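
The margin-based selection described above can be sketched as a simple relative-difference check. This is an illustrative example only: the function name `within_margin` is hypothetical, and using the EU-reported value as the denominator of the relative difference is an assumption, since the source does not specify it.

```python
def within_margin(estimate, reported, margin):
    """True if `estimate` is within +/- `margin` (a fraction) of `reported`.

    Assumption: the difference is measured relative to the reported value.
    Vessel-years with no reported value are never selected.
    """
    if reported <= 0:
        return False
    return abs(estimate - reported) / reported <= margin


# Made-up annual hours at sea for two vessel-years: (our estimate, EU-reported).
close_match = within_margin(1080.0, 1000.0, margin=0.10)
loose_match_10 = within_margin(1120.0, 1000.0, margin=0.10)
loose_match_15 = within_margin(1120.0, 1000.0, margin=0.15)
```

Widening the margin admits more vessel-years into the validation set at the cost of a looser match between the two datasets, which is the trade-off explored in the validation results.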

The EU data has been compiled and stored in proj_ocean_ghg.eu_validation_data, while the selection and part of the filtering from our data has been conducted through the queries in eu_validation_trip.sql and eu_validation_port.sql, differentiating between emissions by trip and port visits, and stored in proj_ocean_ghg.eu_validation_trip and proj_ocean_ghg.eu_validation_port respectively.

1.3.3 Validation results

Emissions estimates are divided between emissions at sea and emissions at port. Here, we present the differences between our results and the EU emissions data.

1.3.3.1 Trip emissions

As detailed in the Methods section, we have defined trip emissions as those occurring between ports. In the EU dataset, these emissions are categorized under three variables: emissions from EEA-EEA, EEA-NonEEA, and NonEEA-EEA seaports. We have run our model, selected the data by trips involving at least one EEA port, aggregated the results by year, and selected vessels listed in the EU dataset for a specific year.

After discarding 1077 observations with no time at sea, and assessing potential duplicates due to different ssvid for the same imo_number (0 duplicates in this dataset), we ended up with a total of 217103 observations (one per vessel and year) selected from our emission estimates. This is 87.7% of the EU data observations. Applying different margin values to the annual hours at sea gives the performance metrics detailed in Table 1.13. As we can see, higher performance is achieved applying a 10% margin, with which we obtained a selection of 1599 observations for validation.

Table 1.13: Performance metrics for Model 6.3 validated against the EU dataset using the hours at sea selection method and different margin values.

mae	rmse	nrmse	rsq	rsq_trad	mape	mpe	threshold	n_observations
2231.621	5068.543	0.456	0.801	0.792	29.600	-3.216	0.05	926
2353.508	5099.621	0.454	0.798	0.794	29.948	-3.458	0.10	1599
2410.567	5028.879	0.441	0.807	0.805	Inf	-Inf	0.15	2277
2579.295	5612.569	0.501	0.766	0.749	Inf	-Inf	0.20	2919
2735.795	5854.677	0.515	0.760	0.734	Inf	-Inf	0.25	3621
3103.941	7982.631	0.690	0.655	0.523	Inf	-Inf	0.30	4396
3386.662	8373.700	0.724	0.670	0.476	Inf	-Inf	0.35	5289
3862.253	9200.055	0.775	0.681	0.399	Inf	-Inf	0.40	6341

Comparing against the EU dataset with that 10% margin, the model demonstrates a good fit with an R-squared value of 0.8. While the MAE and RMSE indicate a considerable average magnitude of prediction errors, we need to consider that the average emission values are quite large since we are estimating absolute emissions per year. In fact, the normalized RMSE suggests that the errors, relative to the range of observed values, are moderate. Further, the model exhibits a tendency to underestimate emissions, as evidenced by the MPE. Despite some limitations, this validation framework offers a useful alternative to the one derived from the ML results, avoiding the related outlier inconveniences from emissions expressed in distance units. We can visually explore the relationship between our results and the EU data, observing certain linearity between both emission values with some spreading as the emission values increase (Figure 1.24).

Figure 1.24: Relationship between EU data and our emission estimates using hours at sea and a 10% margin.
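
For readers who want to reproduce metrics of the kind reported in Table 1.13, the sketch below computes MAE, RMSE, MPE, and MAPE for paired observed and predicted values. It is a hedged illustration: the function names are hypothetical, the source does not give its exact formulas, and the MPE sign convention (predicted minus observed, so that negative values indicate underestimation) is an assumption chosen to match the interpretation in the text. The sketch also shows that percentage metrics become infinite whenever an observed value is zero, which is one possible (unconfirmed) reason for Inf entries in such tables.

```python
import math


def _pct_error(err, obs):
    """Percentage error; infinite when the observed value is zero."""
    if obs == 0:
        return math.nan if err == 0 else math.copysign(math.inf, err)
    return 100.0 * err / obs


def validation_metrics(observed, predicted):
    """Return MAE, RMSE, MPE, and MAPE for paired observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign convention
    pct = [_pct_error(e, o) for e, o in zip(errors, observed)]
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mpe": sum(pct) / n,
        "mape": sum(abs(p) for p in pct) / n,
    }


# Made-up annual emissions (tonnes CO2) for two vessel-years.
metrics = validation_metrics(observed=[100.0, 200.0], predicted=[90.0, 220.0])

# A single zero in the observed values makes MAPE infinite.
metrics_with_zero = validation_metrics(observed=[0.0, 100.0], predicted=[5.0, 100.0])
```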

We can double-check these results by applying the data filtering margin on distance traveled instead of hours. In this regard, while the EU dataset does not contain information on the total nautical miles (nm) navigated, we can extract this from the annual average \(CO_2\) emissions per distance. If we apply distinct margins to the total nautical miles (nm) navigated, we achieve the highest performance values for the 25% margin, with which we select 3551 observations. Overall, the performance metrics are slightly worse than those obtained by using hours at sea, which was a direct measure available in the EU dataset, rather than an indirect estimate like total nm navigated (Figure 1.25 and Table 1.14).
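
The indirect distance estimate mentioned above amounts to dividing total annual emissions by the average emissions per nautical mile. A minimal sketch follows; the function name `implied_distance_nm` is hypothetical, and the units (total emissions in metric tonnes, intensity in kilograms of CO2 per nautical mile) are assumptions that should be checked against the EU dataset's column definitions.

```python
def implied_distance_nm(total_co2_tonnes, co2_kg_per_nm):
    """Estimate nautical miles sailed from total CO2 and CO2 per nautical mile.

    Assumed units: tonnes for the total and kg per nautical mile for the intensity.
    """
    if co2_kg_per_nm <= 0:
        raise ValueError("emissions per distance must be positive")
    return total_co2_tonnes * 1000.0 / co2_kg_per_nm


# Made-up vessel-year: 5,000 t CO2 at an average of 250 kg CO2 per nautical mile.
distance_nm = implied_distance_nm(5000.0, 250.0)
```

Because this distance is derived from two reported quantities rather than measured directly, any rounding or reporting error in either one carries over into the selection step.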

Table 1.14: Performance metrics for Model 6.3 validated against the EU dataset using total nm selection method and different margin values.

mae	rmse	nrmse	rsq	rsq_trad	mape	mpe	threshold	n_observations
1788.215	3866.017	0.360	0.872	0.870	23.392	1.76	0.05	1090
1811.906	3707.641	0.347	0.881	0.879	Inf	-Inf	0.10	1738
1933.623	3964.672	0.346	0.881	0.880	Inf	-Inf	0.15	2346
2111.669	7541.384	0.660	0.654	0.564	Inf	-Inf	0.20	2934
2260.113	7599.934	0.656	0.675	0.570	Inf	-Inf	0.25	3551
2435.037	7424.169	0.645	0.696	0.584	Inf	-Inf	0.30	4286
2701.152	7510.823	0.655	0.707	0.571	Inf	-Inf	0.35	5124
3087.718	7947.216	0.677	0.724	0.542	Inf	-Inf	0.40	6130

Figure 1.25: Relationship between EU data and our emission estimates using navigated nm and a 25% margin.

1.3.3.2 Port emissions

As for emissions at ports, these consist of emissions during port stays. Here, we have filtered those emission results by port visits within the EEA and followed the same aggregation procedure as for trips. However, some assumptions have been made due to the impossibility of filtering out stays that may not be included in the EU dataset. As mentioned earlier, one of the difficulties of using this dataset is the definition of “ports of call”, which establishes whether a vessel trip and port visit falls under the regulation and, by extension, whether its emissions are reported and available in the EU dataset. Because we cannot determine the activities performed by a vessel in a port, we assume that we will overestimate emissions at port, since the EU dataset does not include all port stays. Applying the same procedure under this assumption, the performance values are quite poor given that overestimation, leaving us with the need to find an alternative way to validate our port emission results while improving our model’s estimates (Table 1.15 and Figure 1.26).

Table 1.15: Performance metrics for Model 6.3 validated against the EU dataset port emissions.

model	mae	rmse	nrmse	rsq	rsq_trad	mape	mpe
Model 6.3	7369.646	17146.17	12.431	0.137	-153.529	Inf	-Inf

Figure 1.26: Relationship between EU data and our port emission estimates.

By applying an additional filtering step, selecting those observations from the trips under the 10% hours at sea margin approach, we would expect to narrow down our observations to those actually happening under “ports of call”. However, as the performance values in Table 1.16 and the emissions relationship shown in Figure 1.27 indicate, our model still does not capture emissions at port as reflected in the EU dataset, highlighting the need for improvement in this aspect.

Table 1.16: Performance metrics for Model 6.3 validated against the EU dataset port emissions for a subset of observations.

model	mae	rmse	nrmse	rsq	rsq_trad	mape	mpe
Model 6.3	1382.316	6236.424	4.196	0.079	-16.616	Inf	-Inf

Figure 1.27: Relationship between EU data and a subset of our port emission estimates.

1.3.3.2.1 2 vs 4 operational phases

As described in the methods section, we tested the inclusion of 2 operational phases and 4 operational phases against the entire GFW dataset. The inconvenience of the first approach was the oversimplification of such phases—with different energy demands—when vessel movement is minimal, presumably failing to quantify emissions accurately, especially when vessels are at port. Additionally, the inclusion of all 4 operational phases incorporated the emissions from boilers, potentially reducing the underestimation of emissions from the 2 operational phases model.

The validation results for both approaches showed a slightly better performance for trip emissions (0.02 superior in R2) for the 2 operational phases model, although the 4 operational phases model underestimated the actual emissions less. As for port emissions, as expected, they considerably improved under the 4 operational phases approach, despite still presenting low performance. Therefore, while the inclusion of 4 operational phases allows us to closely follow the ICCT/IMO methodology, the validation results also show a clear advantage of implementing such refinement, with a slight drawback in trip emissions performance. However, we must consider that the inclusion of these additional phases comes with several assumptions in vessel class grouping, giving room for additional improvements. Better vessel class resolution in the GFW datasets could potentially translate to improved performance. Furthermore, alternative validation datasets, yet to be defined, could help establish clearer confidence intervals.

1.3.4 Comparison to other global emissions estimates

Several other emissions estimates have been produced by the International Maritime Organization (IMO) (most recently for 2018), the Emissions Database for Global Atmospheric Research (EDGAR) (most recently for 2023), and the Organization for Economic Co-operation and Development (OECD) (most recently for 2024, using their experimental database). Below we compare our GHG emissions estimates with the findings of these studies to validate our results.

The emissions inventories used in this comparison rely on a range of methods and have some differences in sector definitions and underlying data sources. Here we compare the most recent year of data from each inventory.

Highlights include:

GFW’s total estimated emissions (1.35 billion MT CO₂ for broadcasting and non-broadcasting vessels combined, and 1.17 billion MT CO₂ for AIS-broadcasting vessels only) are similar in magnitude to the OECD, EDGAR, and IMO estimates, which supports the reliability and utility of our methodology and results.
IMO’s, OECD’s, and EDGAR’s emissions estimates do not include “dark” fleets, making GFW’s dark fleet estimates the first of their kind.

Comparison of our global annual emissions estimates with other inventories (the most recent year available for each inventory is shown).
CO2 emissions (billion MT)	Data source
1.35	GFW (2024, AIS + S1)
1.17	GFW (2024, AIS)
0.97	OECD (2024)
0.91	EDGAR (2023)
1.06	IMO (2018)
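
As a rough illustration of how the figures in the table above relate to one another, the following sketch computes the percentage difference between the GFW AIS-only estimate and each external inventory, and the share of the GFW total attributed to non-broadcasting vessels. The values are copied from the table; the function and variable names are hypothetical. The inventories cover different years and use different sector definitions, so these differences are indicative only.

```python
# CO2 totals (billion MT), copied from the comparison table.
estimates = {
    "GFW (2024, AIS + S1)": 1.35,
    "GFW (2024, AIS)": 1.17,
    "OECD (2024)": 0.97,
    "EDGAR (2023)": 0.91,
    "IMO (2018)": 1.06,
}


def percent_difference(value, reference):
    """Difference of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


gfw_total = estimates["GFW (2024, AIS + S1)"]
gfw_ais = estimates["GFW (2024, AIS)"]

# Share of the GFW total that comes from non-broadcasting vessels.
non_broadcasting_share = 100.0 * (gfw_total - gfw_ais) / gfw_total
print(f"Non-broadcasting share of GFW total: {non_broadcasting_share:.1f}%")

# Compare the AIS-only estimate with each external inventory.
for source in ("OECD (2024)", "EDGAR (2023)", "IMO (2018)"):
    diff = percent_difference(gfw_ais, estimates[source])
    print(f"GFW (AIS) vs {source}: {diff:+.1f}%")
```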

Abramowski, Tomasz, Tomasz Cepowski, and Peter Zvolenskỳ. 2018. “Determination of Regression Formulas for Key Design Characteristics of Container Ships at Preliminary Design Stage.” New Trends in Production Engineering 1 (1): 247–57.

Coello, Jonathan, Ian Williams, Dominic A Hudson, and Simon Kemp. 2015. “An AIS-Based Approach to Calculate Atmospheric Emissions from the UK Fishing Fleet.” Atmospheric Environment 114: 1–7.

Faber, Jasper, Shuang Zhang, Shinichi Hanayama, and Hui Xing. 2020. “Fourth IMO GHG Study 2020.” International Maritime Organization, full report.

Kroodsma, David A, Juan Mayorga, Timothy Hochberg, Nathan A Miller, Kristina Boerder, Francesco Ferretti, Alex Wilson, et al. 2018. “Tracking the Global Footprint of Fisheries.” Science 359 (6378): 904–8.

Mayes, Brett, Mark Powell, Dan Knights, Max Schofield, and Ted Mackereth. 2024. “Transportation Sector: Domestic and International Shipping Emissions.” Climate TRACE, 11–12. https://github.com/climatetracecoalition/methodology-documents/blob/main/2024/Transportation/Transportation%20sector-Domestic%20and%20International%20Shipping%20Emissions.docx.pdf.

Olmer, Naya, Bryan Comer, Biswajoy Roy, Xiaoli Mao, and Dan Rutherford. 2017. “Greenhouse Gas Emissions from Global Shipping, 2013–2015 Detailed Methodology.” International Council on Clean Transportation: Washington, DC, USA, 1–38.

Park, Jaeyoon, Jennifer Van Osdel, Joanna Turner, Courtney M Farthing, Nathan A Miller, Hannah L Linder, Guillermo Ortuño Crespo, Gabrielle Carmine, and David A Kroodsma. 2023. “Tracking Elusive and Shifting Identities of the Global Fishing Fleet.” Science Advances 9 (3): eabp8200.

Sala, Enric, Juan Mayorga, Christopher Costello, David Kroodsma, Maria LD Palomares, Daniel Pauly, U Rashid Sumaila, and Dirk Zeller. 2018. “The Economics of Fishing the High Seas.” Science Advances 4 (6): eaat2504.

Global Fishing Watch. 2021. “Anchorages, Ports and Voyages Data.” Global Fishing Watch. https://globalfishingwatch.org/datasets-and-code-anchorages/.