| pollutant | main_ef_g_per_kwh |
|---|---|
| CH4 | 0.010 |
| CO | 0.540 |
| CO2 | 629.833 |
| N2O | 0.030 |
| NOX | 12.960 |
| PM | 0.605 |
| SOX | 3.917 |
1 AIS-based emissions model
1.1 Methods
1.1.1 Model Overview
Several alternative models were tested, each using a bottom-up engineering approach based on AIS data, vessel characteristics, and emission conversion factors from the literature. Here we describe the best and most comprehensive model, which closely follows the methodology of the 2020 IMO “Fourth Greenhouse Gas Study” (Faber and Xing (2020)) and the 2017 ICCT “Greenhouse Gas Emissions From Global Shipping” study (Olmer et al. (2017)). We also describe how we apply it to the GFW data and any deviations from the published methodologies. The validation section below describes several alternative model specifications.
At a high level, we calculate emissions as follows:
For each individual AIS message (ping), we calculate the main engine use, auxiliary engine use, and boiler use, each of which is a function of vessel characteristics, speed, and the time since the previous ping.
Using emissions factors (EFs) for the main engine, auxiliary engine, and boiler for seven pollutants (CO2, CH4, NOX, SOX, CO, N2O, and PM), we calculate the emissions of each pollutant for each individual AIS ping for each of the main engine, auxiliary engine, and boiler.
For each pollutant and each AIS ping, we sum the emissions across the main, auxiliary, and boiler engines to get the ping-level emissions for each of these seven pollutants.
For three additional pollutants (PM2.5, PM10, and VOCs), we multiply the CO2 emissions by a conversion factor to get the ping-level emissions for each of these three pollutants.
With ping-level emissions, we are then able to aggregate emissions by vessel, by voyage, by port stay, by time, by space, etc.
1.1.1.1 Main engine
1.1.1.1.1 Main engine energy use
Based on Faber and Xing (2020) (page 64), main engine energy use (in kilowatt-hours) is calculated as follows:
\[
\text{Main Engine Energy Use}_{kWh} = \text{Hours} \times \text{Load Factor} \times \text{Main Engine Power}_{kW}
\]
Where hours comes from each individual AIS message, main engine power (kW) comes from the vessel characteristics dataset, and the load factor comes from the product of several correction factors (CFs):
\[
\text{Load Factor} = \text{Speed-power CF} \times \text{Hull Fouling CF} \times \text{Weather CF} \times \text{Draft CF}
\]
Where:
- hull fouling CF is 1.07, reflecting a 7% increase in resistance as described in Olmer et al. (2017) and Faber and Xing (2020) (see page 17 and Annexes page 270, respectively).
- weather CF is a correction factor based on weather conditions, varying with the distance to shore. It is set at 1.1 for nearshore activity (≤5 nm from shore) to account for a 10% increase in resistance, and 1.15 for offshore activity (>5 nm), reflecting a 15% increase in resistance, as described in Olmer et al. (2017) and Faber and Xing (2020) (see pages 18 and 270, respectively).
- draft CF is derived from the average draught by sector as reported in Olmer et al. (2017) (see Table 13 on page 20). Weights were applied by vessel type based on fuel consumption data from Faber and Xing (2020) (see Annex 1, Figure 4), since fuel consumption by vessel type is proportional to emissions. This weighted average gives a final estimate of 0.85. Note: this factor could be refined using vessel class-specific averages.
- speed-power CF is defined as \((\text{speed}_{knots} / \text{design\_speed}_{knots})^3\), with the additional stipulation that the speed ratio should not exceed 1:
\[ \text{Speed-power CF} = \begin{cases} 1 & \text{if } \frac{\text{speed}_{knots}}{\text{design\_speed}_{knots}} > 1, \\ \left(\frac{\text{speed}_{knots}}{\text{design\_speed}_{knots}}\right)^3 & \text{otherwise} \end{cases} \]
In this equation, speed is taken from the AIS-broadcast speed when there has been <= 1 hour since the previous AIS message (we therefore assume that the vessel traveled at a similar speed throughout that interval). When it has been more than 1 hour since the previous message, the implied speed is used as a more accurate measurement of speed for that time period. This is calculated as distance from last position divided by hours since last position. Design speed for each vessel was estimated using a random forest regression trained on known registry design speeds for a subset of vessels alongside other vessel characteristics, including main engine power (\(ME\), kW) and gross tonnage (\(GT\)). This approach was applied to the entire GFW dataset of vessels.
\[ \text{design\_speed}_{knots} = (3.390 \times 10^{-4}) \cdot ME + (2.151 \times 10^{-5}) \cdot GT - (2.742 \times 10^{-9}) \cdot ME \cdot GT + 12.93 \]
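The speed selection rule and the design-speed expression above can be sketched as follows. This is a minimal illustration, not the pipeline code: the function and argument names are hypothetical, and the conversion of 1852 meters per nautical mile is the only value not taken from the text.

```python
METERS_PER_NAUTICAL_MILE = 1852.0


def effective_speed_knots(hours, ais_speed_knots, meters_to_prev):
    """Speed used for a ping: AIS speed for gaps <= 1 hour, implied speed otherwise."""
    if hours <= 1.0:
        return ais_speed_knots
    # Implied speed: distance from the last position divided by hours since it.
    return (meters_to_prev / METERS_PER_NAUTICAL_MILE) / hours


def design_speed_knots(main_engine_power_kw, gross_tonnage):
    """Evaluate the design-speed expression given in the text (ME in kW, GT)."""
    me, gt = main_engine_power_kw, gross_tonnage
    return 3.390e-4 * me + 2.151e-5 * gt - 2.742e-9 * me * gt + 12.93
```

For example, a ping 2 hours after the previous one and 37,040 m away (20 nm) would be assigned an implied speed of 10 knots regardless of the broadcast speed.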
As a last step, we ensure that the final load factor (the product of the above correction factors) does not exceed a value of 0.98, as recommended by Faber and Xing (2020) (see page 272).
\[ \text{Load Factor} = \begin{cases} 0.98 & \text{if (Load Factor)} > 0.98, \\ \text{Load Factor} & \text{otherwise} \end{cases} \]
###### Adjustments for fishing vessels
For fishing vessels, we adjust the main engine load factor based on the relationships published by Coello et al. (2015) (and later used by Sala et al. (2018)).
- For fishing vessels of class trawlers and dredge_fishing, when they are actively fishing we assign a main engine load factor of 0.75. The intuition is that for these vessel types, even if they are moving slowly, their engines can be exerting tremendous power while they are actually fishing with deployed gear.
- For all fishing vessels, we limit the main engine load factor so that it falls between 0.2 and 0.9.
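Putting the correction factors, the 0.98 cap, and the fishing-vessel adjustments together, the load factor calculation might look like the sketch below. The names are hypothetical, distance to shore is assumed to be supplied in nautical miles, and the order in which the cap and the fishing adjustments are applied is our assumption, since the text does not state it explicitly.

```python
HULL_FOULING_CF = 1.07
DRAFT_CF = 0.85
MAX_LOAD_FACTOR = 0.98
TRAWLING_CLASSES = {"trawlers", "dredge_fishing"}


def main_engine_load_factor(speed_knots, design_speed_knots, distance_from_shore_nm,
                            is_fishing_vessel=False, vessel_class=None,
                            is_actively_fishing=False):
    """Main engine load factor for one AIS ping."""
    # Speed-power CF: cube of the speed ratio, with the ratio capped at 1.
    ratio = min(speed_knots / design_speed_knots, 1.0)
    speed_power_cf = ratio ** 3
    # Weather CF: 1.10 nearshore (<= 5 nm), 1.15 offshore.
    weather_cf = 1.10 if distance_from_shore_nm <= 5 else 1.15
    load = speed_power_cf * HULL_FOULING_CF * weather_cf * DRAFT_CF
    load = min(load, MAX_LOAD_FACTOR)
    if is_fishing_vessel:
        # Assumed ordering: fixed 0.75 while trawling/dredging, then clamp to [0.2, 0.9].
        if vessel_class in TRAWLING_CLASSES and is_actively_fishing:
            load = 0.75
        load = min(max(load, 0.2), 0.9)
    return load
```

Note that at design speed the uncapped product already exceeds 1 (1.07 × 1.15 × 0.85 ≈ 1.046 offshore), so the 0.98 cap is what binds for vessels traveling at or near design speed.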
1.1.1.1.2 Main engine emissions
Main engine emissions for each pollutant (Table 1.1) are determined by multiplying each pollutant’s emissions factor (EF) by the main engine energy use. Main engine pollutant emission factors are derived from Appendix E in Olmer et al. (2017). For each pollutant, we use the average emissions factor for slow-speed, medium-speed, and high-speed diesel engines (SSD/MSD/HSD). SSD/MSD/HSD engines represent ~98% of vessels (Table 10 of Faber and Xing (2020)). The main engine emissions factors used for each pollutant are as follows:
Additionally, we apply a low-load correction factor based on Table 20 of the IMO Fourth GHG Report. Engines operating at very low loads (below 20%) run inefficiently and emit more of certain pollutants. Table 20 provides low-load correction factors that vary with the exact engine load and the pollutant. For each AIS ping, we therefore multiply the main engine emissions by the correction factor for that engine load and pollutant. Main engine loads >20% do not get any low-load correction factor applied.
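A minimal sketch of the main engine emissions step, using the main engine emissions factors listed for this section. The low-load correction values from Table 20 are not reproduced in this document, so they are passed in as a hypothetical lookup function `low_load_cf(pollutant, load_factor)`; treating a load of exactly 20% as uncorrected is also an assumption.

```python
# Main engine emissions factors (g/kWh) as listed in this document.
MAIN_EF_G_PER_KWH = {
    "CH4": 0.010, "CO": 0.540, "CO2": 629.833, "N2O": 0.030,
    "NOX": 12.960, "PM": 0.605, "SOX": 3.917,
}


def main_engine_emissions_g(hours, load_factor, main_engine_power_kw, low_load_cf=None):
    """Grams of each pollutant emitted by the main engine for one AIS ping."""
    energy_kwh = hours * load_factor * main_engine_power_kw
    emissions = {}
    for pollutant, ef in MAIN_EF_G_PER_KWH.items():
        correction = 1.0
        # Hypothetical lookup standing in for Table 20; applied only below 20% load.
        if low_load_cf is not None and load_factor < 0.20:
            correction = low_load_cf(pollutant, load_factor)
        emissions[pollutant] = energy_kwh * ef * correction
    return emissions
```

For instance, one hour at a load factor of 0.5 on a 1,000 kW engine corresponds to 500 kWh and roughly 315 kg of CO2.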
1.1.1.2 Auxiliary engine
1.1.1.2.1 Auxiliary engine energy use
Our initial modeling approach simplified the Faber and Xing (2020) recommendations by capturing the differences in auxiliary engine power consumption based on the vessel’s status, distinguishing between stationary and at-sea conditions. Speeds below 0.5 knots were considered stationary.
\[
\text{Aux engine Energy Use}_{kWh} = \text{hours} \times
\begin{cases}
\text{aux\_0sp}_{kW} & \text{if } \text{speed}_{knots} \leq 0.5, \\
\text{aux\_atsea}_{kW} & \text{otherwise.}
\end{cases}
\]
Here, the auxiliary engine power terms (\(\text{aux\_0sp}_{kW}\) and \(\text{aux\_atsea}_{kW}\)) were simplified from the ICCT and IMO models, which use four operational phases (at sea, maneuvering, anchor, and berth; see Table 17 of Faber and Xing (2020)). We did so by averaging auxiliary engine and boiler power for cruising and maneuvering into atsea and averaging auxiliary and boiler power for anchor and berth into 0sp. The rationale was that most vessels spend most of their time either cruising or at ~0 speed.
While this simplification helped streamline the modeling process, it assumed equal auxiliary emissions across vessel types and across distinct operational phases, and so did not fully capture vessel behavior in terms of emissions. Therefore, after obtaining good validation results in the initial phase of model testing, we refined this approach to more closely follow Faber and Xing (2020).
Below, we describe the methods used to disaggregate auxiliary engine and boiler power demands across the four operational phases.
1.1.1.2.2 Auxiliary engine and boiler energy use (4 phases)
The model described in Faber and Xing (2020) assumes that while in service, a ship is operating in one of four defined phases: at berth, at anchor, maneuvering, or at sea.
For small vessels, we follow the recommendations from the 4th IMO study (page 68, Faber and Xing (2020)), where auxiliary engine power and boiler power are relative to main engine power. For larger vessels, aux_engine_power_kw and boiler_power_kw are defined based on vessel class and operational phase.
\[ \text{Aux engine Energy Use}_{kWh} = \text{hours} \times \begin{cases} 0 & \text{if } \text{main\_engine\_power\_kw} \leq 150 \\ 0.05 \times \text{main\_engine\_power\_kw} & \text{if } \text{main\_engine\_power\_kw} \leq 500 \\ \text{aux\_engine\_power\_kw} & \text{otherwise} \end{cases} \]
\[ \text{Boiler Energy Use}_{kWh} = \text{hours} \times \begin{cases} 0 & \text{if } \text{main\_engine\_power\_kw} \leq 150 \\ \text{boiler\_power\_kw} & \text{otherwise} \end{cases} \]
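The two piecewise rules above translate directly into code. In this sketch the function names are illustrative, and `aux_engine_power_kw` and `boiler_power_kw` are assumed to have already been looked up for the vessel's class, size, and current operational phase.

```python
def aux_engine_energy_kwh(hours, main_engine_power_kw, aux_engine_power_kw):
    """Auxiliary engine energy use for one ping, following the thresholds above."""
    if main_engine_power_kw <= 150:
        return 0.0
    if main_engine_power_kw <= 500:
        # Small vessels: auxiliary power is 5% of main engine power.
        return hours * 0.05 * main_engine_power_kw
    return hours * aux_engine_power_kw


def boiler_energy_kwh(hours, main_engine_power_kw, boiler_power_kw):
    """Boiler energy use for one ping, following the thresholds above."""
    if main_engine_power_kw <= 150:
        return 0.0
    return hours * boiler_power_kw
```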
The inclusion of the four phases for larger vessels requires the use of Table 17 from Faber and Xing (2020), which includes energy demand for the auxiliary engine and the boiler. However, this table defines its vessel size bins in different units depending on ship type. Since GFW has vessel size in GT, we needed to convert the bins expressed in DWT, TEU, and CBM to GT. Here, we present the approach followed to establish a direct relationship between size units for each vessel category.
1.1.1.2.2.1 DWT conversion
To establish the GT-DWT relationship, we used data containing both GT and DWT for each vessel. The relationship between these units is mostly linear within each vessel type, so we fitted a simple regression that lets us convert from one unit to the other with sufficient confidence.
These data were obtained through web scraping from open online sources and contain information for 464799 vessels on variables such as type, gt, dwt, length_m, and beam_m, from which we could derive the relationship between size units by vessel class.
We only need to evaluate the tonnage relationship for the few vessel types whose size bins in Table 17 from Faber and Xing (2020) are expressed in DWT:
$ship_type
[1] "Bulk carrier" "Chemical tanker" "General cargo"
[4] "Oil tanker" "Other liquids tanker" "Refrigerated bulk"
[7] "Ro-Ro"
In order to properly establish the size relationship, we need to group the categories from our dataset so they match the categories from Table 17. By doing so, we can evaluate each category’s relationship.
For instance, for Bulk carriers we have 7 categories which, according to Figure 1.1, present a linear relationship.
The same occurs for chemical tankers with 5 categories, as shown in Figure 1.2.
For general cargo, we have 3 categories. One of them, Passenger/General Cargo Ship, as seen in Figure 1.3, deviates from linearity and may fall within the Ferry-pax only category from Table 17, so we discard it.
For oil tankers, several vessel categories contain the label oil. However, most of them actually belong to chemical tankers or bulk carriers. In this grouping, we exclusively include crude oil tankers and bitumen tankers.
For the remaining liquid carriers, we will assign “Water Tanker”, “Wine Tanker” and “Molasses Tanker” from our table to the same category (Figure 1.5).
For refrigerated bulk, we have 2 categories, as shown in Figure 1.6.
Lastly, for Ro-Ro ships, we have 3 categories, following distinct relationships as shown in Figure 1.7. We are only interested in Ro-Ro Cargo ships, as the other two fall within the Ferry-RoPax category from Table 17.
With the equivalences defined above between the groups from Faber and Xing (2020) and the groups from our dataset, we update the original dataframe and fit the regressions. By fitting gross tonnage (GT) as a function of deadweight tonnage (DWT) and grouped type, we obtain the expressions explaining the relationship between both size units per vessel type, along with the performance metrics summarized in Table 1.2. We save this fit in an lm object and use it later to update Table 17.
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9935724 | 0.9935717 | 2496.954 | 1325496 | 0 | 7 | -554797.7 | 1109613 | 1109694 | 374236342508 | 60024 | 60032 |
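To illustrate the idea of a type-specific linear GT-DWT conversion, here is a small sketch that fits one ordinary least squares line per ship type. It is a simplification under stated assumptions: the exact specification of the lm fit summarized in Table 1.2 is not shown in this document and may differ (for example, a single model with type as a factor), the function names are hypothetical, and flooring converted values at zero is our choice.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gt_by_type(records):
    """records: iterable of (ship_type, dwt, gt). Returns {ship_type: (slope, intercept)}."""
    grouped = {}
    for ship_type, dwt, gt in records:
        xs, ys = grouped.setdefault(ship_type, ([], []))
        xs.append(dwt)
        ys.append(gt)
    return {ship_type: fit_line(xs, ys) for ship_type, (xs, ys) in grouped.items()}


def dwt_to_gt(dwt, ship_type, fits):
    """Convert a DWT value to GT with the fitted line, floored at zero."""
    slope, intercept = fits[ship_type]
    return max(0.0, slope * dwt + intercept)
```

The fitted lines can then be applied to the lower and upper DWT bounds of each Table 17 size bin to express them in GT.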
1.1.1.2.2.2 TEU conversion
TEU stands for Twenty-foot Equivalent Unit, a standard unit of measure used in the shipping industry to describe the capacity of container ships and terminals. One TEU represents the dimensions of a standard 20-foot-long container. TEU therefore quantifies cargo capacity in terms of the number of 20-foot containers a vessel can carry. For example, a ship with a capacity of 10,000 TEU can carry 10,000 standard 20-foot containers.
For TEU, we will obtain GT equivalents based on the design formulas for the calculation of key design vessel characteristics from Abramowski, Cepowski, and Zvolenskỳ (2018) as detailed below:
\[ GT = -1097.4 + 11.049 \cdot TEU \]
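As a quick sketch, the TEU expression above can be wrapped in a helper (the function name is illustrative) and applied to the bounds of each TEU-based size bin:

```python
def teu_to_gt(teu):
    """Approximate gross tonnage from container capacity, using the expression above."""
    return -1097.4 + 11.049 * teu
```

Because of the negative intercept, the expression returns negative values for capacities below roughly 100 TEU, so very small bounds may need separate handling.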
1.1.1.2.2.3 CBM conversion
The size units of liquefied tankers represented as “CBM” refer to cubic meters (m³). This measurement indicates the volume capacity of the tankers, specifically how much liquefied gas (such as liquefied natural gas, LNG, or liquefied petroleum gas, LPG) they can carry.
For this unit conversion, we have not been able to find any large dataset to establish linear relationships, nor any publication defining expressions for unit conversion. The only available resource is information from 23 vessels containing GT and CBM values, which allows us to fit a basic regression. This establishes the GT equivalence with intermediate to low confidence to update Table 17 for gas tankers. This is a point for improvement, but for now, it will suffice.
In testing, we obtained a slightly better fit by distinguishing between LPG and LNG. However, Table 17 groups them together under the same category, so we establish GT exclusively as a function of CBM (Table 1.3).
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.982426 | 0.9815891 | 7253.543 | 1173.945 | 0 | 1 | -236.0421 | 478.0841 | 481.4906 | 1104891522 | 21 | 23 |
1.1.1.2.2.4 Updating table 17
Once we have defined the relationship between GT and the other size units for each vessel class, we update Table 17 from Faber and Xing (2020) (Table 1.4). This allows us to incorporate the auxiliary engine and boiler power outputs per ship class and operational mode into our AIS-based model.
| ship_type | size_lower | size_upper | size_units | size_lower_gt | size_upper_gt |
|---|---|---|---|---|---|
| Bulk carrier | 0 | 9999 | dwt | 0.000 | 7738.351 |
| Bulk carrier | 10000 | 34999 | dwt | 7738.864 | 20567.005 |
| Bulk carrier | 35000 | 59999 | dwt | 20567.518 | 33395.658 |
| Bulk carrier | 60000 | 99999 | dwt | 33396.171 | 53921.504 |
| Bulk carrier | 100000 | 199999 | dwt | 53922.017 | 105236.119 |
| Bulk carrier | 200000 | NA | dwt | 105236.632 | NA |
1.1.1.2.2.5 Updating ship types to GFW classes
Our model runs using GFW vessel information tables. With the updated Table 17 including size ranges in GT, we now need to update the ship types in that table and group them according to the vessel classes available in GFW datasets. To do this, we conducted a preliminary analysis of the IHS vessel types found in the GFW classes. While some could be directly assigned based on GFW classification criteria, others, such as some cargo vessels and tankers, required a more thorough process. By evaluating the relationship between size and auxiliary power for three IHS types, we generated a best-fit regression for each type. This helped us disaggregate vessel class composition in the GFW dataset and obtain the approximate percentage composition of IHS vessel types represented within GFW vessel groups (Table 1.5).
| gfw_vessel_class | tb17_ship_type | weights |
|---|---|---|
| tug | Service - tug | 1.0000000 |
| cargo | General cargo | 0.3200000 |
| cargo | Bulk carrier | 0.4700000 |
| cargo | Container | 0.1100000 |
| cargo | Ro-Ro | 0.0500000 |
| cargo | Vehicle | 0.0500000 |
| cargo.bulk_carrier | Bulk carrier | 1.0000000 |
| cargo.container | Container | 1.0000000 |
| cargo.general | General cargo | 1.0000000 |
| cargo.refrigerated | Refrigerated bulk | 1.0000000 |
| cargo.ro_ro | Ro-Ro | 1.0000000 |
| bunker | Chemical tanker | 1.0000000 |
| reefer | Refrigerated bulk | 1.0000000 |
| tanker | Oil tanker | 0.4500000 |
| tanker | Chemical tanker | 0.3400000 |
| tanker | Other liquids tanker | 0.0500000 |
| tanker | Liquefied gas tanker | 0.1600000 |
| tanker.chemical_oil | Oil tanker | 0.5696203 |
| tanker.chemical_oil | Chemical tanker | 0.4303797 |
| tanker.liquefied_gas | Liquefied gas tanker | 1.0000000 |
| tanker.other | Other liquids tanker | 1.0000000 |
| fishing | Miscellaneous - fishing | 1.0000000 |
| seiners | Miscellaneous - fishing | 1.0000000 |
| research | Service - other | 0.5000000 |
| research | Offshore | 0.5000000 |
| trawlers | Miscellaneous - fishing | 1.0000000 |
| trollers | Miscellaneous - fishing | 1.0000000 |
| passenger | Ferry-RoPax | 0.3300000 |
| passenger | Ferry-pax only | 0.3300000 |
| passenger | Cruise | 0.3300000 |
| well_boat | Miscellaneous - fishing | 1.0000000 |
| fixed_gear | Miscellaneous - fishing | 1.0000000 |
| dive_vessel | Miscellaneous - fishing | 1.0000000 |
| non_fishing | Miscellaneous - other | 0.5000000 |
| non_fishing | Yacht | 0.5000000 |
| fish_factory | Miscellaneous - other | 1.0000000 |
| other_seines | Miscellaneous - fishing | 1.0000000 |
| purse_seines | Miscellaneous - fishing | 1.0000000 |
| set_gillnets | Miscellaneous - fishing | 1.0000000 |
| squid_jigger | Miscellaneous - fishing | 1.0000000 |
| patrol_vessel | Service - other | 0.5000000 |
| patrol_vessel | Offshore | 0.5000000 |
| pole_and_line | Miscellaneous - fishing | 1.0000000 |
| set_longlines | Miscellaneous - fishing | 1.0000000 |
| supply_vessel | Service - other | 0.5000000 |
| supply_vessel | Offshore | 0.5000000 |
| dredge_fishing | Miscellaneous - fishing | 1.0000000 |
| pots_and_traps | Miscellaneous - fishing | 1.0000000 |
| seismic_vessel | Service - other | 0.5000000 |
| seismic_vessel | Offshore | 0.5000000 |
| cargo_or_reefer | General cargo | 1.0000000 |
| cargo_or_tanker | General cargo | 1.0000000 |
| bunker_or_tanker | Chemical tanker | 1.0000000 |
| container_reefer | Refrigerated bulk | 1.0000000 |
| other_not_fishing | Miscellaneous - other | 1.0000000 |
| tuna_purse_seines | Miscellaneous - fishing | 1.0000000 |
| dredge_non_fishing | Miscellaneous - fishing | 1.0000000 |
| drifting_longlines | Miscellaneous - fishing | 1.0000000 |
| other_purse_seines | Miscellaneous - fishing | 1.0000000 |
| specialized_reefer | Refrigerated bulk | 1.0000000 |
| fish_tender | Miscellaneous - fishing | 1.0000000 |
| driftnets | Miscellaneous - fishing | 1.0000000 |
| other_fishing | Miscellaneous - fishing | 1.0000000 |
Using the resulting weights, we can more accurately combine IHS groups into GFW classes and obtain auxiliary and boiler power estimates for each operational phase, based on weighted means for each overlapping size range. The resulting table is stored in BQ under world-fishing-827.proj_ocean_ghg.aux_and_boil_power_by_operational_mode.
With it, we updated world-fishing-827.proj_ocean_ghg.vessel_info by including the corresponding boiler and auxiliary engine power demand considering vessel size and class for each of the four operational phases. This can then be fed into the model to provide more accurate emission estimates.
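The weighted combination of IHS ship types into a GFW class can be sketched as follows. The cargo weights are those from Table 1.5; the per-type power values in the usage example are made-up placeholders rather than Table 17 values, and normalizing by the sum of the weights (so that weights such as the three 0.33 passenger entries still yield a proper mean) is our assumption.

```python
# Weights for the undifferentiated GFW "cargo" class, from Table 1.5.
CARGO_WEIGHTS = {
    "General cargo": 0.32, "Bulk carrier": 0.47, "Container": 0.11,
    "Ro-Ro": 0.05, "Vehicle": 0.05,
}


def weighted_power_kw(weights, power_by_type_kw):
    """Weighted mean power demand for a GFW class in one size range and phase."""
    total_weight = sum(weights.values())
    weighted_sum = sum(w * power_by_type_kw[ship_type] for ship_type, w in weights.items())
    return weighted_sum / total_weight


# Placeholder auxiliary power demands (kW) for one size range and phase.
example_power_kw = {
    "General cargo": 100.0, "Bulk carrier": 200.0, "Container": 300.0,
    "Ro-Ro": 400.0, "Vehicle": 500.0,
}
cargo_aux_kw = weighted_power_kw(CARGO_WEIGHTS, example_power_kw)
```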
1.1.1.2.3 Auxiliary engine and boiler emissions
Auxiliary engine and boiler emissions (Table 1.6) are determined by multiplying each pollutant’s emissions factor by the auxiliary engine or boiler energy use. We calculate emissions for each pollutant using emissions factors derived from Appendices G and H of Olmer et al. (2017). For each pollutant, we use the average emissions factor for slow-speed, medium-speed, and high-speed diesel engines (SSD/MSD/HSD). SSD/MSD/HSD engines represent ~98% of vessels (Table 10 of Faber and Xing (2020)). The auxiliary engine and boiler emissions factors used for each pollutant are as follows:
| pollutant | aux_ef_g_per_kwh |
|---|---|
| CH4 | 0.010 |
| CO | 0.540 |
| CO2 | 699.667 |
| N2O | 0.033 |
| NOX | 12.116 |
| PM | 0.610 |
| SOX | 4.337 |
\[ \text{Aux engine emissions}_{g} = \text{hours} \times \begin{cases} \text{aux\_at\_berth}_{kW} & \text{if at berth} \\ \text{aux\_at\_anchor}_{kW} & \text{if at anchor} \\ \text{aux\_maneuvering}_{kW} & \text{if maneuvering} \\ \text{aux\_at\_sea}_{kW} & \text{if at sea} \end{cases} \times \text{Aux emissions factor}_{g/kWh} \]
| pollutant | boiler_ef_g_per_kwh |
|---|---|
| CH4 | 0.002 |
| CO | 0.200 |
| CO2 | 958.000 |
| N2O | 0.043 |
| NOX | 2.033 |
| PM | 0.380 |
| SOX | 5.827 |
\[ \text{Boiler emissions}_{g} = \text{hours} \times \begin{cases} \text{boiler\_at\_berth}_{kW} & \text{if at berth} \\ \text{boiler\_at\_anchor}_{kW} & \text{if at anchor} \\ \text{boiler\_maneuvering}_{kW} & \text{if maneuvering} \\ \text{boiler\_at\_sea}_{kW} & \text{if at sea} \end{cases} \times \text{Boiler emissions factor}_{g/kWh} \]
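A minimal sketch of the two phase-dependent equations above, using the auxiliary and boiler emissions factors from the tables in this section. The operational phase is taken as an input because the rules used to assign a ping to a phase are not detailed here; the function name and dictionary layout are illustrative.

```python
AUX_EF_G_PER_KWH = {
    "CH4": 0.010, "CO": 0.540, "CO2": 699.667, "N2O": 0.033,
    "NOX": 12.116, "PM": 0.610, "SOX": 4.337,
}
BOILER_EF_G_PER_KWH = {
    "CH4": 0.002, "CO": 0.200, "CO2": 958.000, "N2O": 0.043,
    "NOX": 2.033, "PM": 0.380, "SOX": 5.827,
}
PHASES = ("at_berth", "at_anchor", "maneuvering", "at_sea")


def aux_and_boiler_emissions_g(hours, phase, aux_power_kw_by_phase, boiler_power_kw_by_phase):
    """Return (aux, boiler) emissions in grams per pollutant for one ping."""
    if phase not in PHASES:
        raise ValueError(f"unknown operational phase: {phase}")
    aux_kwh = hours * aux_power_kw_by_phase[phase]
    boiler_kwh = hours * boiler_power_kw_by_phase[phase]
    aux = {p: aux_kwh * ef for p, ef in AUX_EF_G_PER_KWH.items()}
    boiler = {p: boiler_kwh * ef for p, ef in BOILER_EF_G_PER_KWH.items()}
    return aux, boiler
```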
1.1.1.3 Total emissions
Finally, using the factors above, we estimate total emissions by multiplying Main Engine Energy Use, Aux Engine Energy Use and Boiler Energy Use by their respective emissions factors. We then sum the three values to get the total emission estimate.
\[
\text{Total Emissions}_{ CO_2, NO_X, ...} =\text{Main Engine Emissions}_{ CO_2, NO_X, ...} + \text{Aux Engine Emissions}_{ CO_2, NO_X, ...} + \text{Boiler Emissions}_{ CO_2, NO_X, ...}
\]
1.1.1.4 Additional pollutants
For three additional pollutants (PM2.5, PM10, and VOCs), we multiply the CO2 emissions by the conversion factors in the following table. These conversion factors were provided by OceanMind, and thus our methodology is consistent with their approach for these pollutants (which is detailed in Mayes et al. (2024)). To summarize, they are based on Table 27 from the 4th IMO report (Faber and Xing (2020)), which shows the emissions factors for each pollutant and fuel type (in terms of kg of pollutant per tonne of fuel consumed). The fuel types included are Heavy Fuel Oil (HFO), Liquefied Natural Gas (LNG), Marine Diesel Oil (MDO), and methanol. Using the 2018 factors, the emissions factors for PM2.5, PM10, and VOCs were each divided by the corresponding CO2 emissions factor, by fuel type, in order to convert from emissions of CO2 to emissions of these other pollutants. These emissions factor ratios were then used to calculate a weighted average emissions factor for each pollutant, where the weighting was done by the average percentage of vessels that use each fuel type (taken as the 2018 values from Table 34 of the IMO report). This results in the following factors that we directly use:
| gPM2.5/gCO2 | gPM10/gCO2 | gVOCS/gCO2 |
|---|---|---|
| 0.001598 | 0.001738 | 0.000933 |
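Applying these ratios is a single multiplication per pollutant. A short sketch (the function name is illustrative):

```python
# Grams of each additional pollutant per gram of CO2, from the table above.
RATIO_TO_CO2 = {"PM2.5": 0.001598, "PM10": 0.001738, "VOCS": 0.000933}


def additional_pollutants_g(total_co2_g):
    """Ping-level PM2.5, PM10, and VOC emissions derived from total CO2 emissions."""
    return {pollutant: total_co2_g * ratio for pollutant, ratio in RATIO_TO_CO2.items()}
```

For example, a ping with 1,000,000 g of CO2 corresponds to about 1,598 g of PM2.5.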
1.1.2 Low sulfur fuel emissions correction factors for post-2020 data
Starting on January 1, 2020, the IMO required lower sulfur content fuel (0.5%, instead of the higher 2.5% that was typical for HFO prior to 2020). For data >= 2020-01-01, we therefore apply a new correction factor for SOX and PM to account for this lower sulfur content fuel.
For SOX: based on Equation 15 (p. 74) of the 4th IMO report, the SOX emissions factor scales linearly with sulfur percentage content. For the pre-2020 SOX EF, we use Table E from the 2017 ICCT study and take the average of the HFO values, which are based on a 2.5% sulfur content (consistent with the HFO row in Table 22 from the 4th IMO report). Assuming the sulfur content drops from 2.5% to 0.5% starting in 2020, this means an 80% drop in sulfur content and an 80% drop in the SOX EF. This 80% is consistent with several studies that looked at the impact this new requirement had on global sulfur emissions (@yuan2024abrupt, @yoshioka2024warming). So for the >= 2020-01-01 SOX EF, we simply multiply our current < 2020-01-01 EF by 0.2. We do this for each of the main engine, auxiliary engine, and boiler EFs.
For PM, PM2.5, and PM10: Equation 16 (p. 74) of the 4th IMO report defines the PM10 EF relationship to sulfur content for HFO. Unlike SOX, this relationship is not a pure scalar: it depends on SFCi and includes a constant. Based on Table 19 (p. 70) from the 4th IMO report, the average SFCi for HFO is 185 g/kWh (175 for SSD, 185 for MSD, and 195 for HSD). If we plug this into Equation 16, we get a < 2020-01-01 EF of 73.38 assuming a sulfur content of 2.5% (1.35+185*7*.02247*(2.5-.0246)). Assuming a post-2020 sulfur content of 0.5%, we get a >= 2020-01-01 EF of 15.18 (1.35+185*7*.02247*(0.5-.0246)). The ratio of these two is 15.18/73.38 = 0.206, which is almost exactly the ratio for sulfur. So for the >= 2020-01-01 EFs for PM, PM10, and PM2.5, we simply multiply our current < 2020-01-01 EF by 0.206. We do this for each of the main engine, auxiliary engine, and boiler EFs of each pollutant.
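The arithmetic above and the date-based switch can be reproduced with a short sketch. The expression is transcribed from the calculation in the text rather than from the IMO report itself, and the function names are hypothetical; note that the unrounded ratio evaluates to about 0.2069, while the model applies the stated value of 0.206.

```python
from datetime import date

LOW_SULFUR_START = date(2020, 1, 1)
SOX_SCALE = 0.2    # 0.5% / 2.5% sulfur content
PM_SCALE = 0.206   # value applied in the text


def pm10_ef_expression(sfc_g_per_kwh, sulfur_percent):
    """The PM10 expression as evaluated in the text."""
    return 1.35 + sfc_g_per_kwh * 7 * 0.02247 * (sulfur_percent - 0.0246)


def low_sulfur_scale(pollutant, ping_date):
    """Multiplier applied to the pre-2020 EF of a pollutant for a given ping date."""
    if ping_date < LOW_SULFUR_START:
        return 1.0
    if pollutant == "SOX":
        return SOX_SCALE
    if pollutant in {"PM", "PM2.5", "PM10"}:
        return PM_SCALE
    return 1.0


pre_2020_ef = pm10_ef_expression(185, 2.5)    # about 73.38
post_2020_ef = pm10_ef_expression(185, 0.5)   # about 15.18
ratio = post_2020_ef / pre_2020_ef
```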
1.1.3 Data
1.1.3.1 Individual AIS messages
For our individual AIS messages (pings) dataset, we leverage the latest-and-greatest version of the GFW’s AIS pipeline, Version 3. This is one of the GFW’s core internal datasets. This process automates the parsing, cleaning, augmenting, and publishing of raw AIS data (Kroodsma et al. (2018)). This table provides data from 2012 to present. Using this table as our starting point, we are able to estimate emissions from all analyzed pollutants for every single AIS message. These ping-level emissions can then later be aggregated however desired (e.g., by vessel, by voyage, by destination or arrival port, by time, by space, etc.)
Variables of interest within this table include the following:
- ssvid: source specific vessel id; MMSI for AIS
- hours: time since the previous position in the segment
- speed_knots: speed (knots) from AIS message
- implied_speed_knots: distance from last position divided by hours since last position
- meters_to_prev: distance (meters) to the previous point in the segment
- distance_from_shore_m: distance from shore (meters)
- distance_from_port_m: distance from port (meters)
- neural net score: The score is 1 if the neural net thinks this is a fishing position.
- night_loitering: 1 if the seg_id of every message of a squid_jigger that is at night and not moving, 0 if not.
To minimize noisy data, we only include AIS messages that occur within valid segments (i.e., select seg_id from pipe_ais_v3_published.segs_activity where good_seg), and only those within daily segments that do not overlap with each other (i.e., those that do not occur in overlap_segs_daily_v20241202).
1.1.3.2 Vessel characteristics
Vessel characteristics are another of the core GFW datasets. These tables provide metadata for all vessels contained within GFW, organized by MMSI. The information for each vessel includes: 1) official registry information, when available (Park et al. (2023)); or 2) algorithm-derived vessel characteristics such as vessel class, engine power, and gross tonnage, when registry data are not available (Kroodsma et al. (2018)). The GFW vessel characteristics database leverages extensive work that has been done to scrape and aggregate many publicly available vessel registries (Park et al. (2023)). Note that we are currently using a cutting-edge version of this database, which differs from Version 3 of the pipeline in that it uses a new experimental random forest algorithm for inferring certain vessel characteristics when they are not available in official vessel registries (vessel type, main engine power, length, gross tonnage, and max speed).
We are also leveraging two brand new cargo and tanker vessel type classification sub-models developed by GFW that build on the general vessel classification algorithm for low-information vessels (i.e., those that don’t have known registry information). We can now differentiate many vessels that were previously lumped together as an undifferentiated cargo vessel type into the specific categories of bulk_carrier, container, general, refrigerated, and ro_ro. We can also now differentiate many vessels that were previously lumped together as an undifferentiated tanker vessel type into the specific categories of oil or chemical, liquefied gas, and other liquids. These classes now align with the IMO cargo vessel class types, allowing us to more accurately assign auxiliary engine power and boiler power for these vessel classes using the IMO methodology. These new models each leverage a random forest that is trained on the port visit patterns of vessels with known IMO cargo or tanker vessel class types. Sequences of port visits are converted into usable model features by implementing a Word2Vec model that transforms them into numeric arrays called embeddings. Additionally, each model uses features based on vessel activity, including port hours, average distance from shore, and average speed. This allows us to more accurately classify cargo or tanker vessel types by looking at the most common known IMO vessel types that use the same ports. The cargo sub-model achieves an F1 weighted average score of 91%, while the tanker sub-model achieves an F1 weighted average score of 95%.
Variables of interest from pipe_ais_v3_published.vi_ssvid_v20250201 (the core GFW pipeline 3 vessel characteristics table) include the following:
- ssvid: source specific vessel id; MMSI for AIS
- best.flag: best flag state (ISO3) for the vessel
- activity.active_hours: hours the vessel was broadcasting AIS and moving more than 0.1 knots. If desired, we can use this as a filter; vessels with < 24 active hours have very limited data from which to calculate emissions.
- registry_info.registries_listed: vessel registries the vessel is listed on
- registry_info.best_known_shipname: best known shipname for the vessel from registries
- ais_identity.n_shipname_mostcommon.value: the most common normalized shipname broadcast by this vessel
- registry_info.best_known_callsign: best known callsign for the vessel from registries
- ais_identity.n_callsign_mostcommon.value: the most common normalized callsign broadcast by this vessel
- registry_info.best_known_imo imo_registry: best known IMO number for the vessel from registries
- ais_identity.n_imo_mostcommon.value imo_ais: the most common normalized IMO number broadcast by this vessel
- offsetting: true if this vessel has been seen with an offset position at some point between 2012 and 2019 (this should be FALSE; if it is TRUE, it can be used as a filter to remove potentially erroneous/noisy vessels)
- overlap_hours_multinames: the total number of hours of overlap between two segments where, over the time period of the two overlapping segments (including the non-overlapping time of the segments), the vessel broadcast two or more normalized names, each at least 10 times. The goal is to identify overlapping segments where there was likely more than one identity. (this should be 0; if it is > 0, it can be used as a filter to remove potentially erroneous/noisy vessels)
Variables of interest from proj_ocean_ghg.rf_predictions_v20250613 (the new experimental random forest vessel characteristics table developed for this project) include the following:
- ssvid: source specific vessel id; MMSI for AIS
- rf_best_vessel_class: best vessel class for the vessel (using official registry information where available; or the random forest vessel characteristics algorithm where registry information is not available. For cargo or tanker vessels where registry information is not available, we use the new cargo and tanker vessel specific classification sub-model.)
- rf_best_engine_power_kw: best engine power (kilowatts) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
- rf_best_tonnage_gt: best tonnage (gross tons) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
- rf_best_length_m: best length (meters) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
- rf_best_max_speed_knot: best maximum speed (i.e., design speed) (knots) for the vessel (using official registry information where available, or the random forest vessel characteristics algorithm where not available)
- on_fishing_list_rf_best: GFW determination of whether the vessel is a fishing vessel, using information from vessel registries and the random forest model
Variables of interest from proj_ocean_ghg.cargo_subtypes_v20250613 (the new experimental cargo and tanker sub-model) include the following:
- ssvid: source specific vessel id; MMSI for AIS
- vessel_subclass: Cargo or tanker sub-class, where available
There are currently 1,067,985 AIS-broadcasting vessels in the GFW dataset (this number excludes any AIS transponders that are labeled as fishing gear, helicopters, or submarines). Of these, we are able to estimate emissions for 925,682 unique active vessels over our entire time period. Of these, 901,214 (97%) are ‘low-information’ vessels without an IMO number.
1.1.3.3 Voyages
Again leveraging Version 3 of the pipeline, GFW’s voyages table contains information for port-to-port voyages made by vessels. This table leverages extensive work done by the GFW team to: 1) define ports, 2) determine when vessels arrive at or depart from a port, and 3) determine voyages that are defined by a port departure and a port arrival (Watch (2021)).
To define ports, specific anchorages are first identified by using the AIS data to find S2 cell locations where at least 20 unique vessels remained stationary at some point since 2012 (where ‘stationary’ is defined as moving less than 0.5km within a 12-hour period). Once these initial anchorage locations have been identified, anchorages within 4km of each other are grouped into ports. In this way, a single port may contain multiple anchorages. Port names are then assigned to each of these locations spatially according to the following hierarchy:
- World Port Index
- GeoNames 1000 database that describes all settlements globally that have a population of at least 1,000 people
- The top destination reported in AIS messages of stationary vessels that defined that anchorage
- Contributed names and regional port databases
Once ports have been identified, heuristics are used to identify port entries and exits:
A vessel enters port when it comes within 3 kilometers of an anchorage point and exits port when it is outside 4 kilometers of the anchorage point. We use different threshold distances to avoid situations where a vessel continuously enters and exits port. This situation is still common, however, as vessels travel along coastlines and repeatedly come within close proximity to numerous anchorages. To distinguish actual port visits from coastal transits, we further identify when a vessel appears to stop at a given port. The vessel is considered to have “stopped” at port if its speed drops below 0.2 knots, and this port stop ends when the speed rises above 0.5 knots. AIS is often switched off when a vessel enters port, and it is turned back on when it leaves. As a result, we track port “gaps,” where a vessel that has entered port does not broadcast on AIS for at least four hours. Port stops and port gaps are behaviors indicating that a vessel visited a port for a specific reason and/or engaged in some activity while at the port, such as landing catch or exchanging supplies and crew. We can then allocate the at-sea activity of vessels to individual voyages between port visits.
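The two-threshold entry/exit rule quoted above is a form of hysteresis. The following is a minimal sketch of that idea only, not GFW's production implementation; the function name `port_events` and the input format (a sequence of distances, in kilometers, from the vessel to an anchorage point) are assumptions made for illustration.

```python
ENTER_KM = 3.0  # vessel enters port when it comes within this distance
EXIT_KM = 4.0   # vessel exits port only once it is farther than this distance


def port_events(distances_km):
    """Return (index, event) pairs for a sequence of distances to an anchorage.

    Hypothetical helper: using a larger exit threshold than entry threshold
    prevents a vessel hovering near one boundary from flickering in and out.
    """
    in_port = False
    events = []
    for i, d in enumerate(distances_km):
        if not in_port and d < ENTER_KM:
            in_port = True
            events.append((i, "entry"))
        elif in_port and d > EXIT_KM:
            in_port = False
            events.append((i, "exit"))
    return events


# A track that oscillates between 2.8 km and 3.8 km yields a single visit.
track = [5.0, 3.5, 2.8, 3.4, 2.9, 3.8, 4.2, 5.0]
print(port_events(track))  # [(2, 'entry'), (6, 'exit')]
```

The same pattern would apply to the stop rule in the quoted text, with speed thresholds of 0.2 and 0.5 knots in place of the distance thresholds.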
Once these port entries and exits have been identified, voyages can simply be defined as all activity between those port events. In this way, each individual AIS message (and its associated emissions) can be assigned to a voyage.
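As an illustration of assigning a ping to a voyage, the sketch below looks up the voyage whose time window contains a ping timestamp. It assumes one vessel's voyages are sorted by `trip_start` and do not overlap, and that both window boundaries are inclusive; the function name `assign_trip` and the tuple layout are hypothetical, and timestamps are shown as plain numbers for brevity.

```python
from bisect import bisect_right


def assign_trip(ping_time, trips):
    """Return the trip_id whose [trip_start, trip_end] window contains ping_time.

    trips: list of (trip_start, trip_end, trip_id) tuples for one vessel,
    sorted by trip_start and non-overlapping (an assumption of this sketch).
    Returns None when the ping falls between voyages, e.g. during a port visit.
    """
    starts = [t[0] for t in trips]
    # Index of the last trip that starts at or before the ping.
    i = bisect_right(starts, ping_time) - 1
    if i >= 0 and ping_time <= trips[i][1]:
        return trips[i][2]
    return None


trips = [(0, 10, "trip-a"), (15, 30, "trip-b")]
print(assign_trip(5, trips))   # trip-a
print(assign_trip(12, trips))  # None
```

In practice this assignment would more likely be done as a range join in SQL; the binary search here only makes the logic explicit.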
Variables of interest within this table include the following:
- ssvid: source specific vessel id; MMSI for AIS
- trip_id: A unique identifier for the trip generated by the ssvid, vessel-id and the exit time of starting visit
- trip_start: The initial timestamp of the voyage, when the vessel leaves port
- trip_end: The final timestamp of the voyage, when the vessel reaches port
- trip_start_anchorage_id: The id of the anchorage where the voyage starts
- trip_end_anchorage_id: The id of the anchorage where the voyage ends
Further information of the starting and ending anchorages can be obtained by joining this table to GFW’s anchorage table, which includes the following:
- anchorage_id: The id of the anchorage
- label: Port name
- iso3: Port ISO3 code
- lat: latitude of the anchorage
- lon: longitude of the anchorage
In summary, from 2015-01-01 to 2025-06-30, we have 145,747,116 unique voyages across 805,021 unique vessels. These trips visited 14,682 unique ports across 209 unique countries.
1.1.3.4 Port visits
Again leveraging Version 3 of the pipeline, we use GFW’s port visits table. Port visits are determined using the same methods as described above for assigning voyages. Variables of interest within this table include the following:
- ssvid: source specific vessel id; MMSI for AIS
- visit_id: Unique ID for this visit
- start_timestamp: timestamp at which vessel crossed into the anchorage
- end_timestamp: timestamp at which vessel crossed out of the anchorage
- start_anchorage_id: anchorage_id of anchorage where vessel entered port
- end_anchorage_id: anchorage_id of anchorage where vessel exited port
- confidence: How confident we are that this is a real visit, based on components of the visit:
  - 1 -> no stop or gap; only an entry and/or exit
  - 2 -> only stop and/or gap; no entry or exit
  - 3 -> port entry or exit with stop and/or gap
  - 4 -> port entry and exit with stop and/or gap
For quality control purposes, we filter this dataset to just those port visits with the highest confidence level, 4. We also keep only those port visits where the starting and ending port labels are the same.
As with the voyages dataset, further information on the starting and ending anchorages can be obtained by joining this table to GFW’s anchorage table. Variables of interest within this table include the following:
- anchorage_id: The id of the anchorage
- label: Port name
- iso3: Port ISO3 code
- lat: latitude of the anchorage
- lon: longitude of the anchorage
Note again that since a single port can have multiple anchorages, it is possible that a single port visit has different starting and ending anchorages, and therefore lat/long locations.
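The quality-control filter on port visits can be sketched as follows. This is an illustrative in-memory version, assuming visits are dictionaries keyed by the column names listed above and that `anchorage_labels` maps each `anchorage_id` to its port `label`; the function name and the sample records are hypothetical.

```python
def filter_port_visits(visits, anchorage_labels):
    """Keep confidence-4 visits whose start and end anchorages share a port label."""
    kept = []
    for v in visits:
        start_label = anchorage_labels.get(v["start_anchorage_id"])
        end_label = anchorage_labels.get(v["end_anchorage_id"])
        same_port = start_label is not None and start_label == end_label
        if v["confidence"] == 4 and same_port:
            kept.append(v)
    return kept


# Hypothetical anchorages: a1 and a2 belong to the same port, a3 to another.
anchorage_labels = {"a1": "PORT_X", "a2": "PORT_X", "a3": "PORT_Y"}
visits = [
    {"visit_id": "v1", "start_anchorage_id": "a1", "end_anchorage_id": "a2", "confidence": 4},
    {"visit_id": "v2", "start_anchorage_id": "a1", "end_anchorage_id": "a3", "confidence": 4},
    {"visit_id": "v3", "start_anchorage_id": "a1", "end_anchorage_id": "a1", "confidence": 3},
]
print([v["visit_id"] for v in filter_port_visits(visits, anchorage_labels)])  # ['v1']
```

Visit `v1` is kept even though its starting and ending anchorages differ, because both anchorages carry the same port label.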
In summary, from 2015-01-01 to 2025-06-30, we have 109,617,280 unique port visits across 835,196 unique vessels. These port visits occurred in 14,431 unique ports across 209 unique countries.
1.1.4 Areas of potential model refinement
We have identified a number of areas for potential model refinement. They are all related to the need for improved vessel characteristics metadata:
- Vessel classification: Continue to align GFW vessel classes with IHS vessel classes: Some GFW and IHS vessel classes are currently categorized slightly differently (for example, those related to tankers), meaning that we need to translate and aggregate certain information that is provided by the ICCT and IMO for IHS vessel classes (i.e., auxiliary engine power by vessel type) into the GFW vessel classes.
- Draft correction factor: Currently, we use the same draft correction factor for all vessels. This single draft correction factor is currently an average of vessel class-specific correction factors, weighted by the total emissions by each vessel class. Future model iterations may want to use vessel class-specific draft factors.
- Size units conversion: The inclusion of the four operational phases requires the use of auxiliary engine and boiler energy demand values by vessel size. As described earlier, this entails setting unit conversion expressions that can be refined to better capture energy demand, especially those for CBM conversion.
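To make the draft correction factor item above concrete, the sketch below computes a single emissions-weighted average from class-specific factors. The function name and all numeric values are hypothetical placeholders chosen for illustration; they are not the factors or weights used in the model.

```python
def weighted_draft_factor(class_factors, class_emissions):
    """Average class-specific draft correction factors, weighted by class emissions."""
    total = sum(class_emissions[c] for c in class_factors)
    if total <= 0:
        raise ValueError("total emissions must be positive")
    weighted_sum = sum(class_factors[c] * class_emissions[c] for c in class_factors)
    return weighted_sum / total


# Hypothetical inputs: two vessel classes with made-up factors and weights.
class_factors = {"class_a": 0.8, "class_b": 0.9}
class_emissions = {"class_a": 100.0, "class_b": 300.0}
print(round(weighted_draft_factor(class_factors, class_emissions), 3))  # 0.875
```

Switching to class-specific factors, as suggested above, would amount to looking up `class_factors[vessel_class]` per vessel instead of applying this single average.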
1.2 Results
In this section, we provide some high-level results from our emissions model.
1.2.1 Time series trends
1.2.1.1 Number of vessels
First, we look at total global number of active vessels per year from 2015-2024 (Figure 1.9, Table 1.8).
| vessel_class | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|
| passenger | 73,606 | 88,362 | 105,234 | 123,572 | 140,766 | 148,975 | 173,947 | 194,797 | 212,636 | 234,353 |
| cargo.general | 52,021 | 58,273 | 64,290 | 66,685 | 73,370 | 76,995 | 73,831 | 71,097 | 72,187 | 76,303 |
| trawlers | 39,025 | 44,761 | 49,719 | 54,491 | 56,954 | 58,413 | 60,890 | 62,518 | 66,732 | 66,157 |
| tug | 23,694 | 26,156 | 29,186 | 30,835 | 33,958 | 36,355 | 37,889 | 38,554 | 41,398 | 44,972 |
| other_fishing | 14,321 | 16,525 | 18,358 | 20,547 | 21,997 | 22,827 | 25,345 | 28,047 | 32,005 | 32,359 |
| tanker.chemical_oil | 16,291 | 16,650 | 17,699 | 20,205 | 21,302 | 19,726 | 20,335 | 21,430 | 21,341 | 23,950 |
| cargo.bulk_carrier | 13,832 | 14,571 | 15,410 | 15,514 | 16,695 | 16,822 | 16,778 | 17,248 | 17,628 | 18,750 |
| drifting_longlines | 2,343 | 3,008 | 3,693 | 4,240 | 4,795 | 5,543 | 7,143 | 8,731 | 9,328 | 9,685 |
| bunker | 3,804 | 4,100 | 4,269 | 4,704 | 5,221 | 5,735 | 5,467 | 5,486 | 7,696 | 8,400 |
| cargo.container | 4,981 | 4,946 | 5,279 | 5,474 | 5,726 | 5,829 | 6,162 | 6,650 | 6,907 | 7,462 |
| set_gillnets | 3,116 | 3,785 | 4,413 | 4,998 | 5,408 | 5,672 | 6,111 | 6,108 | 6,676 | 6,801 |
| patrol_vessel | 3,266 | 3,582 | 4,064 | 4,397 | 4,597 | 4,917 | 5,234 | 5,707 | 6,366 | 6,671 |
| cargo.ro_ro | 3,080 | 3,220 | 3,584 | 3,885 | 4,376 | 4,620 | 4,867 | 5,290 | 5,499 | 5,989 |
| pots_and_traps | 3,025 | 3,655 | 4,317 | 4,908 | 5,214 | 5,543 | 5,815 | 5,858 | 5,932 | 5,916 |
| supply_vessel | 3,822 | 3,661 | 3,534 | 3,567 | 3,738 | 3,766 | 3,819 | 4,089 | 4,303 | 4,438 |
| dredge_non_fishing | 2,054 | 2,263 | 2,564 | 2,640 | 2,723 | 2,860 | 3,007 | 3,070 | 3,288 | 3,411 |
| squid_jigger | 712 | 812 | 915 | 1,049 | 1,216 | 2,051 | 2,326 | 2,486 | 2,631 | 2,677 |
| tanker.liquefied_gas | 1,109 | 1,206 | 1,537 | 1,702 | 2,067 | 1,639 | 1,762 | 1,963 | 2,163 | 2,636 |
| set_longlines | 1,238 | 1,355 | 1,488 | 1,574 | 1,691 | 1,809 | 1,953 | 2,012 | 2,014 | 2,033 |
| other_not_fishing | 981 | 995 | 1,033 | 1,075 | 1,125 | 1,173 | 1,241 | 1,317 | 1,411 | 1,493 |
| seismic_vessel | 1,142 | 1,139 | 1,131 | 1,159 | 1,148 | 1,153 | 1,177 | 1,208 | 1,269 | 1,284 |
| specialized_reefer | 743 | 786 | 822 | 850 | 841 | 819 | 875 | 904 | 945 | 939 |
| tuna_purse_seines | 484 | 571 | 608 | 639 | 699 | 728 | 746 | 761 | 756 | 770 |
| dredge_fishing | 342 | 378 | 386 | 403 | 428 | 462 | 479 | 492 | 521 | 570 |
| pole_and_line | 367 | 402 | 428 | 458 | 488 | 531 | 536 | 541 | 549 | 547 |
| dive_vessel | 403 | 439 | 429 | 455 | 464 | 450 | 447 | 466 | 473 | 461 |
| research | 197 | 226 | 238 | 253 | 281 | 294 | 307 | 302 | 292 | 289 |
| trollers | 149 | 180 | 209 | 241 | 263 | 268 | 275 | 283 | 276 | 278 |
| well_boat | 132 | 151 | 171 | 172 | 179 | 191 | 196 | 196 | 199 | 197 |
| other_seines | 155 | 166 | 172 | 178 | 186 | 187 | 191 | 192 | 193 | 192 |
| reefer | 100 | 101 | 99 | 104 | 101 | 98 | 110 | 84 | 104 | 101 |
| container_reefer | 130 | 126 | 128 | 122 | 113 | 109 | 106 | 104 | 100 | 99 |
| tanker.other | 53 | 58 | 55 | 57 | 78 | 64 | 48 | 56 | 78 | 96 |
| cargo.refrigerated | 61 | 66 | 74 | 78 | 77 | 77 | 66 | 56 | 58 | 65 |
| other_purse_seines | 39 | 38 | 39 | 38 | 40 | 40 | 39 | 39 | 39 | 41 |
| fish_factory | 22 | 25 | 25 | 24 | 26 | 26 | 21 | 25 | 22 | 25 |
| bunker_or_tanker | 15 | 15 | 10 | 6 | 11 | 8 | 9 | 9 | 10 | 12 |
| driftnets | 3 | 3 | 1 | 4 | 4 | 4 | 4 | 4 | 3 | 3 |
1.2.1.2 Voyages
Next, we summarize the number of voyages, globally, for vessels included in our analysis from 2015-2024 (Figure 1.10, Table 1.9).
| year | n_unique_events |
|---|---|
| 2015 | 8,437,555 |
| 2016 | 9,663,540 |
| 2017 | 11,593,371 |
| 2018 | 12,391,954 |
| 2019 | 13,101,881 |
| 2020 | 12,244,422 |
| 2021 | 13,556,057 |
| 2022 | 14,544,323 |
| 2023 | 15,269,286 |
| 2024 | 16,711,258 |
1.2.1.3 Port visits
Here we summarize the time series trend of the number of port visits, globally, for the vessels included in our analysis from 2015-2024 (Figure 1.11, Table 1.10).
| year | n_unique_events |
|---|---|
| 2015 | 6,177,728 |
| 2016 | 7,161,050 |
| 2017 | 8,720,020 |
| 2018 | 9,271,336 |
| 2019 | 9,849,398 |
| 2020 | 9,304,535 |
| 2021 | 10,301,900 |
| 2022 | 11,176,546 |
| 2023 | 11,810,193 |
| 2024 | 13,021,187 |
1.2.1.4 Emissions
Next, we look at total annual global emissions (metric tonnes, MT) for each pollutant from 2015-2024 (Figure 1.12, Table 1.11).
| year | CO2 | NOX | SOX | CH4 | CO | N2O | PM | PM10 | PM2_5 | VOCS |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015 | 718,722,836 | 12,625,854 | 4,449,919 | 11,869 | 591,766 | 35,229 | 653,820 | 1,330,165 | 1,223,017 | 809,033 |
| 2016 | 780,889,103 | 13,573,763 | 4,833,909 | 12,720 | 635,914 | 38,215 | 704,985 | 1,442,282 | 1,326,103 | 874,026 |
| 2017 | 846,501,499 | 14,529,395 | 5,238,889 | 13,572 | 680,694 | 41,352 | 757,432 | 1,560,024 | 1,434,361 | 941,607 |
| 2018 | 876,809,809 | 14,897,047 | 5,425,214 | 14,011 | 701,399 | 42,854 | 780,849 | 1,618,553 | 1,488,175 | 979,887 |
| 2019 | 899,209,924 | 15,139,576 | 5,562,484 | 14,423 | 718,367 | 44,028 | 798,821 | 1,666,298 | 1,532,073 | 1,015,842 |
| 2020 | 889,302,629 | 14,816,298 | 1,099,983 | 14,223 | 706,586 | 43,569 | 161,981 | 340,080 | 312,686 | 1,009,692 |
| 2021 | 965,566,238 | 15,989,880 | 1,194,266 | 15,130 | 757,245 | 47,143 | 174,550 | 367,212 | 337,632 | 1,079,482 |
| 2022 | 1,038,855,368 | 17,138,169 | 1,284,752 | 16,388 | 817,252 | 50,812 | 187,859 | 396,442 | 364,507 | 1,172,586 |
| 2023 | 1,085,212,897 | 18,017,603 | 1,342,122 | 17,532 | 866,996 | 53,299 | 197,934 | 416,908 | 383,325 | 1,247,819 |
| 2024 | 1,167,605,666 | 19,156,424 | 1,443,761 | 18,503 | 920,275 | 57,199 | 210,961 | 446,935 | 410,933 | 1,329,062 |
1.2.2 Emissions by vessel class
Next, for 2024, we summarize the global annual emissions by vessel class (Figure 1.13). These are summarized using the GFW vessel class categories. We first plot simply CO2 emissions, then look at a table of all pollutants.
The following table shows all pollutants for 2024 (Table 1.12):
| vessel_class | CO2 | NOX | SOX | VOCS | CO | PM10 | PM2_5 | PM | N2O | CH4 |
|---|---|---|---|---|---|---|---|---|---|---|
| tanker.chemical_oil | 240,047,644 | 3,377,729 | 296,161 | 242,988 | 158,184 | 88,232 | 81,125 | 38,552 | 11,415 | 2,948 |
| passenger | 228,838,747 | 2,036,916 | 280,603 | 230,835 | 109,218 | 84,011 | 77,243 | 29,319 | 10,730 | 1,933 |
| cargo.container | 196,436,153 | 4,137,712 | 243,953 | 272,227 | 201,667 | 81,037 | 74,509 | 43,233 | 10,161 | 4,469 |
| cargo.bulk_carrier | 169,353,456 | 3,366,512 | 210,302 | 186,232 | 151,505 | 64,038 | 58,880 | 33,987 | 8,308 | 3,015 |
| cargo.general | 73,742,032 | 1,356,534 | 91,364 | 91,964 | 65,538 | 29,184 | 26,833 | 14,610 | 3,707 | 1,404 |
| tanker.liquefied_gas | 73,270,269 | 1,388,193 | 90,924 | 74,948 | 61,229 | 27,026 | 24,849 | 14,055 | 3,539 | 1,180 |
| trawlers | 48,463,405 | 989,333 | 60,218 | 54,239 | 47,185 | 18,466 | 16,979 | 9,954 | 2,394 | 901 |
| cargo.ro_ro | 42,925,936 | 807,143 | 53,251 | 45,564 | 36,209 | 16,032 | 14,741 | 8,268 | 2,088 | 712 |
| tug | 18,769,929 | 417,420 | 23,297 | 34,802 | 22,384 | 8,793 | 8,085 | 4,649 | 1,053 | 574 |
| bunker | 15,275,039 | 85,166 | 18,658 | 14,743 | 5,262 | 5,528 | 5,083 | 1,606 | 703 | 80 |
| other_fishing | 12,951,008 | 272,055 | 16,082 | 18,464 | 15,381 | 5,425 | 4,988 | 2,883 | 678 | 304 |
| supply_vessel | 7,578,618 | 156,834 | 9,405 | 11,480 | 7,971 | 3,243 | 2,981 | 1,699 | 402 | 187 |
| specialized_reefer | 7,313,700 | 119,952 | 9,052 | 7,314 | 5,496 | 2,678 | 2,462 | 1,286 | 351 | 104 |
| dredge_non_fishing | 4,124,723 | 82,553 | 5,120 | 5,179 | 3,908 | 1,636 | 1,505 | 862 | 209 | 84 |
| patrol_vessel | 3,827,480 | 75,893 | 4,748 | 5,274 | 3,750 | 1,575 | 1,448 | 816 | 198 | 85 |
| drifting_longlines | 3,820,624 | 79,854 | 4,745 | 5,200 | 4,367 | 1,570 | 1,443 | 837 | 198 | 86 |
| container_reefer | 3,698,132 | 70,144 | 4,591 | 3,632 | 3,066 | 1,346 | 1,237 | 705 | 178 | 58 |
| seismic_vessel | 2,776,295 | 60,538 | 3,447 | 4,644 | 3,126 | 1,240 | 1,140 | 659 | 151 | 77 |
| other_not_fishing | 2,580,742 | 35,281 | 3,178 | 3,384 | 1,900 | 1,041 | 957 | 441 | 130 | 42 |
| squid_jigger | 2,382,356 | 50,498 | 2,959 | 3,363 | 2,820 | 994 | 914 | 531 | 124 | 56 |
| tuna_purse_seines | 1,817,801 | 39,707 | 2,259 | 2,576 | 2,189 | 760 | 698 | 412 | 95 | 43 |
| cargo.refrigerated | 1,467,227 | 26,757 | 1,820 | 1,413 | 1,175 | 531 | 488 | 272 | 70 | 22 |
| set_gillnets | 1,145,545 | 24,347 | 1,423 | 1,671 | 1,393 | 485 | 446 | 258 | 60 | 28 |
| dive_vessel | 931,186 | 20,111 | 1,156 | 1,505 | 1,039 | 410 | 377 | 218 | 50 | 25 |
| set_longlines | 917,084 | 19,652 | 1,139 | 1,331 | 1,115 | 387 | 356 | 207 | 48 | 22 |
| pots_and_traps | 850,512 | 18,611 | 1,057 | 1,308 | 1,091 | 368 | 338 | 198 | 45 | 22 |
| pole_and_line | 444,109 | 9,336 | 551 | 630 | 526 | 186 | 171 | 99 | 23 | 10 |
| well_boat | 411,784 | 8,011 | 511 | 434 | 358 | 153 | 141 | 81 | 20 | 7 |
| research | 306,239 | 5,905 | 380 | 364 | 278 | 119 | 109 | 62 | 15 | 6 |
| reefer | 289,423 | 3,601 | 357 | 284 | 175 | 105 | 97 | 43 | 14 | 3 |
| tanker.other | 286,701 | 2,681 | 352 | 293 | 143 | 106 | 97 | 38 | 13 | 3 |
| dredge_fishing | 249,167 | 5,146 | 309 | 319 | 270 | 100 | 92 | 53 | 13 | 5 |
| other_seines | 142,725 | 3,041 | 177 | 208 | 174 | 60 | 55 | 32 | 8 | 3 |
| other_purse_seines | 55,792 | 1,219 | 69 | 82 | 69 | 24 | 22 | 13 | 3 | 1 |
| fish_factory | 49,560 | 860 | 61 | 73 | 46 | 21 | 19 | 10 | 3 | 1 |
| trollers | 48,789 | 1,090 | 61 | 79 | 66 | 22 | 20 | 12 | 3 | 1 |
| bunker_or_tanker | 14,965 | 74 | 18 | 14 | 5 | 5 | 5 | 2 | 1 | 0 |
| driftnets | 769 | 16 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
1.2.3 Spatial maps of emissions
Next we can look at spatial maps of emissions by pollutant, aggregated across all vessel types. These maps are shown at a spatial resolution of 0.1x0.1 degrees (Figure 1.14 - Figure 1.20).
1.3 Model Validation
The initial stages of model development involved multiple rounds of preliminary validation, comparing candidate model versions to identify the best-performing one. Following this testing phase, and with the AIS-based model built as described above, we validated it against measured absolute emission values from a dataset of a European Union (EU) monitoring program.
1.3.1 EU emissions data
We used the \(CO_2\) emissions data from maritime transport provided by the European Maritime Safety Agency, which is part of the monitoring, reporting, and verification program of carbon emissions from maritime transport, set by the Regulation (EU) 2015/757. As detailed below, vessels with certain characteristics and certain trips made by such vessels operating in EEA sea ports, must report their emissions on an annual basis.
1.3.1.1 What’s in and out of EU 2015/757
Here, we summarize the relevant provisions of this legislation that define the vessels and trip characteristics that need to be filtered from our data for validation purposes.
Vessel characteristics:
- Ships above 5000 gross tonnage.
- Excludes warships, naval auxiliaries, fish-catching or fish-processing ships, wooden ships of a primitive build, ships not propelled by mechanical means, or government ships used for non-commercial purposes.
Trip characteristics:
- Trips from their last port of call to a port of call under the jurisdiction of a Member State and from a port of call under the jurisdiction of a Member State to their next port of call, as well as within ports of call under the jurisdiction of a Member State.
- Vessel activities covered by the law are those when the ships are at sea as well as at berth.
- Ship at berth includes any ship which is securely moored or anchored in a port falling under the jurisdiction of a Member State while it is loading, unloading or hotelling, including the time spent when not engaged in cargo operations.
- Port of call means the port where a ship stops to load or unload cargo or to embark or disembark passengers; consequently, stops for the sole purposes of refueling, obtaining supplies, relieving the crew, going into dry-dock or making repairs to the ship and/or its equipment, stops in port because the ship is in need of assistance or in distress, ship-to-ship transfers carried out outside ports, and stops for the sole purpose of taking shelter from adverse weather or rendered necessary by search and rescue activities are excluded.
- Ship to ship transfers carried out outside ports are covered by the Regulation when these transfers take place as part of a voyage starting and/or ending with a port of call under the jurisdiction of a Member State.
- A ship to ship transfer carried out outside ports is not considered a port of call. As a consequence, if, for example, a vessel leaves an EEA port, arrives at a US harbor, performs a ship to ship operation outside the port limits, and then goes to South Korea, the emissions falling within MRV scope will be those released during the whole voyage from the EEA port of call until the port of call in South Korea. However, if the ship to ship transfer is carried out within the port limits, that operation would constitute a port of call. The voyage covered by the MRV Maritime Regulation would then be from an EEA port of call to a US port of call.
- If a ship performs more than 300 voyages during the reporting period and all of these voyages either start from or end in a port within a Member State, the company can be exempted from monitoring the detailed parameters for each voyage.
Time ranges:
- The reporting period covers one calendar year during which CO2 emissions have to be monitored. For voyages starting and ending in two different calendar years, the monitoring and reporting data shall be accounted under the first calendar year concerned.
1.3.2 Data filtering
Given the law’s exceptions, in order to validate emission estimates, we need to filter our results to match the EU data contents and aggregation. In this regard, the vessel characteristics in terms of type and gross tonnage can be filtered by simply assessing which vessel IDs are included in the EU dataset. As for the time range, since the data is aggregated yearly by the starting date of a trip, the filtering is straightforward.
The most challenging aspect of this data filtering corresponds to the trip characteristics, specifically when defining the “ports of call.” The main obstacle is the inability to determine the type of activity a vessel is engaged in while in port, which prevents us from identifying which trips are included within those aggregated yearly emissions. To work around this, and establish which vessels’ emissions can be included in the validation, we compared the total distance and total hours at sea from the EU data to the corresponding totals across all trips in our trip-level emissions for that vessel and year. If these numbers were within a given margin (for example ±5%, ±10%, or ±15%), we could reasonably assume that the aggregated trips in the GFW dataset correspond to the trips included in the EU dataset.
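The margin check can be sketched as a relative-difference test on each vessel-year. The function names and the record layout below are hypothetical, and measuring the difference relative to the EU value (rather than the GFW value) is an assumption of this sketch.

```python
def within_margin(gfw_value, eu_value, margin):
    """True if gfw_value differs from eu_value by at most `margin` (a fraction)."""
    if eu_value <= 0:
        return False  # cannot compute a relative difference against zero
    return abs(gfw_value - eu_value) / eu_value <= margin


def select_for_validation(records, margin):
    """Keep vessel-year records whose GFW hours at sea match the EU hours."""
    return [
        r for r in records
        if within_margin(r["gfw_hours_at_sea"], r["eu_hours_at_sea"], margin)
    ]


# Hypothetical vessel-year records.
records = [
    {"imo": "A", "year": 2022, "gfw_hours_at_sea": 4100.0, "eu_hours_at_sea": 4000.0},
    {"imo": "B", "year": 2022, "gfw_hours_at_sea": 2500.0, "eu_hours_at_sea": 4000.0},
]
print([r["imo"] for r in select_for_validation(records, 0.10)])  # ['A']
```

Widening `margin` admits more vessel-years into the validation set, at the cost of including cases where the two datasets likely cover different sets of trips.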
The EU data has been compiled and stored in proj_ocean_ghg.eu_validation_data, while the selection and part of the filtering from our data has been conducted through the queries in eu_validation_trip.sql and eu_validation_port.sql, differentiating between emissions by trip and port visits, and stored in proj_ocean_ghg.eu_validation_trip and proj_ocean_ghg.eu_validation_port respectively.
1.3.3 Validation results
Emissions estimates are divided between emissions at sea and emissions at port. Here, we present the differences between our results and the EU emissions data.
1.3.3.1 Trip emissions
As detailed in the Methods section, we have defined trip emissions as those occurring between ports. In the EU dataset, these emissions are categorized under three variables: emissions from EEA-EEA, EEA-NonEEA, and NonEEA-EEA seaports. We have run our model, selected the data by trips involving at least one EEA port, aggregated the results by year, and selected vessels listed in the EU dataset for a specific year.
After discarding 1077 observations with no time at sea, and assessing potential duplicates due to different ssvid for the same imo_number (0 duplicates in this dataset), we ended up with a total of 217103 observations—year and vessel—selected from our emission estimates. This is 87.7% of the EU data observations. By applying different margin values to the annual hours at sea we get the following performance metrics detailed in Table 1.13. As we can see, higher performance is achieved applying a 10% margin, with which we obtained a selection of 1599 observations for validation.
| mae | rmse | nrmse | rsq | rsq_trad | mape | mpe | threshold | n_observations |
|---|---|---|---|---|---|---|---|---|
| 2231.621 | 5068.543 | 0.456 | 0.801 | 0.792 | 29.600 | -3.216 | 0.05 | 926 |
| 2353.508 | 5099.621 | 0.454 | 0.798 | 0.794 | 29.948 | -3.458 | 0.10 | 1599 |
| 2410.567 | 5028.879 | 0.441 | 0.807 | 0.805 | Inf | -Inf | 0.15 | 2277 |
| 2579.295 | 5612.569 | 0.501 | 0.766 | 0.749 | Inf | -Inf | 0.20 | 2919 |
| 2735.795 | 5854.677 | 0.515 | 0.760 | 0.734 | Inf | -Inf | 0.25 | 3621 |
| 3103.941 | 7982.631 | 0.690 | 0.655 | 0.523 | Inf | -Inf | 0.30 | 4396 |
| 3386.662 | 8373.700 | 0.724 | 0.670 | 0.476 | Inf | -Inf | 0.35 | 5289 |
| 3862.253 | 9200.055 | 0.775 | 0.681 | 0.399 | Inf | -Inf | 0.40 | 6341 |
Comparing against the EU dataset with that 10% margin, the model demonstrates a good fit with an R-squared value of 0.8. While the MAE and RMSE indicate a considerable average magnitude of prediction errors, we need to consider that the average emission values are quite large since we are estimating absolute emissions per year. In fact, the normalized RMSE suggests that the errors, relative to the range of observed values, are moderate. Further, the model exhibits a tendency to underestimate emissions, as evidenced by the MPE. Despite some limitations, this validation framework offers a useful alternative to the one derived from the ML results, avoiding the related outlier inconveniences from emissions expressed in distance units. We can visually explore the relationship between our results and the EU data, observing certain linearity between both emission values with some spreading as the emission values increase (Figure 1.24).
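For readers less familiar with these metrics, the sketch below computes MAE, RMSE, a range-normalized RMSE, and MAPE for paired observed and predicted values. It is an illustration with made-up numbers, not the code used to produce Table 1.13; normalizing RMSE by the range of observed values follows the wording above but is otherwise an assumption. It also shows one plausible reason a MAPE column can contain Inf: any observed value of zero with a nonzero prediction makes the percentage error unbounded.

```python
import math


def validation_metrics(observed, predicted):
    """Return MAE, RMSE, range-normalized RMSE, and MAPE (percent)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumes the observed values are not all identical (nonzero range).
    nrmse = rmse / (max(observed) - min(observed))
    abs_pct = []
    for e, o in zip(errors, observed):
        if o != 0:
            abs_pct.append(abs(e) / abs(o))
        else:
            # Percentage error is undefined at zero; treat as infinite unless exact.
            abs_pct.append(math.inf if e != 0 else 0.0)
    mape = 100.0 * sum(abs_pct) / n
    return {"mae": mae, "rmse": rmse, "nrmse": nrmse, "mape": mape}


# Made-up values for illustration only.
print(validation_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0]))
# A single zero in the observed data drives MAPE to infinity.
print(validation_metrics([0.0, 100.0], [5.0, 100.0])["mape"])  # inf
```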
We can double-check these results by applying the data filtering margin on distance traveled instead of hours. In this regard, while the EU dataset does not contain information on the total nautical miles (nm) navigated, we can extract this from the annual average \(CO_2\) emissions per distance. If we apply distinct margins to the total nautical miles (nm) navigated, we achieve the highest performance values for the 25% margin, with which we select 3551 observations. Overall, the performance metrics are slightly worse than those obtained by using hours at sea, which was a direct measure available in the EU dataset, rather than an indirect estimate like total nm navigated (Figure 1.25 and Table 1.14).
| mae | rmse | nrmse | rsq | rsq_trad | mape | mpe | threshold | n_observations |
|---|---|---|---|---|---|---|---|---|
| 1788.215 | 3866.017 | 0.360 | 0.872 | 0.870 | 23.392 | 1.76 | 0.05 | 1090 |
| 1811.906 | 3707.641 | 0.347 | 0.881 | 0.879 | Inf | -Inf | 0.10 | 1738 |
| 1933.623 | 3964.672 | 0.346 | 0.881 | 0.880 | Inf | -Inf | 0.15 | 2346 |
| 2111.669 | 7541.384 | 0.660 | 0.654 | 0.564 | Inf | -Inf | 0.20 | 2934 |
| 2260.113 | 7599.934 | 0.656 | 0.675 | 0.570 | Inf | -Inf | 0.25 | 3551 |
| 2435.037 | 7424.169 | 0.645 | 0.696 | 0.584 | Inf | -Inf | 0.30 | 4286 |
| 2701.152 | 7510.823 | 0.655 | 0.707 | 0.571 | Inf | -Inf | 0.35 | 5124 |
| 3087.718 | 7947.216 | 0.677 | 0.724 | 0.542 | Inf | -Inf | 0.40 | 6130 |
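The distance used for the filter above is derived rather than reported: dividing a vessel's total annual \(CO_2\) emissions by its annual average emissions per distance gives the implied distance. The sketch below assumes total emissions in metric tonnes and emissions per distance in kg \(CO_2\) per nautical mile; these units and the function name are assumptions and should be checked against the EU data columns.

```python
def implied_distance_nm(total_co2_tonnes, co2_kg_per_nm):
    """Derive total nautical miles from total CO2 and average CO2 per distance.

    Units are assumptions of this sketch: tonnes for the total, kg per
    nautical mile for the intensity. Returns None if the intensity is not
    positive, since the distance cannot be derived in that case.
    """
    if co2_kg_per_nm <= 0:
        return None
    return total_co2_tonnes * 1000.0 / co2_kg_per_nm


# Hypothetical vessel-year: 10,000 t CO2 at 250 kg CO2 per nautical mile.
print(implied_distance_nm(10000.0, 250.0))  # 40000.0
```

Because this distance inherits any rounding or reporting error in both inputs, it is a less direct basis for filtering than the reported hours at sea.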
1.3.3.2 Port emissions
Emissions at port consist of emissions during port stays. Here, we filtered those emission results to port visits within the EEA and followed the same aggregation procedure as for trips. However, some assumptions were necessary because we cannot filter out stays that may not be included in the EU dataset. As mentioned earlier, one drawback of this dataset is the definition of “ports of call”, which determines whether a vessel trip and port visit falls under the regulation and, by extension, whether its emissions are reported and available in the EU dataset. Because we cannot determine the activities performed by a vessel in port, we expect to overestimate emissions at port relative to the EU data, which does not include all port stays. Applying the same procedure, the performance values are quite poor given that overestimation, so we need an alternative way to validate our port emission results while improving our model’s estimates (Table 1.15 and Figure 1.26).
| model | mae | rmse | nrmse | rsq | rsq_trad | mape | mpe |
|---|---|---|---|---|---|---|---|
| Model 6.3 | 7369.646 | 17146.17 | 12.431 | 0.137 | -153.529 | Inf | -Inf |
By applying an additional filtering step, selecting those observations from the trips under the 10% hours at sea margin approach, we would expect to narrow down our observations to those actually happening under “ports of call”. However, as the performance values in Table 1.16 and the emissions relationship shown in Figure 1.27 indicate, our model still does not capture emissions at port as reflected in the EU dataset, highlighting the need for improvement in this aspect.
| model | mae | rmse | nrmse | rsq | rsq_trad | mape | mpe |
|---|---|---|---|---|---|---|---|
| Model 6.3 | 1382.316 | 6236.424 | 4.196 | 0.079 | -16.616 | Inf | -Inf |
1.3.3.2.1 2 vs 4 operational phases
As described in the methods section, we tested the inclusion of 2 operational phases and 4 operational phases against the entire GFW dataset. The inconvenience of the first approach was the oversimplification of such phases—with different energy demands—when vessel movement is minimal, presumably failing to quantify emissions accurately, especially when vessels are at port. Additionally, the inclusion of all 4 operational phases incorporated the emissions from boilers, potentially reducing the underestimation of emissions from the 2 operational phases model.
The validation results for both approaches showed a slightly better performance for trip emissions (0.02 superior in R2) for the 2 operational phases model, although the 4 operational phases model underestimated the actual emissions less. As for port emissions, as expected, they considerably improved under the 4 operational phases approach, despite still presenting low performance. Therefore, while the inclusion of 4 operational phases allows us to closely follow the ICCT/IMO methodology, the validation results also show a clear advantage of implementing such refinement, with a slight drawback in trip emissions performance. However, we must consider that the inclusion of these additional phases comes with several assumptions in vessel class grouping, giving room for additional improvements. Better vessel class resolution in the GFW datasets could potentially translate to improved performance. Furthermore, alternative validation datasets, yet to be defined, could help establish clearer confidence intervals.
1.3.4 Comparison to other global emissions estimates
Several other emissions estimates have been produced by the International Maritime Organization (IMO) (most recently for 2018), the Emissions Database for Global Atmospheric Research (EDGAR) (most recently for 2023), and the Organization for Economic Co-operation and Development (OECD) (most recently for 2024, using their experimental database). Below we compare our GHG emissions estimates with the findings of these studies to validate our results.
The emissions inventories used in this comparison rely on a range of methods and have some differences in sector definitions and underlying data sources. Here we compare the most recent year of data from each inventory.
Highlights include:
GFW’s total estimated emissions (1.35 billion MT CO2 for both broadcasting vessels and non-broadcasting vessels, and 1.17 billion MT CO2 for just AIS-broadcasting vessels) are similar to OECD, EDGAR, and IMO, indicating high reliability and utility of our methodology and results.
IMO’s, OECD’s, and EDGAR’s emissions estimates do not include “dark” fleets, making GFW’s dark fleet estimates the first of their kind.
| CO2 emissions (billion MT) | Data source |
|---|---|
| 1.35 | GFW (2024, AIS + S1) |
| 1.17 | GFW (2024, AIS) |
| 0.97 | OECD (2024) |
| 0.91 | EDGAR (2023) |
| 1.06 | IMO (2018) |