4.3 Example: Calculate Fishing Hours

In this example we’ll query fishing hours for January 2020 for all active drifting longline fishing vessels. We’ll write the query as a string in R, download the results, and create a map of fishing hours. The example is broken into three sections: 1) Setup, 2) Query Data, and 3) Map Fishing Effort.

4.3.1 Setup

Load packages and set up the BigQuery connection. In this example, we’re using bigrquery, a package designed to interface with BigQuery from R. To authorize, simply call `bq_auth()`; it should open a browser window that allows you to authenticate yourself. If you have already authorized your Google account in R, `bq_auth()` will instead show a message allowing you to select a pre-authorized account. Your credentials should typically be your UCSB email address and password.
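A minimal setup sketch along these lines is shown below. The `emlab-gcp` billing project is taken from the query later in this section; substitute your own billing project if it differs.

```r
# Load packages (assumes bigrquery is installed)
library(bigrquery)

# Authenticate with Google -- opens a browser window the first time,
# or offers a pre-authorized account on later runs
bq_auth()

# Billing project used for the queries in this example
# (assumption: your account has query permissions on this project)
billing_project <- "emlab-gcp"
```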

4.3.2 Query Data

Write the query to get fishing hours for all drifting longliners in January 2020, binned at 0.1 degrees. While the query below could be written as one large nested query, it’s recommended to use subqueries, particularly when starting out in SQL, since they are easier to check and debug. Since we are using tables from both the world-fishing-827 project and the emlab-gcp project, it’s important to fully qualify each dataset in the query (i.e., starting with world-fishing-827. or emlab-gcp.). The query to calculate fishing hours is composed of several subqueries:

  1. Identify segments to use in the analysis that pass the recommended noise filters (a good segment with more than 10 positions that is not overlapping and short). We’ll be using the testing table, emlab-gcp.emlab_test.pipe_v20201001_segs_subset2020.

  2. Create a list of vessels of interest. In this case we will use vessels with a best vessel class of drifting_longlines in 2020 that are on GFW’s best fishing vessel table (world-fishing-827.gfw_research.fishing_vessels_ssvid_v20210706).

  3. Gather AIS positions for January 2020, keeping only positions on good segments. We’ll be using the testing table, emlab-gcp.emlab_test.pipe_v20201001_fishing_jan2020.

  4. Filter AIS positions for our vessels of interest.

  5. Calculate fishing hours. Fishing hours are calculated from hours and nnet_score: when nnet_score = 1 the neural net classifies the position as fishing, so we count those hours as fishing hours.

  6. Aggregate fishing hours. We’ll bin fishing hours at 0.1 degree resolution and aggregate across grids. This gives us a total estimate of fishing hours for each cell by all drifting longline vessels in January 2020. Note: by including the ssvid and/or date in the GROUP BY statement you can aggregate fishing hours per grid cell by vessel and date.

sql_fishing_hours <- "#standardSQL

WITH

  ##################################
  # Identify good segments using
  # GFW's recommended noise filters
  
  good_segments AS (
  SELECT
    seg_id
  FROM
    `emlab-gcp.emlab_test.pipe_v20201001_segs_subset2020`  
  WHERE 
    good_seg
  AND positions > 10 
  AND NOT overlapping_and_short 
  ),

  ##################################
  # List of drifting longline fishing
  # vessels in 2020 

  longline_vessels_2020 AS (
  SELECT 
    ssvid,
    year
  FROM 
    `world-fishing-827.gfw_research.fishing_vessels_ssvid_v20210706`
  WHERE 
    best_vessel_class = 'drifting_longlines'
    AND year = 2020
  ),

  ##################################
  # Fishing positions for January 
  # 2020 from the emlab test table 

  fishing_positions AS (
  SELECT 
    ssvid,
    lat,
    lon,
    EXTRACT(date FROM timestamp) AS date,
    EXTRACT(year FROM timestamp) AS year,
    hours,
    nnet_score
  FROM 
    `emlab-gcp.emlab_test.pipe_v20201001_fishing_jan2020`
  WHERE 
  # Keep only good segments 
  seg_id IN(
    SELECT 
      seg_id
    FROM 
      good_segments
   )
  ),

  ##################################
  # Filter fishing positions for 
  # only our vessels of interest

  fishing_filtered AS (
  SELECT 
    *
  FROM 
    fishing_positions
  JOIN longline_vessels_2020
  # Only keep positions for fishing vessels active that year
  USING(ssvid, year)
  ),

  ##################################
  # Calculating fishing hours for 
  # each position 

  calc_fishing_hours AS (
  SELECT 
    *,
    IF(nnet_score = 1, hours, 0) AS fishing_hours
  FROM 
    fishing_filtered
  ),

  ##################################
  # Aggregate fishing hours by
  # grid cell 

  fishing_binned AS (
  SELECT
    # Convert lat/lon to 0.1 degree bins
    FLOOR(lat * 10) / 10  AS lat_bin,
    FLOOR(lon * 10) / 10 AS lon_bin,
    SUM(hours) AS hours,
    SUM(fishing_hours) AS fishing_hours
  FROM 
    calc_fishing_hours
  GROUP BY lat_bin, lon_bin
  )

  SELECT * FROM fishing_binned"

Run the query and write the results as a table (test_fishing_hours_jan2020_tenthdegree) in BigQuery. This way the query doesn’t have to be re-run every time the SOP gets updated; instead, we just download the table we created into the working environment. Validating the query in the BigQuery console shows it will bill ~12 GB (< $1).
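One way to run the query and save the results with bigrquery might look like the following sketch. The dataset name `emlab_test` is an assumption based on the test tables used above.

```r
library(bigrquery)

billing_project <- "emlab-gcp"

# Destination table for the query results
# (dataset name is assumed from the test tables used above)
dest_table <- bq_table("emlab-gcp", "emlab_test",
                       "test_fishing_hours_jan2020_tenthdegree")

# Run the query, billing the emlab-gcp project,
# and write the results to the destination table
bq_project_query(billing_project,
                 query = sql_fishing_hours,
                 destination_table = dest_table)

# On later runs, skip the query and just download the saved table
fishing_binned <- bq_table_download(dest_table)
```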

4.3.3 Map Fishing Effort

We’ll use the data from the table we just created to make a global map of fishing effort in January 2020 for all drifting longline vessels. Fishing hours range from just above 0 to just over 500, although most values are below 100. Fishing hours can be displayed in a number of ways, including total fishing hours, total fishing hours per area (\(km^2\)), or log-transformed fishing hours. In this example, we’ll log transform the fishing hours. The colors used to produce the base map were taken from GFW color palettes, which are part of the fishwatchr package. More information on installing and using the fishwatchr package can be found in the package repository.
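As a sketch, a map along these lines can be drawn with ggplot2, here using a viridis palette in place of the GFW fishwatchr colors; the dataset and table names are assumptions carried over from the earlier steps.

```r
library(bigrquery)
library(ggplot2)

# Download the binned fishing-hours table created above
# (dataset and table names assumed from the earlier steps)
fishing_binned <- bq_table_download(
  bq_table("emlab-gcp", "emlab_test",
           "test_fishing_hours_jan2020_tenthdegree"))

# Keep cells with fishing activity and log-transform the hours
fishing_map <- subset(fishing_binned, fishing_hours > 0)
fishing_map$log_fishing_hours <- log10(fishing_map$fishing_hours)

# Plot 0.1-degree cells over a simple world base map
ggplot(fishing_map) +
  borders("world", fill = "grey20", colour = NA) +  # requires the maps package
  geom_tile(aes(x = lon_bin, y = lat_bin, fill = log_fishing_hours)) +
  scale_fill_viridis_c(name = "log10(fishing hours)") +
  labs(title = "Drifting longline fishing effort, January 2020",
       x = NULL, y = NULL) +
  theme_minimal()
```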