
Google Drive Source

The Google Drive source watches your Drive for file changes and emits structured events when files are created, moved, trashed, restored from trash, shared with you, removed, or updated. It uses the Google Drive changes API with a stored cursor, so each poll fetches only what changed since the previous one.

A key feature is debounced updates: when a document is actively edited, the pipeline can wait for a quiet period before emitting a single file_updated event instead of flooding your sinks with every intermediate save. This is achieved via the centralized Coalescing system.
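The cursor-based incremental sync described above can be sketched as follows. This is an illustration only, not the source's actual implementation: `drain_changes` and `fetch_page` are hypothetical names, and the page dictionaries are assumed to follow the shape of the Drive API v3 `changes.list` response (`changes`, `nextPageToken`, `newStartPageToken`).

```python
from typing import Callable

# Hypothetical: fetch_page(page_token) returns one page of the changes feed
# as a dict shaped like the Drive API v3 changes.list response.
FetchPage = Callable[[str], dict]


def drain_changes(fetch_page: FetchPage, cursor: str) -> tuple[list[dict], str]:
    """Collect every change since `cursor`; return them with the new cursor."""
    changes: list[dict] = []
    token = cursor
    while True:
        page = fetch_page(token)
        changes.extend(page.get("changes", []))
        if "nextPageToken" in page:
            # More pages remain for this poll.
            token = page["nextPageToken"]
            continue
        # Last page: newStartPageToken is the cursor to store for the next poll.
        return changes, page.get("newStartPageToken", token)


# Usage with two canned pages standing in for real API responses.
pages = {
    "100": {"changes": [{"fileId": "1AbCd"}], "nextPageToken": "101"},
    "101": {"changes": [{"fileId": "7XyZa"}], "newStartPageToken": "102"},
}
changes, new_cursor = drain_changes(pages.__getitem__, "100")
```

Persisting `new_cursor` only after the changes have been processed means a crash mid-poll replays the same changes rather than skipping them.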

Getting Started

1. Authorize access

Generate a Google OAuth token with the drive scope using the Google Auth CLI:

bash
inboxclaw google auth \
  --credentials-file data/credentials.json \
  --scopes drive \
  --token data/google_token.json

The drive scope grants read-only access to file content, which is needed for text diffs in file_updated events. If you only need metadata tracking (names, moves, shares) without text diffs, you can use the drive_metadata scope instead and set eligible_mime_types_for_content_diff: [] in your config.

2. Add the source to config.yaml

yaml
sources:
  my_drive:
    type: google_drive
    token_file: "data/google_token.json"
    poll_interval: "30s"
    coalesce:
      - match: ["google.drive.file_updated", "google.drive.file_moved"]
        strategy: "debounce"
        window: "60s"

3. Initial sync (bootstrapping)

On the first run, the source needs to learn about your existing files so it doesn't report them all as "newly created" when they're next modified. The bootstrap_mode setting controls this:

  • baseline_only (default): Quick crawl of your Drive to record current file state. No events emitted. Future changes are compared against this baseline.
  • full_snapshot: Like baseline_only, but also fetches and caches text content of documents. This allows the very first file_updated event to include a text diff. Slower and uses more API quota.
  • off: No initial crawl. All existing files will emit a file_created event the first time they are modified after the source starts.

Core Concepts

Change Classification

When a file change is detected, the source compares the new metadata against its cached snapshot and classifies the change:

  • Immediate events: file_created, file_moved, file_trashed, file_untrashed, file_shared_with_you, file_removed. These are emitted as soon as they are detected, unless a coalesce rule matches them (the example configs on this page also debounce file_moved).
  • file_updated: Metadata or content changes. Because files are often edited in bursts, configuring Coalescing (Debounce) for this event type is strongly recommended to avoid noise.
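The classification above can be sketched as a comparison between the cached snapshot and the incoming change. This is a minimal sketch under assumptions: `classify_change` is a hypothetical function, the order of the checks is a guess, and the `removed`, `trashed`, `parents`, and `ownedByMe` keys follow Drive API v3 field names rather than anything this page specifies.

```python
from typing import Optional


def classify_change(cached: Optional[dict], change: dict) -> str:
    """Return an event name for one entry from the changes feed.

    `cached` is the stored snapshot for the file (None if never seen);
    `change` mimics a Drive change entry with `removed` and `file` keys.
    """
    if change.get("removed"):
        return "file_removed"
    new = change["file"]
    if cached is None:
        # Assumption: a non-owned file seen for the first time counts as shared.
        return "file_created" if new.get("ownedByMe", True) else "file_shared_with_you"
    if new.get("trashed") and not cached.get("trashed"):
        return "file_trashed"
    if cached.get("trashed") and not new.get("trashed"):
        return "file_untrashed"
    if sorted(new.get("parents", [])) != sorted(cached.get("parents", [])):
        return "file_moved"
    # Anything else is a content or metadata update.
    return "file_updated"
```

With bootstrap_mode set to off, `cached` would be None for every pre-existing file, which is why those files surface as file_created on their first change.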

Coalescing (Debounce)

By default, the Google Drive source emits events as they are detected. To avoid a flood of file_updated events during an active editing session, you should configure a coalesce rule in your config.yaml:

yaml
sources:
  my_drive:
    type: google_drive
    coalesce:
      - match: ["google.drive.file_updated", "google.drive.file_moved"]
        strategy: "debounce"
        window: "60s"

With this rule, repeated saves of the same file are collapsed: each new change restarts the 60-second window, and a single final event is emitted once 60 seconds pass with no further changes.
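To make the debounce behavior concrete, here is a small sketch of the technique. It is not the pipeline's coalescing code: `Debouncer`, `push`, and `flush` are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Debouncer:
    """Hold the latest event per key until `window` seconds pass with no new one."""

    window: float
    _pending: dict = field(default_factory=dict)  # key -> (last_seen, event)

    def push(self, key: str, event: dict, now: float) -> None:
        # Each new event replaces the pending one and restarts the quiet period.
        self._pending[key] = (now, event)

    def flush(self, now: float) -> list:
        """Return events whose quiet period has elapsed and forget them."""
        ready = [k for k, (seen, _) in self._pending.items() if now - seen >= self.window]
        return [self._pending.pop(k)[1] for k in ready]


# Three saves of the same file at t=0, t=30, and t=50 with a 60-second window.
d = Debouncer(window=60)
d.push("1AbCd", {"rev": 1}, now=0)
d.push("1AbCd", {"rev": 2}, now=30)
d.push("1AbCd", {"rev": 3}, now=50)
too_early = d.flush(now=100)  # only 50s of silence so far: nothing is emitted
emitted = d.flush(now=110)    # 60s of silence: the last event is emitted once
```

Keying by file ID keeps edits to one document from delaying events for another.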

Configuration

Minimal Configuration

yaml
sources:
  my_drive:
    type: google_drive
    token_file: "data/google_token.json"
    poll_interval: "30s"

Full Configuration

yaml
sources:
  my_drive:
    type: google_drive
    token_file: "data/google_token.json"
    poll_interval: "30s"
    bootstrap_mode: "baseline_only"
    restrict_to_my_drive: false
    include_removed: true
    include_corpus_removals: false
    eligible_mime_types_for_content_diff:
      - "application/vnd.google-apps.document"
      - "text/plain"
      - "text/markdown"
      - "text/html"
    max_diffable_file_bytes: 10485760
    coalesce:
      - match: ["google.drive.file_updated", "google.drive.file_moved"]
        strategy: "debounce"
        window: "60s"

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| token_file | string | Required | Path to the Google OAuth2 token file. |
| poll_interval | string | "10m" | How often to check for changes. Supports human-readable intervals (e.g. "30s", "5m"). |
| bootstrap_mode | string | "baseline_only" | Initial sync behavior: baseline_only, full_snapshot, or off. |
| restrict_to_my_drive | bool | false | true limits scope to My Drive only. false allows wider visibility. |
| include_removed | bool | true | Include removal entries from the Drive changes feed. |
| include_corpus_removals | bool | false | Request corpus-removal details when available. |
| eligible_mime_types_for_content_diff | list | Google Docs, text/* types | MIME types eligible for paragraph-level text diffing. |
| max_diffable_file_bytes | int | 10485760 (10 MB) | Size limit for content fetching and diffing. |
| coalesce | list | [] | List of Coalescing Rules (e.g., for google.drive.file_updated). |
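As an illustration of how interval strings such as "30s" or "5m" map to seconds, here is a minimal parser sketch. `parse_interval` is a hypothetical helper, not part of the tool. Only the `s` and `m` suffixes appear on this page; the `h` and `d` units are assumptions.

```python
import re

# Assumption: only "s" and "m" are documented here; "h" and "d" are guesses.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text: str) -> int:
    """Convert a string such as "30s" or "5m" to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]
```

For example, the default poll_interval of "10m" corresponds to 600 seconds, and the "60s" debounce window to 60.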

Event Definitions

| Type | Entity ID | Description |
| --- | --- | --- |
| google.drive.file_created | Drive file ID | File first seen in local snapshot cache. |
| google.drive.file_moved | Drive file ID | Parent folder changed. |
| google.drive.file_trashed | Drive file ID | File was moved to trash. |
| google.drive.file_untrashed | Drive file ID | File was restored from trash. |
| google.drive.file_shared_with_you | Drive file ID | A file was shared with you (non-owned file). |
| google.drive.file_removed | Drive file ID | File was removed from the changes feed (change.removed=true). |
| google.drive.file_updated | Drive file ID | Content or metadata changed. Debounced when a matching coalesce rule is configured. |

google.drive.file_deleted and google.drive.file_permission_changed are intentionally not emitted in the current version.

Event Examples

google.drive.file_created

json
{
  "id": 1,
  "event_id": "drive-1AbCd-file_created-1741999501",
  "event_type": "google.drive.file_created",
  "entity_id": "1AbCd",
  "created_at": "2026-03-15T00:45:01+00:00",
  "data": {
    "fileId": "1AbCd",
    "name": "Q1 plan",
    "mimeType": "application/vnd.google-apps.document",
    "parentIds": ["0AExampleFolder"],
    "owners": [
      {
        "displayName": "Alice",
        "emailAddress": "alice@example.com"
      }
    ],
    "createdTime": "2026-03-15T00:40:10Z",
    "description": "Quarterly roadmap",
    "lastModifyingUser": {
      "displayName": "Alice",
      "emailAddress": "alice@example.com"
    },
    "webViewLink": "https://docs.google.com/document/d/1AbCd/edit?usp=drivesdk",
    "size": "12345"
  },
  "meta": {}
}

google.drive.file_updated (debounced)

json
{
  "id": 2,
  "event_id": "drive-1AbCd-google.drive.file_updated-27",
  "event_type": "google.drive.file_updated",
  "entity_id": "1AbCd",
  "created_at": "2026-03-15T00:48:10+00:00",
  "data": {
    "fileId": "1AbCd",
    "name": "Q1 plan",
    "mimeType": "application/vnd.google-apps.document",
    "parentIds": {
      "before": ["0AExampleFolder"],
      "after": ["0AExampleFolder"]
    },
    "session": {
      "sessionStartedAt": "2026-03-15T00:46:12Z",
      "lastChangeSeenAt": "2026-03-15T00:47:56Z",
      "rawChangeCount": 4,
      "changes": [
        {
          "before": "Old paragraph content...",
          "after": "New paragraph content..."
        }
      ],
      "totalChangedSections": 1,
      "addedCharCount": 15,
      "removedCharCount": 10
    },
    "lastModifyingUser": {
      "displayName": "Alice",
      "emailAddress": "alice@example.com"
    },
    "webViewLink": "https://docs.google.com/document/d/1AbCd/edit?usp=drivesdk",
    "size": "12345"
  },
  "meta": {}
}

For text files with eligible MIME types, file_updated includes diff fields under the session object: changes (array of change objects), totalChangedSections, addedCharCount, removedCharCount.
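The sketch below shows one way paragraph-level diff fields like these could be produced, using Python's standard `difflib`. It is an assumption-laden illustration, not the source's algorithm: `paragraph_diff` is a hypothetical function, paragraphs are assumed to be separated by blank lines, and the character counts are whole-paragraph lengths, a simplification that may not match how the source computes addedCharCount and removedCharCount.

```python
import difflib


def paragraph_diff(old_text: str, new_text: str) -> dict:
    """Compare two texts paragraph by paragraph and summarize what changed."""
    old = [p for p in old_text.split("\n\n") if p.strip()]
    new = [p for p in new_text.split("\n\n") if p.strip()]
    changes, added, removed = [], 0, 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Inserted paragraphs have an empty "before"; deleted ones an empty "after".
        before = "\n\n".join(old[i1:i2])
        after = "\n\n".join(new[j1:j2])
        changes.append({"before": before, "after": after})
        removed += len(before)
        added += len(after)
    return {
        "changes": changes,
        "totalChangedSections": len(changes),
        "addedCharCount": added,
        "removedCharCount": removed,
    }


result = paragraph_diff("Intro\n\nOld paragraph", "Intro\n\nNew paragraph")
```

Diffing at paragraph granularity keeps the changes array short for long documents, at the cost of reporting a whole paragraph when only a word changed.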

google.drive.file_moved

json
{
  "id": 3,
  "event_id": "drive-1AbCd-google.drive.file_moved-2026-03-15T00:50:01Z",
  "event_type": "google.drive.file_moved",
  "entity_id": "1AbCd",
  "created_at": "2026-03-15T00:50:01+00:00",
  "data": {
    "fileId": "1AbCd",
    "name": "Q1 plan",
    "mimeType": "application/vnd.google-apps.document",
    "parentIds": {
      "before": ["0AExampleFolder"],
      "after": ["0ANewFolder"]
    },
    "owners": [
      {
        "displayName": "Alice",
        "emailAddress": "alice@example.com"
      }
    ],
    "lastModifyingUser": {
      "displayName": "Alice",
      "emailAddress": "alice@example.com"
    },
    "webViewLink": "https://docs.google.com/document/d/1AbCd/edit?usp=drivesdk",
    "size": "12345"
  },
  "meta": {}
}

google.drive.file_removed

json
{
  "id": 4,
  "event_id": "drive-1AbCd-google.drive.file_removed-2026-03-15T00:51:22Z",
  "event_type": "google.drive.file_removed",
  "entity_id": "1AbCd",
  "created_at": "2026-03-15T00:51:22+00:00",
  "data": {
    "fileId": "1AbCd",
    "lastKnownName": "Q1 plan",
    "lastKnownMimeType": "application/vnd.google-apps.document",
    "lastKnownParentIds": ["0AExampleFolder"]
  },
  "meta": {}
}

google.drive.file_shared_with_you

json
{
  "id": 5,
  "event_id": "drive-7XyZa-google.drive.file_shared_with_you-2026-03-15T00:52:09Z",
  "event_type": "google.drive.file_shared_with_you",
  "entity_id": "7XyZa",
  "created_at": "2026-03-15T00:52:09+00:00",
  "data": {
    "fileId": "7XyZa",
    "name": "Vendor Contract",
    "mimeType": "application/pdf",
    "owners": [
      {
        "displayName": "Bob",
        "emailAddress": "bob@example.com"
      }
    ],
    "sharedWithMeTime": "2026-03-15T00:52:00Z",
    "sharingUser": {
      "displayName": "Alice",
      "emailAddress": "alice@example.com"
    }
  },
  "meta": {}
}

Data Payload Reference

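Common fields: fileId appears in every event type. name, mimeType, and owners are common to the other event types, except file_removed, which reports lastKnownName, lastKnownMimeType, and lastKnownParentIds instead.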

| Event type | Additional fields |
| --- | --- |
| file_created | parentIds, createdTime, description, lastModifyingUser, webViewLink, size |
| file_moved | parentIds: { before, after }, owners, lastModifyingUser, webViewLink, size |
| file_trashed | trashedBefore, trashedAfter, owners, lastModifyingUser, webViewLink, size |
| file_untrashed | trashedBefore, trashedAfter, owners, lastModifyingUser, webViewLink, size |
| file_shared_with_you | sharedWithMeTime, sharingUser, owners |
| file_removed | lastKnownName, lastKnownMimeType, lastKnownParentIds |
| file_updated | description, lastModifyingUser, webViewLink, size, session: { ... } |
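Because the payload shape varies by event type, a consumer typically branches on event_type before reading data. The following sketch shows one way to do that with the fields listed above. `describe_event` is a hypothetical helper, and it assumes events arrive as dictionaries shaped like the examples on this page.

```python
def describe_event(event: dict) -> str:
    """Build a one-line summary from an event envelope and its data payload."""
    data = event["data"]
    kind = event["event_type"].rsplit(".", 1)[-1]  # e.g. "file_moved"
    # file_removed carries lastKnownName instead of name.
    name = data.get("name") or data.get("lastKnownName") or data["fileId"]
    if kind == "file_moved":
        parents = data["parentIds"]
        return f"{name}: moved from {parents['before']} to {parents['after']}"
    if kind == "file_updated":
        session = data.get("session", {})
        return f"{name}: updated ({session.get('rawChangeCount', 0)} raw changes)"
    if kind == "file_shared_with_you":
        return f"{name}: shared by {data['sharingUser']['emailAddress']}"
    return f"{name}: {kind}"
```

Falling back from name to lastKnownName to fileId keeps the summary usable even when a payload omits the display name.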