BigQuery Data Source
The BigQuery Data Source tool enables your agents to query Google BigQuery data warehouses using natural language. Your agents can ask questions in plain English, and the tool automatically converts them to BigQuery SQL, executes the query, and returns structured results.

Currently Supported: This tool currently supports Google BigQuery only. Other data sources may be added in future releases.
Overview
The BigQuery Data Source tool combines the power of Google BigQuery’s analytics capabilities with GPT-4’s natural language understanding to create a seamless data access experience for your agents.

Natural Language Queries
Ask questions in plain English - no SQL knowledge required
GPT-4 Powered
Automatic conversion to BigQuery SQL using advanced AI
Real-Time Data
Query live data directly from your BigQuery datasets
Secure Access
Enterprise-grade security with Google Cloud authentication
Key Features
- Natural Language to SQL: Converts plain English questions to BigQuery Standard SQL
- Automatic Schema Detection: Understands your table structures automatically
- Token Usage Tracking: Monitors OpenAI API costs for each query
- Result Limiting: Controls query size and response volume
- Usage Analytics: Tracks query performance and usage patterns
How It Works
- Your agent receives a natural language question from a user
- The question is sent to GPT-4 (`gpt-4.1-2025-04-14`) along with your BigQuery schema
- GPT-4 generates optimized BigQuery Standard SQL
- The SQL query is executed on your BigQuery dataset
- Results are formatted as JSON and returned to the agent
- Token usage is tracked for billing and monitoring
Prerequisites
Before configuring the BigQuery Data Source tool, ensure you have a Google Cloud project with BigQuery enabled and a service account with the permissions described below.

Setting Up Google Cloud Service Account
- Create Service Account:
  - Go to GCP Console → IAM & Admin → Service Accounts
  - Click “Create Service Account”
  - Name it (e.g., “plai-bigquery-reader”)
  - Click “Create and Continue”
- Grant BigQuery Permissions:
  - Add role: BigQuery Data Viewer (for read-only access)
  - Optionally add: BigQuery Job User (to run queries)
  - Click “Continue” then “Done”
- Create and Download Key:
  - Click on the created service account
  - Go to “Keys” tab → “Add Key” → “Create new key”
  - Select “JSON” format
  - Download and securely store the JSON file
Configuration
Configuration Structure
The tool requires a specific configuration structure with your Google Cloud credentials and BigQuery dataset information.

Configuration Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `external_datasource_type` | string | Yes | - | Must be `"BIG_QUERY"` (only supported type) |
| `service_account_credentials` | object | Yes | - | Complete Google Cloud service account JSON credentials |
| `dataset_id` | string | Yes | - | BigQuery dataset ID containing your tables |
| `table_ids` | string[] | Yes | - | List of table names the agent can access and query |
| `system_prompt` | string \| null | No | null | Optional custom instructions for SQL generation |
| `query_max_rows` | integer | No | 10000 | Maximum rows to retrieve from BigQuery |
| `output_max_rows` | integer | No | 10 | Maximum rows to return to the agent (limits response size) |
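Assembled from the parameters above, a complete configuration might look like the following sketch (project, dataset, and table names are placeholders):

```json
{
  "external_datasource_type": "BIG_QUERY",
  "service_account_credentials": {
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "<key-id>",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "plai-bigquery-reader@my-project.iam.gserviceaccount.com",
    "client_id": "<client-id>",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token"
  },
  "dataset_id": "analytics",
  "table_ids": ["orders", "customers"],
  "system_prompt": null,
  "query_max_rows": 10000,
  "output_max_rows": 10
}
```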
Configuration Through PLai Interface
The tool configuration wizard guides you through a 4-step process.

Step 1: Settings
Configure basic tool settings and query parameters.

Required Fields:
- Name: Descriptive name for your BigQuery tool (e.g., “Company Analytics Database”)
- Slug: Auto-generated lowercase identifier with underscores (e.g., “company_analytics_database”)
- Description: Explain what data this tool provides access to
- System Prompt: Custom instructions for SQL generation (supports Markdown editor for advanced formatting)
- Query Max Rows: Maximum rows to retrieve from BigQuery (default: 10000)
- Output Max Rows: Maximum rows to return to agent (default: 10)
Step 2: Credentials
Paste your Google Cloud service account credentials:

- Copy the entire JSON credentials file from your downloaded service account key
- Paste into the text area labeled “Paste your Google Cloud service account credentials JSON here”
- The JSON must include all required fields: `type`, `project_id`, `private_key`, `client_email`, etc.

Click Next to continue to Datasets (the system will validate credentials automatically).
Step 3: Datasets
Select the BigQuery dataset to query:

- The wizard automatically detects available datasets from your service account credentials
- You’ll see a list of datasets with radio buttons to select one
- Each dataset shows its ID and the associated project name below
- Choose the dataset containing the tables you want to make available to your agent

Click Next to continue to Tables.

You can only select one dataset per tool. To query multiple datasets, create a separate tool for each dataset.
Step 4: Tables
Select which tables the agent can access:
- Check the boxes next to tables you want to make available to the agent
- All tables display “Type: TABLE” to indicate they are queryable
- Multiple tables can be selected
- The agent will only be able to query the selected tables
- Table schemas are automatically detected and used for SQL generation context
Natural Language Queries
How Query Conversion Works
The tool uses OpenAI’s GPT-4 to convert natural language questions into BigQuery Standard SQL. This happens automatically and transparently.

Model Details:
- Model: `gpt-4.1-2025-04-14`
- Temperature: `0.1` (low temperature for consistent, deterministic SQL)
- Context Provided: BigQuery schema, table definitions, example queries, and constraints
- User asks a question in natural language
- Tool sends question + schema context to GPT-4
- GPT-4 generates BigQuery-compatible SQL
- SQL is executed against your dataset
- Results are formatted and returned
- Token usage is recorded for billing
Example Conversions
- Simple Queries
- Customer Analysis
- Trend Analysis
- Complex Analysis
Natural Language: “How many orders were placed yesterday?”

Generated SQL:
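The SQL below is illustrative; it assumes an `orders` table with an `order_date` column in your configured dataset:

```sql
SELECT
  COUNT(*) AS order_count
FROM
  `my-project.analytics.orders`
WHERE
  DATE(order_date) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
```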
Natural Language: “Show me the total revenue for last month”

Generated SQL:
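Again illustrative, assuming an `orders` table with `order_date` and `total_amount` columns:

```sql
SELECT
  SUM(total_amount) AS total_revenue
FROM
  `my-project.analytics.orders`
WHERE
  DATE_TRUNC(order_date, MONTH) = DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH)
```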
Tips for Effective Queries
Be Specific
Good: “Show total revenue by product category for Q4 2024”
Less Good: “Show me some revenue data”

Specific queries help GPT-4 generate more accurate SQL with appropriate filters and groupings.
Mention Time Ranges
Good: “How many new customers signed up last week?”
Less Good: “How many new customers?”

Including time ranges prevents queries from scanning unnecessary data and improves performance.
Use Natural Language
Good: “Which products have the highest return rate?”
Less Good: “SELECT product, returns FROM…”

Let GPT-4 write the SQL - describe what you want, not how to get it.
Reference Your Tables
Good: “Show me top customers from the customers and orders tables”
Less Good: “Show me top customers”

Mentioning table names helps GPT-4 understand which tables to query.
BigQuery SQL Examples
While natural language queries are the primary interface, understanding BigQuery SQL syntax can help you configure custom system prompts and troubleshoot issues.

Basic Queries
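A minimal example of the filter-sort-limit pattern (table and column names are placeholders):

```sql
SELECT order_id, customer_id, total_amount
FROM `my-project.analytics.orders`
WHERE status = 'completed'
ORDER BY total_amount DESC
LIMIT 10
```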
Aggregations and Grouping
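A sketch of GROUP BY with an aggregate filter, assuming the same hypothetical orders table:

```sql
SELECT
  category,
  COUNT(*) AS order_count,
  SUM(total_amount) AS revenue
FROM `my-project.analytics.orders`
GROUP BY category
HAVING SUM(total_amount) > 1000
ORDER BY revenue DESC
```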
Time Series Analysis
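For example, weekly revenue over the last 90 days (placeholder table):

```sql
SELECT
  DATE_TRUNC(order_date, WEEK) AS week,
  SUM(total_amount) AS weekly_revenue
FROM `my-project.analytics.orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY week
ORDER BY week
```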
Window Functions
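Ranking each customer’s orders by amount, as an illustration:

```sql
SELECT
  customer_id,
  order_date,
  total_amount,
  RANK() OVER (PARTITION BY customer_id ORDER BY total_amount DESC) AS amount_rank
FROM `my-project.analytics.orders`
```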
BigQuery-Specific Features
- STRUCT Types
- ARRAY Aggregation
- Date Functions
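The three features above can appear together in one query; this sketch assumes the same hypothetical orders table:

```sql
-- ARRAY_AGG of STRUCTs plus a BigQuery date function
SELECT
  customer_id,
  ARRAY_AGG(STRUCT(order_id, total_amount) ORDER BY order_date DESC LIMIT 5) AS recent_orders,
  FORMAT_DATE('%Y-%m', MAX(order_date)) AS last_order_month
FROM `my-project.analytics.orders`
GROUP BY customer_id
```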
Usage & Best Practices
Query Optimization
Use Time Filters
Always include date filters to limit data scanned and reduce costs
Limit Result Sets
Use LIMIT clauses and configure an appropriate `query_max_rows`
Select Specific Columns
Avoid SELECT * - only query the columns you need
Partition Filtering
Filter on partitioned columns (usually date fields) first
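The four guidelines above combined in one query (table and column names are placeholders):

```sql
SELECT order_id, customer_id, total_amount                     -- specific columns, not SELECT *
FROM `my-project.analytics.orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)  -- partition/time filter limits bytes scanned
ORDER BY total_amount DESC
LIMIT 100                                                      -- bounded result set
```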
Cost Management
OpenAI Costs:
- ~$0.005-0.030 per query for GPT-4 token usage
- Varies based on query complexity and schema size
- Token usage is tracked and returned in results

BigQuery Costs:
- Charged based on data scanned (per TB)
- Typical query: $0.001-0.050 depending on data volume
- Use partitioned tables and clustering to reduce costs
- Preview queries in BigQuery Console to estimate costs
Security Best Practices
Service Account Permissions
Recommended Roles:
- `roles/bigquery.dataViewer` - Read-only access to data
- `roles/bigquery.jobUser` - Ability to run queries

Avoid:
- `roles/bigquery.admin` - Too broad for agent use
- Write permissions - Unless specifically required
Credential Management
DO:
- Store credentials encrypted in PLai’s secure storage
- Rotate service account keys every 90 days
- Use separate service accounts per environment (dev/staging/prod)
- Monitor service account usage in GCP Console
DON’T:
- Commit credentials to version control
- Share service account keys via email/chat
- Use personal GCP accounts for service accounts
- Grant excessive permissions “just in case”
Data Access Controls
Table-Level Security:
- Only include necessary tables in the `table_ids` configuration
- Use BigQuery authorized views for sensitive data
- Implement row-level security in BigQuery if needed
- Create separate datasets for different security levels
- Use different service accounts for different datasets
- Enable BigQuery audit logging to track access
Query Monitoring
Track Usage:
- Monitor query patterns in agent analytics
- Review token usage for cost anomalies
- Check BigQuery job history regularly
- Set up alerts for unusual query patterns
Automatic Logging:
- All queries are logged with timestamps
- Token usage is recorded per query
- SQL statements are stored for review
- Failed queries are tracked with error details
Tool Response Structure
Response Format
When a query is executed, the tool returns a structured response containing the generated SQL statement, the query results, and token usage metadata.

Example Response
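Purely as an illustration (the field names here are hypothetical, not the tool’s actual schema), a response carrying the elements described in this guide could look like:

```json
{
  "generated_sql": "SELECT COUNT(*) AS order_count FROM `my-project.analytics.orders`",
  "results": [
    { "order_count": 1542 }
  ],
  "row_count": 1,
  "token_usage": {
    "prompt_tokens": 850,
    "completion_tokens": 42,
    "total_tokens": 892
  }
}
```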
Limitations
Current Limitations
Database Support:
- ✅ Google BigQuery - Fully supported
- ❌ PostgreSQL - Not available
- ❌ MySQL - Not available
- ❌ MongoDB - Not available
- ❌ Other databases - Not available
- Maximum query result: 10,000 rows (default, configurable via `query_max_rows`)
- Maximum agent output: 10 rows (default, configurable via `output_max_rows`)
- Query timeout: Depends on BigQuery job limits (typically 6 hours max)
- No support for DML operations (INSERT, UPDATE, DELETE)
- No support for DDL operations (CREATE, ALTER, DROP)
- OpenAI API costs: ~$0.005-0.030 per query
- BigQuery costs: Based on data scanned (varies by query)
- No built-in cost limits or budgets
- Must monitor usage actively
- GPT-4 may occasionally generate suboptimal SQL
- Complex multi-table joins may require query refinement
- Schema context limited to configured tables only
- No support for dynamic table discovery
- Query latency: 2-10 seconds typical (GPT-4 + BigQuery execution)
- Not suitable for sub-second real-time responses
- Large result sets may impact performance
- Concurrent query limits apply
Troubleshooting
Common Issues
403 Forbidden: Access Denied
Symptoms: Permission errors when querying BigQuery

Possible Causes:
- Service account lacks required BigQuery permissions
- Table not included in the `table_ids` configuration
- Dataset doesn’t exist or is in a different project
- Service account doesn’t have access to specific tables
Solutions:
- Verify the service account has the `roles/bigquery.dataViewer` role
- Add missing tables to the `table_ids` array in the configuration
- Check that the dataset exists and the project ID matches the credentials
- Grant table-level access if using authorized views
Invalid SQL Syntax Errors
Symptoms: Query fails with “Invalid SQL syntax” or “Unrecognized name”

Possible Causes:
- GPT-4 generated BigQuery-incompatible SQL
- Table references missing backticks
- Using functions from other SQL dialects
- Column names don’t exist in schema
Solutions:
- Rephrase the natural language query more clearly and specifically
- Mention specific table names in your question
- Check the generated SQL in the response and validate manually in BigQuery Console
- Use a custom `system_prompt` to guide SQL generation
- Review BigQuery Standard SQL documentation for correct syntax
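As an illustration, a custom system prompt (wording is hypothetical) might read:

```text
Always generate BigQuery Standard SQL. Reference tables with fully
qualified backticked names (`project.dataset.table`). Never use
SELECT *; list columns explicitly. Prefer DATE functions over
TIMESTAMP functions when filtering by day.
```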
Quota Exceeded Errors
Symptoms: “Quota exceeded” or “Rate limit exceeded” errors

Possible Causes:
- BigQuery daily query quota reached
- Too many concurrent queries
- Exceeded slots or bytes scanned limits
Solutions:
- Check BigQuery quotas in GCP Console:
  - Go to IAM & Admin → Quotas
  - Filter for “BigQuery API”
  - Review usage and limits
- Request quota increase from Google Cloud Support
- Implement query result caching to reduce repeated queries
- Reduce query frequency or batch queries together
- Consider upgrading to BigQuery flat-rate pricing for predictable costs
Invalid Service Account Credentials
Symptoms: Authentication failures or “Invalid credentials” errors

Possible Causes:
- Malformed JSON credentials
- Expired or revoked service account key
- Missing required credential fields
- Service account deleted from GCP
Solutions:
- Download a fresh service account key from GCP Console
- Validate JSON syntax using a JSON validator
- Ensure all required fields are present: `type`, `project_id`, `private_key_id`, `private_key`, `client_email`, `client_id`, `auth_uri`, `token_uri`
- Verify the service account still exists
- Create new service account if the old one was deleted
Empty or No Results
Symptoms: Query succeeds but returns no data or empty results

Possible Causes:
- Query filters too restrictive (no matching data)
- `output_max_rows` set too low
- Data doesn’t exist in the specified tables
- WHERE clause conditions exclude all rows
Solutions:
- Test the generated SQL directly in the BigQuery Console
- Remove filters temporarily to check if data exists
- Increase `output_max_rows` if expecting more results
- Verify data exists in the specified tables
- Check date filters aren’t excluding all data
- Review WHERE clause conditions in generated SQL
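A quick existence check to run in the BigQuery Console (table name is a placeholder):

```sql
SELECT COUNT(*) AS row_count
FROM `my-project.analytics.orders`
```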
Slow Query Performance
Symptoms: Queries taking longer than expected (>10 seconds)

Possible Causes:
- Large data scan (non-partitioned queries)
- Complex joins across multiple tables
- Missing indexes or clustering
- Inefficient SQL generated by GPT-4
Solutions:
- Add time-range filters to limit data scanned
- Use partitioned columns in WHERE clauses
- Review query execution plan in BigQuery Console
- Optimize table structure with partitioning and clustering
- Rephrase question to generate simpler SQL
- Consider materializing complex views for frequently-accessed data
Getting Help
If you continue to experience issues:

- Check BigQuery Logs: Review query history in BigQuery Console
- Review Token Usage: High token usage may indicate schema issues
- Test Manually: Run generated SQL directly in BigQuery to isolate issues
- Check Service Account: Verify permissions and access in GCP IAM
- Contact Support: Provide query details, error messages, and service account setup
Security & Compliance
Data Protection
Google Cloud Security
Inherits all Google Cloud Platform security features including encryption at rest and in transit
Service Account Auth
Uses OAuth 2.0 service account authentication with secure credential storage
Query Auditing
All queries logged with timestamps, SQL statements, and token usage
Access Controls
BigQuery IAM permissions control data access at project, dataset, and table levels
Compliance Considerations
Compliance certifications depend on your Google Cloud Platform configuration and PLai infrastructure setup. Consult with your security and compliance teams to ensure requirements are met.
- GDPR: Ensure BigQuery data processing agreements are in place with Google Cloud
- HIPAA: Use HIPAA-compliant GCP projects if handling protected health information
- SOC 2: Verify both PLai and Google Cloud Platform SOC 2 compliance status
- Data Residency: Configure BigQuery datasets in appropriate geographic regions
- Audit Trails: Enable BigQuery audit logging for compliance reporting
Encryption
- In Transit: All data transfers use TLS 1.3 encryption
- At Rest: BigQuery data encrypted at rest by default using Google-managed keys
- Credentials: Service account private keys encrypted in PLai secure storage
- Query Results: Temporary query results encrypted in BigQuery