Guardrails Overview
Guardrails are intelligent safety and content filtering mechanisms in PLai Framework that protect both your agents and users by detecting, blocking, or masking inappropriate, sensitive, or harmful content in real time. Powered by Amazon Bedrock Guardrails, they provide enterprise-grade AI safety and compliance controls.
What are Guardrails?
Guardrails act as protective barriers that monitor and control content flowing through your AI agents. They operate at two critical points:
INPUT Guardrails
Filter and validate user messages before they reach the AI model
OUTPUT Guardrails
Validate and filter AI-generated responses before delivery to users
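The two checkpoints can be pictured as a wrapper around the model call. The sketch below is illustrative only: `Verdict`, `check_input`, `check_output`, `guarded_call`, and `echo_model` are hypothetical names, not part of the PLai Framework API, and the keyword checks merely stand in for real detection models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of a guardrail check (hypothetical type for illustration)."""
    allowed: bool
    text: str  # text to pass along if allowed, or a safety message if not


def check_input(message: str) -> Verdict:
    # Placeholder policy: a real INPUT guardrail would use detection models.
    if "ignore previous instructions" in message.lower():
        return Verdict(False, "Sorry, this request cannot be processed.")
    return Verdict(True, message)


def check_output(response: str) -> Verdict:
    # Placeholder policy: a real OUTPUT guardrail would use detection models.
    if "internal-only" in response.lower():
        return Verdict(False, "Sorry, this response cannot be shown.")
    return Verdict(True, response)


def guarded_call(message: str, call_model: Callable[[str], str]) -> str:
    """Run INPUT guardrail -> model -> OUTPUT guardrail."""
    inbound = check_input(message)
    if not inbound.allowed:
        return inbound.text  # the model is never called
    outbound = check_output(call_model(inbound.text))
    return outbound.text


def echo_model(prompt: str) -> str:
    # Stand-in for an AI model.
    return f"You said: {prompt}"
```

Calling `guarded_call("hello", echo_model)` passes both checkpoints, while a message that fails the INPUT check never reaches the model.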
How Guardrails Work
Guardrails inspect content in real time using advanced detection models.
Action Types
Guardrails can take different actions when they detect policy violations:
- Block
- Mask (Anonymize)
Block
Complete prevention of content
When sensitive content is detected, the guardrail completely blocks the message or response.
Best for:
- Hate speech
- Explicit sexual content
- Violent or harmful content
- Prohibited topics (politics, religion)
- Policy violations
Key Features
1. INPUT and OUTPUT Protection
INPUT Guardrails
Protect your AI model from harmful inputs
INPUT guardrails validate user messages before they reach the AI model:
- Filter malicious prompts: Block prompt injection and jailbreak attempts
- Mask PII: Remove sensitive user data before processing
- Block prohibited topics: Prevent queries about restricted subjects
- Validate content safety: Screen for harmful or inappropriate input
Use cases:
- User-facing chatbots
- Public-facing agents
- Compliance-sensitive applications
- Customer service bots
OUTPUT Guardrails
Ensure safe, compliant AI responses
OUTPUT guardrails validate AI-generated content before delivery to users:
- Prevent harmful outputs: Block toxic, violent, or inappropriate responses
- Protect sensitive information: Mask PII in AI-generated content
- Enforce brand safety: Ensure responses meet brand guidelines
- Maintain compliance: Verify regulatory compliance
Use cases:
- Public content generation
- Customer-facing communications
- Regulated industries
- Brand-sensitive applications
2. PII Detection and Masking
Guardrails can automatically detect and mask various types of personally identifiable information (PII):
Contact Information
- Email addresses
- Phone numbers
- Physical addresses
- Social media handles
Financial Data
- Credit card numbers
- Bank account numbers
- SSN/Tax IDs
- Financial statements
Personal Identifiers
- Full names
- Date of birth
- Driver's license numbers
- Passport numbers
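To make the idea of masking concrete, here is a minimal sketch that replaces email addresses and phone numbers with redaction tokens. It assumes simple regular expressions in place of the detection models a real guardrail uses, covers only two of the PII types listed above, and uses a hypothetical `mask_pii` helper that is not a PLai Framework function.

```python
import re

# Illustrative patterns only; production PII detection is far more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace emails and phone numbers with redaction tokens."""
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    text = PHONE_RE.sub("[PHONE_REDACTED]", text)
    return text
```

The rest of the message is left untouched, so the conversation can continue without exposing the masked values.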
3. Default Guardrail
PLai Framework includes a default guardrail that automatically protects all agents from harmful INPUT content.
Default INPUT Guardrail
Automatically blocks user queries containing:
- 🚫 Sexual content: Explicit sexual material or inappropriate content
- 🚫 Hate speech: Discrimination, prejudice, or hateful content
- 🚫 Insults: Personal attacks or abusive language
- 🚫 Politics: Political opinions, debates, or partisan content
- 🚫 Religion: Religious debates or divisive religious content
4. On-Demand Creation
Guardrails are created on demand based on your specific requirements.
Guardrails are powered by Amazon Bedrock Guardrails, providing enterprise-grade content filtering backed by AWS's advanced AI safety models.
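For readers curious about the underlying service, the sketch below evaluates a piece of text with Amazon Bedrock's ApplyGuardrail API through a boto3 `bedrock-runtime` client. How PLai Framework itself invokes Bedrock is not documented here, so treat this as an assumption-laden illustration: the `apply_bedrock_guardrail` wrapper is hypothetical, and the guardrail identifier and version must come from your own AWS account.

```python
from typing import Any


def apply_bedrock_guardrail(client: Any, guardrail_id: str, version: str,
                            text: str, source: str = "INPUT") -> tuple[bool, str]:
    """Evaluate text with Bedrock's ApplyGuardrail API.

    `client` is expected to be a boto3 "bedrock-runtime" client, e.g.
    boto3.client("bedrock-runtime"). Returns (intervened, resulting_text).
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" or "OUTPUT"
        content=[{"text": {"text": text}}],
    )
    intervened = response["action"] == "GUARDRAIL_INTERVENED"
    outputs = response.get("outputs", [])
    # When the guardrail intervenes, outputs hold the blocked message or masked text.
    result = outputs[0]["text"] if intervened and outputs else text
    return intervened, result
```

Passing the client in as an argument keeps the function easy to exercise with a stub object during testing, without AWS credentials.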
Guardrail Scope
Guardrails can be configured at different organizational levels:
- General Guardrails
- Organization-Specific Guardrails
General Guardrails
Available to all organizations
- Platform-wide default guardrail
- Standard safety and compliance guardrails
- Common PII masking rules
- Industry-standard content filters
- Maintained by PLai Framework
- Automatically updated
- Best practices built-in
- No configuration required
Ideal for getting started quickly with proven safety measures
Use Cases
1. Customer Service & Support
Protect customer interactions and maintain professional communication standards.
- INPUT: Block offensive language and inappropriate requests
- INPUT: Mask customer PII (phone, email, account numbers)
- OUTPUT: Prevent sharing of internal system information
- OUTPUT: Ensure professional, brand-aligned language
2. Healthcare & Medical Applications
Guardrails to implement:
- INPUT: Mask all PHI (patient names, diagnoses, records)
- INPUT: Block requests for medical advice or diagnoses
- OUTPUT: Prevent disclosure of patient information
- OUTPUT: Block medical recommendations outside scope
3. Financial Services
Guardrails to implement:
- INPUT: Mask credit card numbers, account details, SSNs
- INPUT: Block fraudulent or suspicious requests
- OUTPUT: Prevent disclosure of account information
- OUTPUT: Ensure regulatory compliance in responses
4. Education & E-Learning
Protect students and maintain an appropriate educational environment.
- INPUT: Block inappropriate content (sexual, violent, hateful)
- INPUT: Mask student PII (names, emails, student IDs)
- OUTPUT: Ensure age-appropriate responses
- OUTPUT: Prevent academic integrity violations
5. Content Moderation
Guardrails to implement:
- INPUT: Filter user-generated content for harmful material
- INPUT: Block spam and malicious content
- OUTPUT: Ensure community guidelines compliance
- OUTPUT: Maintain platform safety standards
Benefits of Guardrails
AI Safety
Prevent harmful AI behavior
- Block toxic outputs
- Prevent bias and discrimination
- Stop misinformation
- Ensure appropriate content
Privacy Protection
Protect user privacy
- Automatic PII detection
- Data masking and anonymization
- Regulatory compliance
- Reduced data exposure
Compliance
Meet regulatory requirements
- GDPR compliance
- HIPAA compliance
- PCI-DSS compliance
- Industry regulations
Brand Protection
Safeguard your reputation
- Prevent PR incidents
- Maintain brand voice
- Control public messaging
- Ensure professionalism
Risk Mitigation
Reduce operational risks
- Limit legal liability
- Prevent security incidents
- Control information disclosure
- Audit trail maintenance
User Trust
Build user confidence
- Demonstrate commitment to safety
- Transparent data handling
- Consistent behavior
- Professional interactions
Guardrails vs. Answer Filters
Understanding the difference between these two features:
| Feature | Guardrails | Answer Filters |
|---|---|---|
| Purpose | Safety & compliance enforcement | Response guidance for specific queries |
| Scope | All interactions | Specific query patterns |
| Action | Block or mask content | Guide response content |
| Trigger | Policy violations detected | Query similarity matching |
| Granularity | Broad safety rules | Specific Q&A pairs |
| Technology | Amazon Bedrock Guardrails | Semantic similarity |
| Best For | Content safety, PII protection, compliance | Consistent answers to FAQs |
Best Practice: Use both Guardrails and Answer Filters together for comprehensive agent control. Guardrails provide safety boundaries, while Answer Filters ensure consistent, high-quality responses within those boundaries.
Limitations & Considerations
Detection Accuracy
Understanding AI detection limits
Guardrails use advanced AI models but are not perfect:
- May occasionally miss sophisticated attempts to bypass filters
- Can produce false positives (blocking safe content)
- Context-dependent detection may vary
- Adversarial techniques keep evolving
Mitigation:
- Regular monitoring and review
- Continuous model updates
- Human oversight for critical applications
- Multi-layered security approach
Performance Impact
Latency considerations
Guardrails add processing time to each interaction:
- INPUT guardrails: +50-200ms
- OUTPUT guardrails: +50-200ms
- PII masking: +100-300ms
- Multiple guardrails compound latency
Mitigation:
- Use only necessary guardrails
- Optimize guardrail selection
- Consider async processing where possible
- Balance safety with performance needs
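As a rough worked example of how latency compounds, the sketch below sums the per-step ranges quoted in this section. It assumes the steps run sequentially so that their overheads simply add; the `added_latency_ms` helper and the step names are hypothetical. With all three steps enabled, the added latency comes to roughly 200-700ms per interaction.

```python
# Per-step overhead ranges in milliseconds, taken from the figures in this section.
OVERHEAD_MS = {
    "input_guardrail": (50, 200),
    "output_guardrail": (50, 200),
    "pii_masking": (100, 300),
}


def added_latency_ms(steps: list[str]) -> tuple[int, int]:
    """Return the (best-case, worst-case) total added latency for the given steps."""
    low = sum(OVERHEAD_MS[s][0] for s in steps)
    high = sum(OVERHEAD_MS[s][1] for s in steps)
    return low, high
```

Dropping a step you do not need removes its whole range from the total, which is why enabling only the necessary guardrails has a direct effect on response time.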
Language Support
Multi-language considerations
Guardrail effectiveness varies by language:
- Best performance in English
- Good support for major European languages
- Limited support for some languages
- Cultural context differences
Mitigation:
- Test thoroughly in target languages
- Consider language-specific guardrails
- Monitor performance by language
- Adjust thresholds as needed
Context Sensitivity
Understanding context limitations
Guardrails may struggle with:
- Sarcasm and irony
- Cultural nuances
- Domain-specific terminology
- Context-dependent appropriateness
Mitigation:
- Test with realistic scenarios
- Provide feedback for improvement
- Use domain-specific guardrails
- Combine with human review
Getting Started
Ready to implement Guardrails for your agents?
Assess Your Needs
Identify what content needs protection:
- What are your compliance requirements?
- What PII needs to be masked?
- What topics should be prohibited?
- What are your safety priorities?
Plan Custom Guardrails
Determine if you need organization-specific guardrails for:
- Industry-specific compliance
- Custom PII handling
- Organization-specific policies
- Brand-specific requirements
Request Creation
Work with your PLai administrator to create custom guardrails through the Amazon Bedrock Guardrails service.
Next Steps
Configuration Guide
Learn how to configure and apply Guardrails to your agents
Best Practices
Discover expert tips for effective Guardrail implementation
API Reference
Explore the Guardrails API for programmatic control
Answer Filters
Learn about complementary response control features
Frequently Asked Questions
How does the default guardrail work?
The default INPUT guardrail is automatically active on all agents and blocks queries containing sexual content, hate speech, insults, politics, or religion. It runs before any custom guardrails and requires no configuration.
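The ordering described here, default guardrail first and custom guardrails afterwards, can be sketched as a simple chain. Everything in this example is hypothetical: the `Guardrail` interface, the toy keyword checks, and the assumption that evaluation stops at the first guardrail that blocks.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a guardrail returns a reason string when it blocks,
# or None when the message passes.
Guardrail = Callable[[str], Optional[str]]


def default_guardrail(message: str) -> Optional[str]:
    # Toy keyword check standing in for the default policy's detection models.
    return "politics" if "election" in message.lower() else None


def no_competitors(message: str) -> Optional[str]:
    # Example of an organization-specific rule (hypothetical).
    return "competitor mention" if "acme" in message.lower() else None


def first_violation(message: str, custom: Sequence[Guardrail]) -> Optional[str]:
    """Run the default guardrail first, then custom ones; stop at the first block."""
    for guardrail in (default_guardrail, *custom):
        reason = guardrail(message)
        if reason is not None:
            return reason
    return None
```

A message that trips both rules is reported under the default guardrail's reason, because the default check runs first.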
Can I disable the default guardrail?
The default guardrail provides essential safety protection and is always active. However, you can create custom guardrails with different policies for organization-specific needs.
How is PII detected and masked?
Guardrails use advanced pattern matching and AI models to detect PII such as emails, phone numbers, addresses, and financial data. Detected PII is replaced with tokens like [EMAIL_REDACTED] or [PHONE_REDACTED].
What happens when a guardrail blocks content?
When content is blocked, users receive a polite safety message indicating that the request cannot be fulfilled. The original content is logged for monitoring but not processed by the AI model.
How do I create a custom guardrail?
Custom guardrails are created through the Amazon Bedrock Guardrails service. Contact your PLai administrator or account manager to request creation of organization-specific guardrails.
Do guardrails work with all AI models?
Yes, guardrails operate independently of the underlying AI model. They filter content before and after model processing, working with any model supported by PLai Framework.
Can I see when guardrails are triggered?
Yes, guardrail activations are logged in your agent's analytics. You can monitor trigger frequency, blocked content patterns, and guardrail effectiveness.
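If you export activation records for your own analysis, trigger frequency can be summarized in a few lines. The record fields below (`guardrail`, `source`, `action`) are an assumed shape for illustration; the actual analytics schema may differ, and `trigger_counts` and `block_rate` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical log records; the real analytics schema may differ.
events = [
    {"guardrail": "default", "source": "INPUT", "action": "block"},
    {"guardrail": "pii-mask", "source": "INPUT", "action": "mask"},
    {"guardrail": "default", "source": "INPUT", "action": "block"},
    {"guardrail": "brand-safety", "source": "OUTPUT", "action": "block"},
]


def trigger_counts(records: list[dict]) -> Counter:
    """Count activations per (guardrail, action) pair."""
    return Counter((r["guardrail"], r["action"]) for r in records)


def block_rate(records: list[dict]) -> float:
    """Fraction of activations that were blocks rather than masks."""
    if not records:
        return 0.0
    return sum(r["action"] == "block" for r in records) / len(records)
```

A sudden rise in the count for one guardrail, or in the overall block rate, is a reasonable prompt to review that guardrail for false positives.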
What's the difference between blocking and masking?
Blocking completely prevents content from being processed (used for harmful content). Masking redacts specific sensitive information while allowing the conversation to continue (used for PII protection).
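The difference can also be expressed in code. In this sketch, a hypothetical `Detection` record carries a category and a character span; any harmful category blocks the whole text, while PII detections are masked in place. The category names and the `enforce` function are assumptions made for illustration, not PLai Framework identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """One finding from a detector (hypothetical structure)."""
    category: str  # e.g. "hate_speech" or "email"
    start: int     # span start in the text
    end: int       # span end (exclusive)


# Categories treated as harmful in this sketch (assumed names).
HARMFUL = {"hate_speech", "sexual_content", "violence"}


def enforce(text: str, detections: list[Detection]) -> Optional[str]:
    """Return None if blocked; otherwise the text with detected spans masked."""
    if any(d.category in HARMFUL for d in detections):
        return None  # block: nothing is passed on
    # Mask from the end of the text so earlier offsets stay valid.
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + f"[{d.category.upper()}_REDACTED]" + text[d.end:]
    return text
```

Blocking discards the entire message, whereas masking changes only the detected spans and leaves the surrounding text intact.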