Analyzer Guide
Learn how to detect PII entities in your text
The Analyzer is the first step in the anonymization process. It scans your text and identifies personally identifiable information (PII) like names, emails, phone numbers, and more.
How the Analyzer Works
The Analyzer uses multiple detection methods to identify PII:
Pattern Matching
Regular expressions detect structured data like email addresses, phone numbers, credit cards, and IBANs with high accuracy.
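A simplified sketch of pattern-based detection follows. The two regular expressions, the `PATTERNS` table, and `find_patterns` are illustrative assumptions, not the Analyzer's actual recognizers. Real recognizers cover many more formats, and in the Analyzer each match also carries a confidence score.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Four dot-separated octets, each restricted to the range 0-255.
    "IP_ADDRESS": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def find_patterns(text):
    """Return (entity_type, matched_text, start, end) tuples sorted by start position."""
    results = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((entity_type, match.group(), match.start(), match.end()))
    return sorted(results, key=lambda r: r[2])


print(find_patterns("Contact john@email.com from 192.168.1.1"))
# [('EMAIL_ADDRESS', 'john@email.com', 8, 22), ('IP_ADDRESS', '192.168.1.1', 28, 39)]
```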
Machine Learning (NER)
Named Entity Recognition models identify context-dependent entities like person names, organizations, and locations using spaCy, Stanza, and Transformers.
Checksum Validation
Credit cards, IBANs, and other financial identifiers are validated with checksum algorithms (Luhn for card numbers, MOD-97 for IBANs) to reduce false positives.
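Both checksums are standard, published algorithms, so they can be sketched directly. The function names are hypothetical, and the Analyzer's own validators may apply further checks (length, issuer prefixes, country-specific IBAN lengths) that this sketch omits.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right; the total must be divisible by 10."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def iban_mod97_valid(iban: str) -> bool:
    """IBAN MOD-97 check: move the first four characters to the end,
    convert letters to numbers (A=10 ... Z=35), and expect a remainder of 1."""
    cleaned = iban.replace(" ", "").upper()
    if len(cleaned) < 5 or not (cleaned.isascii() and cleaned.isalnum()):
        return False
    rearranged = cleaned[4:] + cleaned[:4]
    numeric = "".join(str(int(char, 36)) for char in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111-1111-1111-1111"))                 # True
print(luhn_valid("4111-1111-1111-1112"))                 # False
print(iban_mod97_valid("GB82 WEST 1234 5698 7654 32"))   # True
```

A string that looks like a card number but fails the checksum is a likely false positive, which is why validation lowers the false-positive rate.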
Using the Analyzer
Step 1: Enter Your Text
- Navigate to the Anonymize page
- Paste or type your text in the input area
- The interface shows a character count and token estimate
Step 2: Select Entity Types
Choose which types of PII to detect:
| Category | Entity Types | Example |
|---|---|---|
| Personal | PERSON, EMAIL_ADDRESS, PHONE_NUMBER | John Doe, john@email.com |
| Financial | CREDIT_CARD, IBAN_CODE, SWIFT_CODE | 4111-1111-1111-1111 |
| Location | LOCATION, ADDRESS, COORDINATES | 123 Main St, New York |
| Government | SSN, PASSPORT, DRIVER_LICENSE | 123-45-6789 |
| Technical | IP_ADDRESS, MAC_ADDRESS | 192.168.1.1 |
Tip: Use Presets
Instead of selecting entities manually, use Presets to quickly apply common entity configurations like "GDPR Compliance" or "Financial Data".
Step 3: Select Language
Choose the language of your text for optimal detection accuracy:
- Auto-detect - Let the system determine the language
- Specific language - Select from 27 supported languages
Language Selection Matters
Selecting the correct language significantly improves detection accuracy, especially for person names and locations.
Step 4: Run Analysis
- Click the Analyze button
- Wait for the analysis to complete (typically 1-3 seconds)
- Review the detected entities in the results panel
Understanding Results
After analysis, each detected entity is listed in the results panel with the following fields.
Result Fields
- Entity Type - The category of PII detected (PERSON, EMAIL_ADDRESS, etc.)
- Text - The actual text that was identified as PII
- Confidence Score - How certain the system is, shown as 0-100% (equivalent to the 0.0-1.0 scale used by the confidence threshold)
- Position - Start and end character positions
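One way to model a single result in code is a small record holding the four fields above. The `DetectedEntity` class and `make_entity` helper are hypothetical. The sketch assumes that the end position is exclusive (the character at `end` is not part of the match) and that scores are stored on a 0.0-1.0 scale and displayed as percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedEntity:
    entity_type: str  # PII category, e.g. "PERSON"
    text: str         # the matched substring
    score: float      # confidence from 0.0 to 1.0
    start: int        # offset of the first matched character
    end: int          # offset just past the last matched character (assumed exclusive)

    @property
    def confidence_percent(self) -> str:
        """Format the score as a 0-100% value."""
        return f"{self.score:.0%}"


def make_entity(source: str, entity_type: str, start: int, end: int, score: float) -> DetectedEntity:
    """Slice the text from the source so it always agrees with the positions."""
    return DetectedEntity(entity_type, source[start:end], score, start, end)


entity = make_entity("John Doe wrote to john@email.com", "PERSON", 0, 8, 0.85)
print(entity.text, entity.confidence_percent)  # John Doe 85%
```

With an exclusive end, "John Doe" occupies characters 0-8 and its length is simply `end - start`.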
Confidence Threshold
Adjust the confidence threshold to control sensitivity:
| Threshold | Effect | Best For |
|---|---|---|
| 0.3 (Low) | More entities detected, more false positives | Maximum coverage, manual review |
| 0.5 (Default) | Balanced detection and accuracy | General use |
| 0.7 (High) | Fewer entities, higher confidence | Automated processing |
| 0.9 (Very High) | Only very confident matches | Minimal intervention |
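Conceptually, the threshold is a filter over scored results. This sketch assumes a result is kept when its score is greater than or equal to the threshold; whether the Analyzer uses `>=` or `>` is not specified here. The dictionary keys and the scores in the sample data are illustrative only.

```python
def apply_threshold(results, threshold=0.5):
    """Keep only results whose confidence score meets the threshold (0.0-1.0)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [r for r in results if r["score"] >= threshold]


# Illustrative sample data, not real Analyzer output.
results = [
    {"entity_type": "PERSON", "text": "John Doe", "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "text": "john@email.com", "score": 1.0},
    {"entity_type": "PERSON", "text": "Will", "score": 0.35},  # common word, likely a false positive
]

for threshold in (0.3, 0.5, 0.7, 0.9):
    kept = [r["text"] for r in apply_threshold(results, threshold)]
    print(threshold, kept)
```

Raising the threshold from 0.3 to 0.5 drops the low-scoring "Will", and at 0.9 only the near-certain email match survives, mirroring the trade-off in the table.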
Selecting Results
After analysis, you can refine which entities to anonymize:
Select/Deselect All
- Use the checkbox in the header to select or deselect all results
- Only selected entities will be anonymized
Individual Selection
- Click individual checkboxes to include/exclude specific entities
- Useful when the analyzer detects false positives
- Useful when you want to keep certain information visible
Filter by Type
- Click on an entity type badge to filter results by that type
- Quickly select/deselect all entities of a specific type
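The selection behaviour described above can be modelled as a set of selected result indices. `ResultSelection` and its methods are hypothetical names used only to illustrate the logic. The sketch starts with nothing selected until `set_all(True)` is called, which is an assumption and may differ from the interface's default.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSelection:
    results: list  # (entity_type, text) tuples in detection order
    selected: set = field(default_factory=set)  # indices of results to anonymize

    def set_all(self, value: bool) -> None:
        """Header checkbox: select or deselect every result."""
        self.selected = set(range(len(self.results))) if value else set()

    def toggle(self, index: int) -> None:
        """Individual checkbox: flip one result in or out of the selection."""
        self.selected ^= {index}

    def set_type(self, entity_type: str, value: bool) -> None:
        """Select or deselect every result of one entity type."""
        matching = {i for i, (etype, _) in enumerate(self.results) if etype == entity_type}
        self.selected = (self.selected | matching) if value else (self.selected - matching)

    def filter_by_type(self, entity_type: str) -> list:
        """Type badge filter: show only results of one entity type."""
        return [r for r in self.results if r[0] == entity_type]

    def to_anonymize(self) -> list:
        """Only selected results are passed on to anonymization."""
        return [self.results[i] for i in sorted(self.selected)]


sel = ResultSelection([("PERSON", "John Doe"), ("EMAIL_ADDRESS", "john@email.com"), ("PERSON", "Will")])
sel.set_all(True)
sel.toggle(2)  # exclude "Will", a false positive
print(sel.to_anonymize())  # [('PERSON', 'John Doe'), ('EMAIL_ADDRESS', 'john@email.com')]
```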
Pro Tip
Review results before anonymizing. The analyzer may occasionally detect false positives, especially for names that are also common words.
Token Costs
Analysis operations consume tokens based on:
Cost = 2 + 1.0 × text_k + 0.2 × entities_enabled + 0.1 × entities_found
Final = ceil(Cost × 0.5)
Where:
- text_k = text length in thousands of characters
- entities_enabled = number of entity types selected
- entities_found = number of entities detected
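The formula is easy to check in code. This sketch uses exact rational arithmetic (`fractions.Fraction`) so that a cost landing exactly on a whole number is not nudged upward by floating-point rounding before `ceil`. Whether the token service computes it the same way is an assumption, and `analysis_cost` is a hypothetical name.

```python
import math
from fractions import Fraction


def analysis_cost(text_length: int, entities_enabled: int, entities_found: int) -> int:
    """Token cost of one analysis: Final = ceil(Cost × 0.5)."""
    text_k = Fraction(text_length, 1000)  # text length in thousands of characters
    cost = (
        2
        + Fraction(1) * text_k
        + Fraction(2, 10) * entities_enabled
        + Fraction(1, 10) * entities_found
    )
    return math.ceil(cost * Fraction(1, 2))


print(analysis_cost(1_000, 5, 5))  # 3
```

For 1,000 characters with 5 entity types enabled and 5 entities found: Cost = 2 + 1.0 + 1.0 + 0.5 = 4.5, and ceil(4.5 × 0.5) = ceil(2.25) = 3 tokens.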
Cost Examples
| Text Length | Entities | Typical Cost |
|---|---|---|
| 100 characters | 3 types, 2 found | 2 tokens |
| 1,000 characters | 5 types, 5 found | 3 tokens |
| 5,000 characters | 10 types, 15 found | 6 tokens |
| 10,000 characters | 15 types, 30 found | 9 tokens |
See the Token System documentation for complete pricing details.
Best Practices
Troubleshooting
Entity not detected?
- Ensure the entity type is enabled in your selection
- Try lowering the confidence threshold
- Check that the correct language is selected
- Verify that the text follows the format the pattern expects (for example, a complete card number or IBAN with no missing digits)
Too many false positives?
- Increase the confidence threshold
- Deselect broad entity types like LOCATION
- Use an entity-specific preset instead of selecting every entity type
Analysis taking too long?
- Break large texts into smaller chunks
- Reduce the number of entity types selected
- Use presets to avoid loading unused detection models
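When breaking a large text into chunks, recording where each chunk starts lets you map detected positions back onto the original text. The sketch below shows one way to do this on your side; it is not a built-in feature, and `chunk_text`, `to_global`, and the 2,000-character default are hypothetical. It breaks after the last space or newline before the limit, but an entity that straddles a chunk boundary can still be missed.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list:
    """Split text into (offset, chunk) pairs of at most max_chars characters,
    breaking after the last space or newline before the limit when there is one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            split = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if split > start:
                end = split + 1  # keep the whitespace at the end of this chunk
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def to_global(chunk_offset: int, start: int, end: int) -> tuple:
    """Convert positions reported for a chunk into positions in the full text."""
    return chunk_offset + start, chunk_offset + end


sample = "John Doe lives at 123 Main St, New York"
for offset, chunk in chunk_text(sample, max_chars=12):
    print(offset, repr(chunk))
```

Because the chunks concatenate back to the original text, adding a chunk's offset to each reported position yields the position in the full document.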
Last Updated: February 2026