Advanced Tuning
This guide covers the parameters and customization options available for tuning your visualization.
First-Year Inference Parameters
When inferring first years from Google Books 1-grams, you can adjust several parameters:
--tau (Threshold)
Raises or lowers the minimum frequency threshold. Higher values mean stricter requirements for word appearance.
- Default: 1e-9
- Lower values: More sensitive; may catch rare words earlier but also admits more noise
- Higher values: More conservative, requires stronger signal
--guard (Persistence Requirement)
Sets how many years of non-zero support are required after the candidate year. This ensures the word persists rather than appearing as a one-off.
- Default: 3
- Lower values: Less strict persistence check
- Higher values: Require more years of consistent appearance
--window (Smoothing Window)
Controls the moving average window size for smoothing frequency data.
- Default: 3
- Lower values: Less smoothing, more responsive to changes
- Higher values: More smoothing, reduces noise but may blur rapid changes
--start-year and --end-year
Define the time range for analysis.
- Default: 1800 to 2019
- Adjust based on your research questions or data availability
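To make the interaction between these parameters concrete, here is a minimal sketch of how a first year might be inferred from a yearly frequency series. It is not the project's actual implementation: the function names, the trailing (rather than centered) moving average, and the reading of --guard as "number of later years with non-zero frequency" are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the window is shortened at the start of the series."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def infer_first_year(
    years: Sequence[int],
    freqs: Sequence[float],
    tau: float = 1e-9,
    guard: int = 3,
    window: int = 3,
) -> Optional[int]:
    """Return the earliest year whose smoothed frequency reaches tau and that is
    followed by at least `guard` later years with non-zero raw frequency."""
    smoothed = moving_average(freqs, window)
    for i, year in enumerate(years):
        if smoothed[i] < tau:
            continue  # signal too weak (assumed meaning of --tau)
        support = sum(1 for f in freqs[i + 1:] if f > 0)
        if support >= guard:  # assumed meaning of --guard
            return year
    return None  # no year in the range satisfies both checks


years = list(range(1800, 1810))  # stands in for --start-year / --end-year
persistent = [0, 0, 6e-9, 0, 0, 3e-9, 4e-9, 5e-9, 6e-9, 7e-9]
one_off = [0, 0, 6e-9, 0, 0, 0, 0, 0, 0, 0]
print(infer_first_year(years, persistent))
print(infer_first_year(years, one_off))
```

Under these assumptions, raising tau or guard can only push the inferred year later or remove it entirely, which matches the "more conservative" behavior described above.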
Rendering Parameters
Resolution
--width 1920 --height 1080
Adjust output resolution. Higher values produce sharper images but take longer to render.
Node Sizing
--min-radius 10 --max-radius 120
Control the size range of nodes. Node radius scales with the square root of cumulative count.
- --min-radius: Minimum node radius in pixels
- --max-radius: Maximum node radius in pixels
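The square-root scaling can be illustrated with a short sketch. The function name and the normalization by the largest count are assumptions; the renderer may map counts to radii differently, but the general shape of the relationship is the same.

```python
import math


def node_radius(
    count: int,
    max_count: int,
    min_radius: float = 10.0,
    max_radius: float = 120.0,
) -> float:
    """Interpolate between min_radius and max_radius by sqrt(count / max_count)."""
    if count <= 0 or max_count <= 0:
        return min_radius
    t = min(math.sqrt(count / max_count), 1.0)  # clamp so radius never exceeds max
    return min_radius + t * (max_radius - min_radius)


# A node with a quarter of the largest count sits halfway through the radius range.
print(node_radius(25, 100))
```

Because the radius grows with the square root of the count, node area grows roughly linearly with the count, which keeps very frequent prefixes from overwhelming the frame.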
Font Sizing
--title-font-size 112 --detail-font-size 42
Adjust text overlay sizes for year labels and statistics.
Edge Appearance
--base-edge-alpha 25 --edge-depth 6
- --base-edge-alpha: Base transparency for edges (0-255)
- --edge-depth: Maximum depth to draw edges
Animation Timing
--start-progress 0.25 --end-progress 1.0
Control the eased scale animation:
- --start-progress: Starting scale factor (0.0-1.0)
- --end-progress: Ending scale factor (0.0-1.0)
Lower start values make early frames more readable but may hide early growth.
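As a rough illustration of an eased scale, the sketch below interpolates from the start factor to the end factor across the frame sequence. The smoothstep easing curve and the function name are assumptions; the renderer's actual easing function is not specified in this guide.

```python
def eased_scale(
    frame: int,
    total_frames: int,
    start_progress: float = 0.25,
    end_progress: float = 1.0,
) -> float:
    """Scale factor for a frame, eased from start_progress to end_progress."""
    if total_frames <= 1:
        return end_progress
    t = frame / (total_frames - 1)   # linear progress in [0, 1]
    eased = t * t * (3.0 - 2.0 * t)  # smoothstep: slow start, slow finish
    return start_progress + (end_progress - start_progress) * eased


for frame in (0, 5, 10):
    print(frame, eased_scale(frame, 11))
```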
Encoding Parameters
Frame Rate
--fps 7.333 --gif-fps 12
- --fps: MP4 frame rate
- --gif-fps: GIF frame rate (often lower for file size)
GIF Optimization
--gif-width 1280
Reduce GIF width to decrease file size while maintaining reasonable quality.
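To estimate the effect of a given --gif-width, the following sketch computes the resulting frame size, assuming the encoder preserves the aspect ratio of the rendered frames (an assumption; this guide does not state how the height is derived). The function name is illustrative.

```python
from typing import Tuple


def gif_size(src_width: int, src_height: int, gif_width: int = 1280) -> Tuple[int, int]:
    """Return (width, height) after scaling to gif_width at a fixed aspect ratio."""
    height = round(src_height * gif_width / src_width)
    return gif_width, height


width, height = gif_size(1920, 1080, gif_width=1280)
pixel_ratio = (width * height) / (1920 * 1080)
print(width, height, round(pixel_ratio, 2))
```

The pixel count falls with the square of the width ratio, so even a modest width reduction noticeably shrinks each frame before compression.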
Prefix Depth
The default depth is 6 letters, which balances detail with performance. You can adjust this:
--depth 6
- Lower depths (1-3): Broader overview, faster rendering
- Higher depths (7-10): More detail, slower rendering, larger files
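The cost of a larger depth comes from the number of distinct prefixes that must be tracked and drawn. The sketch below counts prefixes up to a given depth for a small word list; the function name is hypothetical and the real aggregation stage also tracks years, which is omitted here.

```python
from collections import Counter
from typing import Iterable


def prefix_counts(words: Iterable[str], depth: int = 6) -> Counter:
    """Count how many words share each prefix of length 1..depth."""
    counts = Counter()
    for word in words:
        for n in range(1, min(depth, len(word)) + 1):
            counts[word[:n]] += 1
    return counts


words = ["cat", "car", "dog"]
print(len(prefix_counts(words, depth=2)))  # number of distinct prefixes (nodes)
print(len(prefix_counts(words, depth=3)))
```

Each extra level of depth can add many new nodes, which is why rendering time and file size grow with --depth.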
Layout Options
Radial Layout
The default radial layout provides a circular, hierarchical view:
python -m src.viz.render_frames \
artifacts/trie/prefix_counts.jsonl \
outputs/frames
Rectangular Layout
For deterministic positioning with labels:
python -m src.viz.layout \
artifacts/trie/prefix_counts.jsonl \
artifacts/layout/prefix_positions.json
python -m src.viz.render_frames_rectangular \
artifacts/trie/prefix_counts.jsonl \
artifacts/layout/prefix_positions.json \
outputs/frames \
--label-limit 8 --label-depth 4 --label-spacing 20
Rectangular-specific parameters:
- --label-limit: Maximum number of labels per depth level
- --label-depth: Maximum depth to show labels
- --label-spacing: Spacing between labels
Performance Optimization
Caching
The pipeline writes intermediate artifacts. If you work iteratively, you can skip stages that are already complete:
- artifacts/lemmas/lemmas.tsv: Skip lemma extraction
- artifacts/years/first_years.tsv: Skip first-year inference
- artifacts/trie/prefix_counts.jsonl: Skip prefix aggregation
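If you drive the stages from your own script, the skip-if-present pattern can be sketched as follows. The run_stage helper and its build callback are hypothetical and not part of the project; only the artifact paths come from this guide.

```python
from pathlib import Path
from typing import Callable


def run_stage(output: Path, build: Callable[[Path], None], force: bool = False) -> bool:
    """Run build(output) unless the artifact already exists.

    Returns True if the stage ran, False if it was skipped.
    """
    if output.exists() and not force:
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)
    return True


def fake_inference(path: Path) -> None:
    # Placeholder for a real stage; writes a stub file for demonstration only.
    path.write_text("word\tyear\n")


# Example call (commented out so that importing this sketch writes nothing):
# run_stage(Path("artifacts/years/first_years.tsv"), fake_inference)
```

Pass force=True after changing parameters such as --tau or --guard, since an existing artifact would otherwise hide the effect of the new settings.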
Memory Considerations
- The pipeline streams data where possible
- Lemma extraction streams XML to keep memory stable
- N-gram processing handles shards sequentially, one at a time
- Rendering loads prefix counts into memory but processes frames sequentially
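For readers unfamiliar with streaming XML, the sketch below shows the general technique using the standard library's iterparse. The tag name "lemma" and the function name are placeholders; the actual dump format and the project's extraction code may differ.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def stream_lemmas(source: BinaryIO, tag: str = "lemma") -> Iterator[str]:
    """Yield the text of each matching element without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag and elem.text:
            yield elem.text.strip()
        # Drop the element's text and children once it has been processed.
        elem.clear()
```

Because each element is cleared as soon as its closing tag is seen, memory use stays far below what loading the whole document at once would require.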
Network Optimization
Download time dominates on fresh runs. Consider:
- Running downloads overnight
- Using faster connections for initial setup
- Reusing artifacts across experiments
Custom Extensions
To take this project further, you could:
- Add suffix views or bigram/character-level variants
- Try other corpora or time ranges and compare
- Build an interactive explorer where you can scrub the year
- Generate per-letter or per-decade summaries for dashboards
- Experiment with different color palettes or visual encodings
See Also
- Step-by-Step Guide – Detailed instructions for each stage
- Interpreting Results – Understanding the visualization
- Troubleshooting – Common issues and solutions