Advanced Tuning

This guide covers the parameters and customization options you can use to fine-tune your visualization.

First-Year Inference Parameters

When inferring first years from Google Books 1-grams, you can adjust several parameters:

`--tau` (Threshold)

Sets the minimum frequency a word must reach before a year can count as its first appearance. Higher values impose a stricter requirement.

Default: 1e-9
Lower values: More sensitive; may catch rare words earlier but also admit more noise
Higher values: More conservative, requires stronger signal
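A minimal sketch of how a threshold like `--tau` could select a candidate first year, assuming a hypothetical mapping from year to relative frequency (match count divided by that year's total). The function name `first_year_above` is illustrative, not the project's API.

```python
from typing import Dict, Optional


def first_year_above(freqs: Dict[int, float], tau: float = 1e-9) -> Optional[int]:
    """Return the earliest year whose relative frequency is at least tau, or None."""
    for year in sorted(freqs):
        if freqs[year] >= tau:
            return year
    return None


# Hypothetical relative frequencies for a single word.
freqs = {1800: 0.0, 1801: 5e-10, 1802: 2e-9, 1803: 3e-9}
print(first_year_above(freqs))             # 1802
print(first_year_above(freqs, tau=1e-10))  # 1801 (lower tau, earlier year)
```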

`--guard` (Persistence Requirement)

Sets how much non-zero support is required after the candidate year, which ensures the word persists rather than appearing as a one-off.

Default: 3
Lower values: Less strict persistence check
Higher values: Require more years of consistent appearance
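One way to read `--guard`, sketched under assumptions: a year qualifies only if it meets the threshold and the next `guard` years all have non-zero frequency. The pipeline's actual rule may differ (for example, it might count non-consecutive years), and `first_year_with_guard` is a hypothetical name.

```python
from typing import Dict, Optional


def first_year_with_guard(
    freqs: Dict[int, float], tau: float = 1e-9, guard: int = 3
) -> Optional[int]:
    """Earliest year meeting tau whose next `guard` years are all non-zero.

    Years missing from `freqs` are treated as zero frequency.
    """
    for year in sorted(freqs):
        if freqs[year] < tau:
            continue
        following = [freqs.get(year + k, 0.0) for k in range(1, guard + 1)]
        if all(f > 0 for f in following):
            return year
    return None


# A one-off spike in 1810, then sustained use from 1850 onward.
freqs = {1810: 4e-9, 1850: 2e-9, 1851: 1e-9, 1852: 3e-9, 1853: 2e-9}
print(first_year_with_guard(freqs, guard=0))  # 1810
print(first_year_with_guard(freqs, guard=3))  # 1850
```

With `guard=0` the isolated 1810 spike wins; with the default of 3 it is rejected because 1811 to 1813 have no support.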

`--window` (Smoothing Window)

Controls the moving average window size for smoothing frequency data.

Default: 3
Lower values: Less smoothing, more responsive to changes
Higher values: More smoothing, reduces noise but may blur rapid changes
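A sketch of a moving average controlled by a `window` parameter. This uses a centered window truncated at the edges of the series, which is one common choice; the pipeline may instead use a trailing window or different edge handling.

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the series edges."""
    if window < 1:
        raise ValueError("window must be >= 1")
    before = (window - 1) // 2  # points taken before index i
    after = window // 2         # points taken after index i (one extra for even windows)
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - before)
        hi = min(len(values), i + after + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


print(moving_average([0.0, 3.0, 0.0, 3.0, 0.0], window=3))
# [1.5, 1.0, 2.0, 1.0, 1.5]
```

The alternating input shows the trade-off described above: the zig-zag between 0 and 3 is flattened toward its mean, which suppresses noise but would also soften a genuine sudden jump.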

`--start-year` and `--end-year`

Define the time range for analysis.

Default: 1800 to 2019
Adjust based on your research questions or data availability

Rendering Parameters

Resolution

--width 1920 --height 1080

Adjust output resolution. Higher values produce sharper images but take longer to render.

Node Sizing

--min-radius 10 --max-radius 120

Control the size range of nodes. Node radius scales with the square root of cumulative count.

--min-radius: Minimum node radius in pixels
--max-radius: Maximum node radius in pixels
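A sketch of square-root radius scaling, assuming the radius is interpolated between `--min-radius` and `--max-radius` relative to the largest count in the data (that normalization is an assumption). Scaling radius by the square root makes node area, rather than radius, track the count, apart from the minimum-radius offset.

```python
import math


def node_radius(count: int, max_count: int,
                min_radius: float = 10, max_radius: float = 120) -> float:
    """Map a cumulative count to a radius that grows with sqrt(count)."""
    if max_count <= 0:
        return float(min_radius)
    t = math.sqrt(count) / math.sqrt(max_count)  # normalized to 0..1
    return min_radius + (max_radius - min_radius) * t


print(node_radius(0, 10_000))       # 10.0
print(node_radius(2_500, 10_000))   # 65.0
print(node_radius(10_000, 10_000))  # 120.0
```

A prefix with a quarter of the maximum count gets half the radius range, not a quarter.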

Font Sizing

--title-font-size 112 --detail-font-size 42

Adjust text overlay sizes for year labels and statistics.

Edge Appearance

--base-edge-alpha 25 --edge-depth 6

--base-edge-alpha: Base alpha (opacity) for edges, from 0 (fully transparent) to 255 (fully opaque)
--edge-depth: Maximum prefix depth at which edges are drawn
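A hypothetical sketch of how these two options could act together: edges are kept only up to `edge_depth` (assuming depth equals prefix length in letters), and each drawn edge gets an 8-bit alpha clamped to 0-255. Whether the renderer varies alpha further (for example by depth or count) is not stated here.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (parent prefix, child prefix)


def edges_to_draw(edges: Iterable[Edge], edge_depth: int = 6) -> List[Edge]:
    """Keep edges whose child prefix is at most `edge_depth` letters long."""
    return [(p, c) for p, c in edges if len(c) <= edge_depth]


def edge_rgba(rgb: Tuple[int, int, int], base_edge_alpha: int = 25) -> Tuple[int, int, int, int]:
    """Attach an alpha value, clamped to the 0-255 range, to an RGB color."""
    alpha = max(0, min(255, base_edge_alpha))
    return (*rgb, alpha)


edges = [("a", "ab"), ("ab", "abc"), ("abc", "abcd")]
print(edges_to_draw(edges, edge_depth=3))  # [('a', 'ab'), ('ab', 'abc')]
print(edge_rgba((200, 200, 200)))          # (200, 200, 200, 25)
```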

Animation Timing

--start-progress 0.25 --end-progress 1.0

Control the eased scale animation:

--start-progress: Starting scale factor (0.0-1.0)
--end-progress: Ending scale factor (0.0-1.0)

Lower start values make early frames more readable but may hide early growth.
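A sketch of an eased scale running from `--start-progress` to `--end-progress`, assuming a cubic ease-in-out over normalized frame time and one frame per year (220 frames for 1800 to 2019). The renderer's actual easing curve and frame count may differ.

```python
def ease_in_out_cubic(t: float) -> float:
    """Cubic ease-in-out on [0, 1]: slow start, fast middle, slow end."""
    t = max(0.0, min(1.0, t))
    if t < 0.5:
        return 4 * t ** 3
    return 1 - (-2 * t + 2) ** 3 / 2


def scale_for_frame(frame: int, n_frames: int,
                    start_progress: float = 0.25, end_progress: float = 1.0) -> float:
    """Scale factor for a frame, eased from start_progress to end_progress."""
    t = frame / (n_frames - 1) if n_frames > 1 else 1.0
    return start_progress + (end_progress - start_progress) * ease_in_out_cubic(t)


n = 220  # assumed: one frame per year, 1800-2019
print(scale_for_frame(0, n))      # 0.25
print(scale_for_frame(n - 1, n))  # 1.0
```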

Encoding Parameters

Frame Rate

--fps 7.333 --gif-fps 12

--fps: MP4 frame rate
--gif-fps: GIF frame rate, set independently of the MP4 rate; lower values can help reduce file size
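Frame rate and frame count together determine clip length. Assuming one frame per year over the default 1800 to 2019 range (220 frames; the frame-per-year mapping is an assumption), the default MP4 rate of 7.333 fps works out to almost exactly a 30-second video:

```python
def duration_seconds(n_frames: int, fps: float) -> float:
    """Playback length in seconds of n_frames shown at fps."""
    return n_frames / fps


n_frames = 2019 - 1800 + 1  # 220, assuming one frame per year
print(round(duration_seconds(n_frames, 7.333), 1))  # 30.0
```

To target a different length, divide the frame count by the desired duration to get the required `--fps`.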

GIF Optimization

--gif-width 1280

Reduce GIF width to decrease file size while maintaining reasonable quality.
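A sketch of the size arithmetic, assuming the GIF height is scaled proportionally with `--gif-width` (not stated above). Going from 1920x1080 to a width of 1280 keeps about 44% of the pixels per frame; actual file size will not shrink by exactly that ratio, since GIF compression depends on content.

```python
from typing import Tuple


def scaled_size(width: int, height: int, target_width: int) -> Tuple[int, int]:
    """Scale (width, height) to target_width, keeping the aspect ratio."""
    return target_width, round(height * target_width / width)


w, h = scaled_size(1920, 1080, 1280)
print((w, h))                             # (1280, 720)
print(round((w * h) / (1920 * 1080), 3))  # 0.444
```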

Prefix Depth

The default depth is 6 letters, which balances detail with performance. You can adjust this:

--depth 6

Lower depths (1-3): Broader overview, faster rendering
Higher depths (7-10): More detail, slower rendering, larger files
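To see why depth drives cost, here is a hypothetical sketch of prefix aggregation, assuming depth means prefix length in letters (as "6 letters" suggests). Each extra level can add up to one node per word, so raising the depth only ever adds nodes. The function names are illustrative, and the project's aggregation step (for example, per-year counting) may differ.

```python
from collections import Counter
from typing import Iterable, List


def prefixes(word: str, depth: int = 6) -> List[str]:
    """All prefixes of `word` with length 1..depth."""
    return [word[:k] for k in range(1, min(depth, len(word)) + 1)]


def prefix_counts(words: Iterable[str], depth: int = 6) -> Counter:
    """Count how many words share each prefix, up to `depth` letters."""
    counts: Counter = Counter()
    for w in words:
        counts.update(prefixes(w, depth))
    return counts


print(prefixes("example", depth=3))  # ['e', 'ex', 'exa']
counts = prefix_counts(["cat", "car", "cart", "dog"], depth=2)
print(sorted(counts.items()))        # [('c', 3), ('ca', 3), ('d', 1), ('do', 1)]
```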

Layout Options

Radial Layout

The default radial layout provides a circular, hierarchical view:

python -m src.viz.render_frames \
  artifacts/trie/prefix_counts.jsonl \
  outputs/frames

Rectangular Layout

For deterministic positioning with labels, first compute a layout, then render frames from it:

python -m src.viz.layout \
  artifacts/trie/prefix_counts.jsonl \
  artifacts/layout/prefix_positions.json

python -m src.viz.render_frames_rectangular \
  artifacts/trie/prefix_counts.jsonl \
  artifacts/layout/prefix_positions.json \
  outputs/frames \
  --label-limit 8 --label-depth 4 --label-spacing 20

Rectangular-specific parameters:

--label-limit: Maximum number of labels per depth level
--label-depth: Maximum depth to show labels
--label-spacing: Spacing between labels
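A hypothetical sketch of how `--label-limit` and `--label-depth` might combine, assuming labels go to the highest-count prefixes at each depth; the renderer's actual selection criterion is not described here. Ties are broken alphabetically so the result is deterministic.

```python
from typing import Dict, List


def pick_labels(counts: Dict[str, int], label_limit: int = 8, label_depth: int = 4) -> List[str]:
    """Top `label_limit` prefixes by count at each depth from 1 to `label_depth`."""
    chosen: List[str] = []
    for depth in range(1, label_depth + 1):
        level = [p for p in counts if len(p) == depth]
        # Highest count first; alphabetical order breaks ties.
        level.sort(key=lambda p: (-counts[p], p))
        chosen.extend(level[:label_limit])
    return chosen


counts = {"a": 50, "b": 30, "c": 10, "ab": 20, "ac": 25, "abc": 5}
print(pick_labels(counts, label_limit=2, label_depth=2))  # ['a', 'b', 'ac', 'ab']
```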

Performance Optimization

Caching

The pipeline writes intermediate artifacts. If you work iteratively, you can skip stages that are already complete:

artifacts/lemmas/lemmas.tsv - Skip lemma extraction
artifacts/years/first_years.tsv - Skip first-year inference
artifacts/trie/prefix_counts.jsonl - Skip prefix aggregation
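A minimal sketch of checking which stages still need to run, using the artifact paths listed above and `pathlib`. It checks each file independently; if you change upstream parameters such as `--tau`, delete the affected artifact and the ones after it so they are regenerated rather than reused. The helper name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# (artifact path, stage name) pairs, in pipeline order.
STAGES: List[Tuple[str, str]] = [
    ("artifacts/lemmas/lemmas.tsv", "lemma extraction"),
    ("artifacts/years/first_years.tsv", "first-year inference"),
    ("artifacts/trie/prefix_counts.jsonl", "prefix aggregation"),
]


def stages_to_run(root: Path = Path(".")) -> List[str]:
    """Names of stages whose output artifact does not yet exist under root."""
    return [name for path, name in STAGES if not (root / path).exists()]


print(stages_to_run())
```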

Memory Considerations

The pipeline streams data where possible:
Lemma extraction streams XML to keep memory use stable
N-gram processing handles shards one at a time
Rendering loads prefix counts into memory but renders frames sequentially
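Streaming XML in Python is commonly done with the standard library's `xml.etree.ElementTree.iterparse`, clearing each element once it has been processed. This is a sketch of that pattern, not the project's extractor: the `<page>` and `<title>` tags are placeholders, and real dumps often use namespaced tags (for example `{namespace}page`) that would need matching.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def iter_titles(source: IO[bytes], tag: str = "page") -> Iterator[str]:
    """Yield the <title> text of each `tag` element while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            title = elem.findtext("title")
            if title:
                yield title
            elem.clear()  # free the finished element's children and text


xml = b"<root><page><title>apple</title></page><page><title>berry</title></page></root>"
print(list(iter_titles(io.BytesIO(xml))))  # ['apple', 'berry']
```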

Network Optimization

Download time dominates on fresh runs. Consider:

Running downloads overnight
Using faster connections for initial setup
Reusing artifacts across experiments

Custom Extensions

To take this project further, you could:

Add suffix views or bigram/character-level variants
Try other corpora or time ranges and compare
Build an interactive explorer where you can scrub the year
Generate per-letter or per-decade summaries for dashboards
Experiment with different color palettes or visual encodings