Troubleshooting

Common issues and solutions when working with the English Lexicon Time Machine.

Setup Issues

ffmpeg Not Found

Problem: The encode step fails with “ffmpeg not found”.

Solution: Install ffmpeg and ensure it’s on your PATH:

# macOS
brew install ffmpeg

# Linux (Ubuntu/Debian)
sudo apt-get install ffmpeg

# Verify installation
ffmpeg -version

Then re-run the encode step.
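
If you prefer to check from Python before starting a long run, the following is a minimal sketch of the same verification. The helper names `find_ffmpeg` and `ffmpeg_version_line` are illustrative and not part of the project.

```python
import shutil
import subprocess


def find_ffmpeg(name="ffmpeg"):
    """Return the full path of the executable on PATH, or None if absent."""
    return shutil.which(name)


def ffmpeg_version_line(path):
    """Run `<path> -version` and return the first line of its output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_ffmpeg()
if path is None:
    print("ffmpeg is not on PATH; install it before running the encode step")
else:
    print(f"found {path}: {ffmpeg_version_line(path)}")
```

A `None` result from `find_ffmpeg()` means the shell that launches the pipeline cannot see ffmpeg, even if it is installed somewhere else on disk.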

Fonts Missing

Problem: Text rendering looks poor or uses bitmap fonts.

Solution: The renderer tries DejaVuSans.ttf, then Arial.ttf, then a default bitmap font. Install DejaVu Sans or Arial for better text rendering:

# macOS
brew install font-dejavu

# Linux (Ubuntu/Debian)
sudo apt-get install fonts-dejavu
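
The fallback order described above can be sketched as follows. This is an illustration, not the renderer's actual code: it assumes the renderer uses Pillow, and `load_first_available` and `pillow_font` are hypothetical names.

```python
def load_first_available(candidates, size, load_truetype, load_default):
    """Try each font file in order; fall back to the default font.

    Returns (name, font), where name is None if the fallback was used.
    """
    for name in candidates:
        try:
            return name, load_truetype(name, size)
        except OSError:
            # The loader could not find or read this font file.
            continue
    return None, load_default()


def pillow_font(size=24):
    """Assumption: Pillow is the rendering library and is installed."""
    from PIL import ImageFont

    return load_first_available(
        ("DejaVuSans.ttf", "Arial.ttf"),
        size,
        ImageFont.truetype,      # raises OSError when the file is missing
        ImageFont.load_default,  # Pillow's built-in fallback font
    )
```

Calling `pillow_font()` and checking whether the returned name is `None` tells you if the fallback font was chosen, which would explain poor-looking text.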

Virtual Environment Fails

Problem: Python virtual environment creation fails.

Solution:
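
One general check, sketched below, is to confirm that the interpreter has what `python -m venv` needs. The minimum version shown is a placeholder assumption, not a documented requirement of this project, and `venv_preflight` is a hypothetical helper.

```python
import importlib.util
import sys


def venv_preflight(min_version=(3, 8)):  # min_version is an assumed placeholder
    """Return a list of likely reasons `python -m venv` would fail."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            f"is older than {wanted}"
        )
    for module in ("venv", "ensurepip"):
        if importlib.util.find_spec(module) is None:
            problems.append(
                f"standard-library module {module!r} is missing; on "
                "Debian/Ubuntu it is provided by the python3-venv package"
            )
    return problems


for problem in venv_preflight():
    print("venv problem:", problem)
```

An empty list means the basic prerequisites are present, and the failure is more likely caused by permissions or a stale environment directory.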

Download Errors

Problem: Dataset downloads fail or are corrupted.

Solution:
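
A general way to detect a corrupted download is to compare the file's checksum with a published value. The sketch below assumes a SHA-256 checksum is available for the file, which this guide does not confirm. The function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """Compare against an expected checksum (assumed to be published)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

If `is_intact` returns `False`, delete the file and download it again instead of reusing the partial copy.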

Processing Issues

Memory Errors

Problem: Out of memory errors during processing.

Solution:

Shards Corrupted

Problem: N-gram shard files are corrupted.

Solution:
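
To find out which shard is damaged, you can try reading each file to the end. The sketch below assumes the shards are gzip-compressed, which is an assumption about this pipeline. `gzip_is_readable` is a hypothetical helper.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only the success of decoding matters.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Covers bad headers, truncated files, and corrupt compressed data.
        return False
    return True
```

Any shard for which this returns `False` is a candidate for deletion and re-download.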

Slow Network

Problem: Downloads take a very long time.

Solution:

Rendering Issues

Missing Frames

Problem: Some frames are missing from the output.

Solution:
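
A quick way to locate gaps is to list the frame numbers that are absent from the sequence. The sketch below assumes frames are PNG files whose names end in a number, such as `frame_000123.png`. That naming scheme is hypothetical and may differ from what the renderer writes.

```python
import re

# Assumed naming: any prefix, then digits, then ".png".
FRAME_PATTERN = re.compile(r"(\d+)\.png$")


def missing_frame_numbers(filenames):
    """Return frame numbers absent between the lowest and highest present."""
    present = set()
    for name in filenames:
        match = FRAME_PATTERN.search(name)
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

For example, pass it `os.listdir(...)` for the frame directory. It only finds gaps inside the range it sees, so frames missing from the very start or end of the sequence need a separate count check.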

Low Quality Output

Problem: Rendered frames look pixelated or low quality.

Solution:

Performance Issues

Problem: Rendering takes too long.

Solution:

Encoding Issues

MP4 Encoding Fails

Problem: MP4 encoding produces errors or fails.

Solution:
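
One frequent cause of MP4 failures with ffmpeg in general is a frame width or height that is odd, which libx264 rejects when encoding to the widely compatible `yuv420p` pixel format. Whether this pipeline uses libx264 is an assumption. The sketch below builds a command that rounds both dimensions down to even values. The function name, frame pattern, and default frame rate are placeholders.

```python
def mp4_command(frame_pattern, output_path, fps=30):
    """Build an ffmpeg command that forces even dimensions for libx264."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),       # input frame rate for the image sequence
        "-i", frame_pattern,          # e.g. a printf-style pattern (placeholder)
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # round down to even sizes
        "-c:v", "libx264",            # assumed encoder
        "-pix_fmt", "yuv420p",
        output_path,
    ]
```

Run the result with `subprocess.run(cmd, check=True)` and read ffmpeg's own error output if it still fails, since that message usually names the actual cause.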

GIF Too Large

Problem: GIF file size is too large for sharing.

Solution:
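
The usual levers for GIF size are a lower frame rate, a smaller width, and a generated color palette. The sketch below builds a generic ffmpeg command that applies all three. It is not the pipeline's own encode step, and the function name and default values are placeholders to tune.

```python
def gif_command(input_path, output_path, fps=12, width=480):
    """Build an ffmpeg command for a smaller, palette-optimized GIF."""
    filters = (
        # Reduce frame rate and width; -1 keeps the aspect ratio.
        f"fps={fps},scale={width}:-1:flags=lanczos,"
        # Generate a palette from the video and apply it in one pass.
        "split[a][b];[a]palettegen[p];[b][p]paletteuse"
    )
    return ["ffmpeg", "-y", "-i", input_path, "-vf", filters, output_path]
```

Lowering `fps` and `width` reduces file size the most. Run the command with `subprocess.run(cmd, check=True)`.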

Encoding Quality Issues

Problem: Encoded video/GIF quality is poor.

Solution:

Data Issues

No Words Found

Problem: First-year inference finds very few words.

Solution:

Unexpected First Years

Problem: First years seem incorrect or inconsistent.

Solution:

General Tips

Performance and Caching

The pipeline streams data where possible and writes intermediate artifacts. If you work iteratively, you can skip stages that are already complete.

The provided setup script also preserves a simple artifact cache under artifacts/, so repeat runs are fast. The most expensive steps are downloading the n-gram shards and the single scan over them.
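
The skip-if-complete idea can be sketched as below. This is an illustration under stated assumptions, not the setup script's implementation: `run_stage` and the `build` callback are hypothetical names, and only the artifacts/ directory comes from this guide.

```python
from pathlib import Path


def run_stage(output_path, build):
    """Run build(tmp_path) only if output_path does not exist yet.

    Returns True if the stage ran, False if the cached artifact was reused.
    """
    output_path = Path(output_path)
    if output_path.exists():
        return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted run never leaves
    # a partial file that looks like a finished artifact.
    tmp_path = output_path.with_name(output_path.name + ".tmp")
    build(tmp_path)
    tmp_path.replace(output_path)
    return True
```

To force a stage to run again, delete its artifact under artifacts/ and re-run.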

Getting Help

If you encounter issues not covered here:

See Also