Getting Started

This guide will help you set up and run the English Lexicon Time Machine project.

Prerequisites

- Python 3.8 or newer
- Bash shell
- Internet connection (for downloading datasets)
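
If you want to confirm the first two prerequisites before running anything, a small check along these lines may help. This is only a sketch: the function name `check_prerequisites` is made up for illustration and is not part of the project.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    # Hypothetical helper, not shipped with the repository.
    version = tuple(version or sys.version_info)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, but 3.8 or newer is required" % version
        )
    if which("bash") is None:
        problems.append("bash was not found on PATH")
    return problems


for issue in check_prerequisites():
    print("MISSING:", issue)
```

It prints nothing when both checks pass.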

Initial Setup

Clone the repository:
```
git clone https://github.com/GraphTechnologyDevelopers/english-words-knowledge-graph.git
cd english-words-knowledge-graph
```

Run the setup script:
```
bash setup.sh
```
This will:
- Create a Python virtual environment
- Install all required dependencies
- Download Wiktionary and Google Books N-gram datasets
- Extract English lemmas and infer first-use years
- Build the prefix trie
- Render the visualization frames
- Encode the final MP4 and GIF outputs

Find your outputs:
- Video: outputs/english_trie_timelapse.mp4
- GIF: outputs/english_trie_timelapse.gif
- Individual frames: outputs/frames/frame-*.png
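
To see at a glance whether the run produced everything listed above, you could use a short script like this one. It is a sketch that assumes you run it from the repository root; `summarize_outputs` is an illustrative name, not a project function.

```python
from pathlib import Path


def summarize_outputs(root="outputs"):
    """Report which of the documented output files exist under `root`."""
    # Hypothetical helper; the paths mirror the list of outputs above.
    root = Path(root)
    frames_dir = root / "frames"
    frames = list(frames_dir.glob("frame-*.png")) if frames_dir.is_dir() else []
    return {
        "video": (root / "english_trie_timelapse.mp4").is_file(),
        "gif": (root / "english_trie_timelapse.gif").is_file(),
        "frame_count": len(frames),
    }


print(summarize_outputs())
```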

Project Structure

```
english-words-knowledge-graph/
├── src/
│   ├── ingest/          # Data extraction scripts
│   ├── build/           # Trie construction
│   └── viz/             # Visualization and rendering
├── artifacts/           # Cached intermediate data
├── outputs/             # Final rendered outputs
├── docs/                # Documentation site
└── setup.sh             # Main setup script
```
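
If the idea of a prefix trie is new to you, the sketch below shows one way counts per prefix and first-use year could be accumulated. It is an illustration only: the function name, the word-to-year input, and the `max_depth` cutoff are assumptions, and it does not reproduce the code in src/build/ or the format of the project's artifacts.

```python
from collections import Counter


def prefix_counts_by_year(first_use, max_depth=3):
    """Count, for each year, how many words first appear under each prefix.

    `first_use` maps a word to its inferred first-use year (hypothetical input).
    Returns {year: Counter({prefix: new_word_count})}.
    """
    counts = {}
    for word, year in first_use.items():
        per_year = counts.setdefault(year, Counter())
        # Every prefix of the word (up to max_depth letters) is a trie node.
        for depth in range(1, min(len(word), max_depth) + 1):
            per_year[word[:depth]] += 1
    return counts


sample = {"the": 1500, "then": 1500, "thermal": 1750, "tea": 1650}
result = prefix_counts_by_year(sample, max_depth=2)
print(result[1500]["th"])  # 2
```

Here "the" and "then" both pass through the node "th" in 1500, so that node gains two words in that year.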

Customization

Rendering Parameters

After the initial run, you can customize the visualization by re-running the rendering step with different parameters:

```
source venv/bin/activate
python -m src.viz.render_frames \
  artifacts/trie/prefix_counts.jsonl \
  outputs/frames \
  --min-radius 50 \
  --max-radius 500 \
  --base-edge-alpha 0.3 \
  --start-progress 0.0
```
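
If you plan to try several parameter combinations, it can be convenient to assemble the command above from Python. The sketch below uses only the module path, arguments, and flags shown in that command. The helper names `build_render_command` and `run_render` are made up for illustration, and the example values (400 and 0.5) are arbitrary.

```python
import subprocess
import sys


def build_render_command(min_radius=50, max_radius=500,
                         base_edge_alpha=0.3, start_progress=0.0,
                         prefix_counts="artifacts/trie/prefix_counts.jsonl",
                         frames_dir="outputs/frames"):
    """Assemble the render_frames invocation as an argument list."""
    # Hypothetical helper; defaults copy the values from the command above.
    return [
        sys.executable, "-m", "src.viz.render_frames",
        prefix_counts,
        frames_dir,
        "--min-radius", str(min_radius),
        "--max-radius", str(max_radius),
        "--base-edge-alpha", str(base_edge_alpha),
        "--start-progress", str(start_progress),
    ]


def run_render(**options):
    """Run the renderer; call this from the repository root inside the venv."""
    subprocess.run(build_render_command(**options), check=True)


command = build_render_command(max_radius=400, base_edge_alpha=0.5)
print(" ".join(command))
```

Printing the command first lets you inspect it before calling `run_render` with the same options.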

Data Sources

The project uses:

- Wiktionary: English lemma extraction
- Google Books N-grams: First-year inference for words
- Neo4j (optional): Graph database exploration
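
To give a feel for what first-year inference involves, here is a minimal sketch. It assumes tab-separated rows of ngram, year, match_count, and volume_count, and it treats the earliest year that reaches a `min_count` threshold as the first use. The row layout, the threshold, and the lowercasing are all assumptions made for illustration; the project's own ingest scripts may work differently.

```python
import io


def infer_first_years(rows, min_count=1):
    """Return {word: earliest year whose match_count >= min_count}.

    `rows` yields tab-separated lines: ngram, year, match_count, volume_count
    (an assumed layout; adjust the parsing to the files you download).
    """
    first_year = {}
    for line in rows:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines
        # Lowercasing merges case variants; this is an illustrative choice.
        word, year, count = parts[0].lower(), int(parts[1]), int(parts[2])
        if count < min_count:
            continue  # too rare in this year to count as evidence
        if word not in first_year or year < first_year[word]:
            first_year[word] = year
    return first_year


sample = io.StringIO(
    "telephone\t1840\t1\t1\n"
    "telephone\t1876\t42\t17\n"
    "Telephone\t1880\t90\t30\n"
)
print(infer_first_years(sample, min_count=5))  # {'telephone': 1876}
```

The 1840 row is ignored because a single match falls below the threshold, which is one simple way to guard against scanning errors in early volumes.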

Troubleshooting

Setup Issues

- Virtual environment creation fails: Ensure Python 3.8 or newer is installed and available on your PATH
- Download errors: Check your internet connection and re-run the setup script
- Memory errors: Dataset processing can be memory-intensive; make sure the machine has enough free RAM

Rendering Issues

- Missing frames: Confirm that the artifacts exist (for example, artifacts/trie/prefix_counts.jsonl), then re-run the rendering step
- Low quality output: Adjust resolution parameters in the rendering script
- Performance: Intermediate data is cached in artifacts/; keep that directory so subsequent runs can skip re-processing
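
When you suspect missing frames, a quick scan of the frames directory can show where the gaps are. The sketch below assumes frame files carry a numeric suffix (frame-<number>.png) and are numbered consecutively. That numbering scheme is an assumption, and `missing_frame_numbers` is an illustrative name rather than a project function.

```python
import re
from pathlib import Path

FRAME_PATTERN = re.compile(r"frame-(\d+)\.png$")


def missing_frame_numbers(frames_dir="outputs/frames"):
    """Return frame numbers absent between the lowest and highest frame found."""
    # Hypothetical helper; assumes consecutive numeric frame names.
    found = set()
    frames_dir = Path(frames_dir)
    if frames_dir.is_dir():
        for path in frames_dir.glob("frame-*.png"):
            match = FRAME_PATTERN.match(path.name)
            if match:
                found.add(int(match.group(1)))
    if not found:
        return []
    return [n for n in range(min(found), max(found) + 1) if n not in found]


print(missing_frame_numbers())
```

An empty list means either that no gaps were found or that the directory holds no frames at all, so check that outputs/frames is populated first.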

Next Steps

- Read the Methodology to understand how the visualization works
- Follow the Step-by-Step Guide for detailed stage-by-stage instructions
- Explore Advanced Tuning to customize visualization parameters
- Learn How to Interpret Results to understand what you're seeing
- Check Troubleshooting if you encounter any issues

Support

For issues or questions:

- Open an issue on GitHub
- Visit Graph Technology Developers
- Join the X community