Getting Started
This guide will help you set up and run the English Lexicon Time Machine project.
Prerequisites
- Python 3.8 or newer
- Bash shell
- Internet connection (for downloading datasets)
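The first two prerequisites can be checked before running anything else. The following is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and it only verifies the Python version and that `bash` is on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means the checks passed.

    `version_info` and `which` are parameters only so the check can be
    exercised without changing the real environment.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.8 or newer is required, found {found}")
    if which("bash") is None:
        problems.append("bash not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result does not guarantee that setup will succeed; it only rules out the two most common missing prerequisites.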
Initial Setup
1. Clone the repository:
   git clone https://github.com/GraphTechnologyDevelopers/english-words-knowledge-graph.git
   cd english-words-knowledge-graph
2. Run the setup script:
   bash setup.sh
   This will:
   - Create a Python virtual environment
   - Install all required dependencies
   - Download Wiktionary and Google Books N-gram datasets
   - Extract English lemmas and infer first-use years
   - Build the prefix trie
   - Render the visualization frames
   - Encode the final MP4 and GIF outputs
3. Find your outputs:
   - Video: outputs/english_trie_timelapse.mp4
   - GIF: outputs/english_trie_timelapse.gif
   - Individual frames: outputs/frames/frame-*.png
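To confirm that setup produced the expected files, a short check like the one below may help. It is a sketch that assumes the output paths listed in this guide and is run from the repository root; the helper name `summarize_outputs` is hypothetical.

```python
from pathlib import Path


def summarize_outputs(root="outputs"):
    """Report which of the expected outputs exist under `root`."""
    root = Path(root)
    frames_dir = root / "frames"
    # Count only files matching the frame naming pattern from this guide.
    frame_count = len(list(frames_dir.glob("frame-*.png"))) if frames_dir.is_dir() else 0
    return {
        "video": (root / "english_trie_timelapse.mp4").is_file(),
        "gif": (root / "english_trie_timelapse.gif").is_file(),
        "frames": frame_count,
    }


if __name__ == "__main__":
    print(summarize_outputs())
```

A frame count of zero together with missing video and GIF files suggests the rendering stage did not run.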
Project Structure
english-words-knowledge-graph/
├── src/
│ ├── ingest/ # Data extraction scripts
│ ├── build/ # Trie construction
│ └── viz/ # Visualization and rendering
├── artifacts/ # Cached intermediate data
├── outputs/ # Final rendered outputs
├── docs/ # Documentation site
└── setup.sh # Main setup script
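The build/ stage constructs a prefix trie from the extracted lemmas. As an illustration of the idea only (this is not the project's code, and `count_prefixes` is a hypothetical name), counting how many words share each prefix can be sketched as:

```python
from collections import Counter


def count_prefixes(words):
    """Count, for every prefix, how many of the given words start with it."""
    counts = Counter()
    for word in words:
        # Each word contributes one count to each of its prefixes,
        # from the first letter up to the full word.
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1
    return counts
```

For the words `cat` and `car`, the prefixes `c` and `ca` each receive a count of 2, while `cat` and `car` each receive a count of 1. The format the project actually stores in artifacts/ may differ.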
Customization
Rendering Parameters
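After the initial run, you can customize the visualization by re-running the frame renderer with different parameters:
# Activate the virtual environment (the venv/ directory used by this guide)
source venv/bin/activate
# Re-render frames from the cached prefix counts with custom settings
python -m src.viz.render_frames \
    artifacts/trie/prefix_counts.jsonl \
    outputs/frames \
    --min-radius 50 \
    --max-radius 500 \
    --base-edge-alpha 0.3 \
    --start-progress 0.0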
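When trying several parameter combinations, it can be convenient to assemble the same command from Python. The sketch below uses only the module path, positional arguments, and flags shown above; the helper name `build_render_command` is hypothetical, and the defaults simply mirror the example values.

```python
import shlex
import sys


def build_render_command(
    prefix_counts="artifacts/trie/prefix_counts.jsonl",
    frames_dir="outputs/frames",
    min_radius=50,
    max_radius=500,
    base_edge_alpha=0.3,
    start_progress=0.0,
):
    """Return the render_frames invocation as an argument list."""
    return [
        sys.executable, "-m", "src.viz.render_frames",
        prefix_counts,
        frames_dir,
        "--min-radius", str(min_radius),
        "--max-radius", str(max_radius),
        "--base-edge-alpha", str(base_edge_alpha),
        "--start-progress", str(start_progress),
    ]


if __name__ == "__main__":
    # Print the command for inspection. To execute it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root with the
    # virtual environment active.
    print(shlex.join(build_render_command(max_radius=400)))
```

Building the command as a list avoids shell quoting problems if a path contains spaces.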
Data Sources
The project uses:
- Wiktionary: English lemma extraction
- Google Books N-grams: First-year inference for words
- Neo4j (optional): Graph database exploration
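First-year inference can be pictured as finding, for each word, the earliest year in which it appears in the N-gram counts. The following is a simplified sketch under that assumption and is not the project's actual method: the function name `infer_first_years`, the `(word, year, count)` row shape, and the `min_count` threshold are all hypothetical.

```python
def infer_first_years(rows, min_count=1):
    """Map each word to the earliest year with at least `min_count` occurrences.

    `rows` is an iterable of (word, year, count) tuples in any order.
    """
    first_years = {}
    for word, year, count in rows:
        if count < min_count:
            continue  # ignore years with too few occurrences
        if word not in first_years or year < first_years[word]:
            first_years[word] = year
    return first_years
```

Raising `min_count` is one way to ignore isolated early occurrences, at the cost of possibly dating words later than their true first use.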
Troubleshooting
Setup Issues
- Virtual environment creation fails: Ensure Python 3.8 or newer is installed and available on your PATH
- Download errors: Check your internet connection and retry
- Memory errors: The dataset processing can be memory-intensive; ensure sufficient RAM
Rendering Issues
- Missing frames: Re-run the rendering step after ensuring artifacts exist
- Low quality output: Adjust resolution parameters in the rendering script
- Performance: Use artifact caching to skip re-processing on subsequent runs
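Before re-running the rendering step, it may help to confirm that the cached artifact it reads is present and looks usable. This sketch assumes the artifacts/trie/prefix_counts.jsonl path used in the rendering command and that the file holds one JSON value per line; the helper name `artifact_ready` is hypothetical.

```python
import json
from pathlib import Path


def artifact_ready(path="artifacts/trie/prefix_counts.jsonl"):
    """Return True if the file exists, is non-empty, and its first line is valid JSON."""
    artifact = Path(path)
    if not artifact.is_file() or artifact.stat().st_size == 0:
        return False
    with artifact.open(encoding="utf-8") as handle:
        first_line = handle.readline()
    try:
        json.loads(first_line)
    except json.JSONDecodeError:
        return False
    return True
```

If this returns False, re-running the earlier pipeline stages to regenerate the artifact is likely needed before rendering.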
Next Steps
- Read the Methodology to understand how the visualization works
- Follow the Step-by-Step Guide for detailed stage-by-stage instructions
- Explore Advanced Tuning to customize visualization parameters
- Learn How to Interpret Results to understand what you’re seeing
- Check Troubleshooting if you encounter any issues
Support
For issues or questions:
- Open an issue on GitHub
- Visit Graph Technology Developers
- Join the X community