Getting Started

This guide walks you through setting up and running the English Lexicon Time Machine project.

Prerequisites

Initial Setup

  1. Clone the repository:

    git clone https://github.com/GraphTechnologyDevelopers/english-words-knowledge-graph.git
    cd english-words-knowledge-graph
    
  2. Run the setup script:

    bash setup.sh
    

    This will:

    • Create a Python virtual environment
    • Install all required dependencies
    • Download Wiktionary and Google Books N-gram datasets
    • Extract English lemmas and infer first-use years
    • Build the prefix trie
    • Render the visualization frames
    • Encode the final MP4 and GIF outputs

  3. Find your outputs:

    • Video: outputs/english_trie_timelapse.mp4
    • GIF: outputs/english_trie_timelapse.gif
    • Individual frames: outputs/frames/frame-*.png
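Since the setup script runs several stages, it can help to confirm that every expected output was produced. The following is a minimal sketch, assuming it is run from the repository root; the paths come from the list above, while the helper name `check_outputs` is hypothetical and not part of the project.

```python
from pathlib import Path

# Output paths as listed in this guide, relative to the repository root.
EXPECTED_FILES = [
    "outputs/english_trie_timelapse.mp4",
    "outputs/english_trie_timelapse.gif",
]
FRAME_DIR = "outputs/frames"
FRAME_PATTERN = "frame-*.png"


def check_outputs(repo_root="."):
    """Return a list of problems; an empty list means all outputs were found."""
    root = Path(repo_root)
    problems = []
    for rel in EXPECTED_FILES:
        path = root / rel
        if not path.is_file():
            problems.append(f"missing file: {rel}")
        elif path.stat().st_size == 0:
            problems.append(f"empty file: {rel}")
    # A missing frames directory simply yields no matches.
    frames = list((root / FRAME_DIR).glob(FRAME_PATTERN))
    if not frames:
        problems.append(f"no frames matching {FRAME_DIR}/{FRAME_PATTERN}")
    return problems


if __name__ == "__main__":
    issues = check_outputs()
    print("\n".join(issues) if issues else "All expected outputs found.")
```

An empty result only shows that the files exist and are non-empty; it does not validate the video or GIF contents.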

Project Structure

english-words-knowledge-graph/
├── src/
│   ├── ingest/          # Data extraction scripts
│   ├── build/           # Trie construction
│   └── viz/             # Visualization and rendering
├── artifacts/           # Cached intermediate data
├── outputs/             # Final rendered outputs
├── docs/                # Documentation site
└── setup.sh             # Main setup script
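
To give a feel for the kind of work the trie construction stage performs, here is a minimal sketch that counts, for each prefix, how many words had entered the lexicon by a given year. The input format (a mapping from word to first-use year), the function name, and the sample years are all assumptions made up for illustration; the actual scripts in src/build/ and the schema of the cached artifacts may differ.

```python
from collections import defaultdict


def prefix_counts_by_year(first_use, year):
    """Count, per prefix, the words whose first-use year is <= `year`."""
    counts = defaultdict(int)
    for word, first_year in first_use.items():
        if first_year > year:
            continue  # word had not yet appeared by `year`
        # Every leading substring of the word is a node on its trie path.
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1
    return dict(counts)


# Hypothetical sample data; these years are illustrative, not real datings.
sample = {"cat": 700, "car": 1300, "cart": 1200, "dog": 1050}
counts = prefix_counts_by_year(sample, 1250)
print(counts["ca"])   # 2: "cat" and "cart" are present by 1250
print(counts["car"])  # 1: only "cart" passes through "car" by 1250
```

Evaluating the same function at successive years gives the per-prefix growth that a timelapse of the trie could draw on.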

Customization

Rendering Parameters

After the initial run, you can re-run the frame renderer directly with custom parameters:

source venv/bin/activate
python -m src.viz.render_frames \
  artifacts/trie/prefix_counts.jsonl \
  outputs/frames \
  --min-radius 50 \
  --max-radius 500 \
  --base-edge-alpha 0.3 \
  --start-progress 0.0
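
When trying several parameter combinations, it may be convenient to build the command above from Python. The following is a sketch under stated assumptions: the module path, positional arguments, and flags are copied from the command above, while the helper names `build_render_command` and `render` and the radius sanity check are hypothetical additions. It assumes the virtual environment is active and the working directory is the repository root.

```python
import subprocess
import sys


def build_render_command(
    min_radius=50,
    max_radius=500,
    base_edge_alpha=0.3,
    start_progress=0.0,
    counts_path="artifacts/trie/prefix_counts.jsonl",
    frames_dir="outputs/frames",
):
    """Build the argument list for the render_frames command shown above."""
    # Assumption: a minimum radius above the maximum is a mistake worth
    # catching before a long render starts.
    if min_radius > max_radius:
        raise ValueError("min_radius must not exceed max_radius")
    return [
        sys.executable, "-m", "src.viz.render_frames",
        counts_path,
        frames_dir,
        "--min-radius", str(min_radius),
        "--max-radius", str(max_radius),
        "--base-edge-alpha", str(base_edge_alpha),
        "--start-progress", str(start_progress),
    ]


def render(**overrides):
    """Run the renderer; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_render_command(**overrides), check=True)
```

For example, `render(max_radius=300, base_edge_alpha=0.5)` would re-render the frames with a smaller maximum radius and a higher edge alpha, leaving the other values at the defaults shown above.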

Data Sources

The project uses the Wiktionary and Google Books N-gram datasets, both of which are downloaded by setup.sh.

Troubleshooting

Setup Issues

Rendering Issues

Next Steps

Support

For issues or questions: