A from-scratch transformer model trained on Assamese, English, and Hinglish. Purpose-built for 50M+ Northeast India speakers. Deployed at zero cost.
Rotary Position Embeddings — encode position by rotating query/key vectors, so relative offsets survive the attention dot product. Generalizes to sequence lengths beyond those seen in training.
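A minimal numpy sketch of the rotation (illustrative only, not the model's actual code): consecutive feature pairs are rotated by position-dependent angles, and the dot product between a rotated query and key then depends only on their relative offset.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    # x: (seq_len, dim), dim even; positions: (seq_len,)
    # Each consecutive feature pair (x1, x2) is rotated by angle
    # position * inv_freq, a 2-D rotation per pair.
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)      # (d/2,)
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because rotation preserves norms, only relative position enters the score: rotating both query and key by the same extra offset leaves their dot product unchanged.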
Root Mean Square Normalization — skips LayerNorm's mean-centering and bias, roughly 15% faster in practice with comparable training stability.
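The whole operation fits in a few lines; a numpy sketch (illustrative, not the model's code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale by the root-mean-square of the last dimension only.
    # No mean subtraction, no bias: fewer ops than LayerNorm.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

With `weight = 1`, the output always has unit RMS, which is what keeps activations in a stable range during training.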
Swish-Gated Linear Unit — gated feed-forward activation that consistently outperforms GELU and ReLU in transformer language models.
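The gating is simple: one projection is passed through Swish/SiLU and multiplies a second projection elementwise. A numpy sketch of the feed-forward block (weight names are placeholders, not the repo's identifiers):

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    # x: (seq_len, d_model); w_gate/w_up: (d_model, d_ff); w_down: (d_ff, d_model)
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))   # Swish/SiLU: x * sigmoid(x)
    return (silu * (x @ w_up)) @ w_down   # gated value, projected back down
```

The gate lets the network learn per-feature, data-dependent suppression, which plain ReLU/GELU feed-forwards lack.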
Grouped Query Attention — 8 query heads share 4 KV heads, halving KV-cache memory at near-identical quality.
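Mechanically, each KV head serves a group of query heads; a common way to implement this is to repeat the K/V heads before standard attention. A numpy sketch with the 8-query / 4-KV split above (illustrative, not the model's code):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (seq, n_q_heads, hd); k, v: (seq, n_kv_heads, hd)
    group = q.shape[1] // k.shape[1]          # query heads per KV head (here 2)
    k = np.repeat(k, group, axis=1)           # each KV head serves `group` queries
    v = np.repeat(v, group, axis=1)
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)             # softmax over key positions
    return np.einsum('hqk,khd->qhd', w, v)
```

Only K and V are cached during generation, so halving KV heads halves the cache without shrinking the query side where most of the quality lives.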
Key-Value caching for autoregressive inference — past keys/values are reused, so each new token costs one incremental step instead of a full-prefix recompute. 10-50x faster generation.
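A minimal cache sketch (assumed shapes and names, not the repo's API): each decode step appends the new token's K/V and attends over the accumulated tensors.

```python
import numpy as np

class KVCache:
    """Accumulates keys/values one token at a time during decoding."""

    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_new, v_new):
        # k_new, v_new: (1, n_kv_heads, head_dim) for the newest token.
        # Without the cache, step t recomputes K/V for all t prefix tokens;
        # with it, each step does O(1) new projection work.
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = np.concatenate([self.k, k_new], axis=0)
            self.v = np.concatenate([self.v, v_new], axis=0)
        return self.k, self.v
```

The 10-50x range reflects how the savings grow with sequence length: the longer the generated text, the more redundant recomputation the cache avoids.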
Tokenizer trained on Assamese script — 3.1x compression vs character-level tokenization, with full Unicode support.
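Where the compression comes from, in miniature: a subword tokenizer repeatedly merges frequent adjacent symbols, so common Assamese character sequences become single tokens. A toy byte-pair-merge sketch (the project's actual tokenizer is not shown; this only illustrates the mechanism):

```python
from collections import Counter

def train_bpe(text, num_merges):
    # Start from single characters (Python strings are Unicode, so
    # Assamese script works as-is) and greedily merge the most
    # frequent adjacent pair, num_merges times.
    tokens = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens
```

Compression is then characters per token: the fewer tokens a sentence needs, the more text fits in the context window and the cheaper each generation step is.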
Proof-of-concept trained on Google Colab T4 GPU (free). Demonstrates architecture, tokenizer, and deployment pipeline. Running live right now.
Next step, with Startup India funding: an NVIDIA DGX Spark (128GB unified memory), roughly 40x more capable. Production-grade NE India AI, same zero-cost deployment.