A from-scratch transformer model trained on Assamese, English, and Hinglish. Purpose-built for the 50M+ speakers of Northeast India. Deployed at zero cost.
Rotary Position Embeddings — encodes position through vector rotation. Extrapolates to unseen lengths.
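A minimal numpy sketch of the idea (not the repo's implementation): each pair of embedding dimensions is rotated by an angle proportional to the token's position, so relative offsets are encoded in the geometry itself. The function name and the split-halves pairing convention are illustrative assumptions.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Position Embeddings to x of shape (seq_len, dim).

    Pairs dimension i with dimension i + dim/2 and rotates each pair
    by position * base^(-2i/dim). Rotation preserves vector norms, and
    position 0 gets angle 0 (identity).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per dimension pair
    inv_freq = base ** (-np.arange(half) / half)
    # angle for every (position, pair) combination
    theta = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, :half], x[:, half:]
    # standard 2D rotation applied pairwise
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because position enters only through these rotations, relative distances between tokens stay meaningful at sequence lengths longer than those seen in training.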
Root Mean Square Normalization — drops LayerNorm's mean subtraction and bias, ~15% faster with the same training stability.
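A quick sketch of why RMSNorm is cheaper (illustrative, not the repo's code): it only rescales by the root-mean-square, with no mean subtraction and no bias term.

```python
import numpy as np

def rmsnorm(x, gain=1.0, eps=1e-6):
    """RMSNorm over the last axis: divide by the RMS, then scale by a
    learned gain. Fewer ops than LayerNorm (no mean, no bias), which is
    where the speedup comes from."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain
```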
Swish-Gated Linear Unit — outperforms GELU and ReLU across all model sizes.
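A toy SwiGLU feed-forward block, sketched in numpy under assumed dimensions (the real weights and sizes live in the model): a Swish-activated path gates a parallel linear path before the down-projection.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff = 8, 16  # illustrative sizes, not the model's actual config

# three weight matrices: gate (W), value (V), down-projection (W2)
W = rng.standard_normal((d_model, d_ff)) * 0.1
V = rng.standard_normal((d_model, d_ff)) * 0.1
W2 = rng.standard_normal((d_ff, d_model)) * 0.1

def swish(z):
    return z / (1.0 + np.exp(-z))  # z * sigmoid(z), a.k.a. SiLU

def swiglu_ffn(x):
    # Swish-gated path elementwise-multiplies the plain linear path,
    # then projects back down to d_model
    return (swish(x @ W) * (x @ V)) @ W2
```

The gate lets the network modulate each hidden feature smoothly, which is the source of the gains over plain GELU/ReLU feed-forwards.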
Grouped Query Attention — 8 query heads share 4 KV heads. Half the KV-cache memory, same quality.
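A numpy sketch of the 8-query / 4-KV arrangement (head dimension and shapes are illustrative): each pair of query heads attends against one shared K/V head, so only half as many K/V tensors need to be stored.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).

    Each group of n_q_heads // n_kv_heads query heads reuses one KV
    head, shrinking the KV cache by that ratio (2x here for 8 vs 4).
    """
    group = q.shape[0] // k.shape[0]
    # broadcast each shared KV head to its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # softmax over the key axis
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```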
Key-Value caching for autoregressive inference. 10-50x faster generation.
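A minimal single-head sketch of the caching idea (assumed names, not the repo's API): each decode step appends its K/V row once, so attention for token t reuses all earlier projections instead of recomputing them — that reuse is the source of the speedup.

```python
import numpy as np

class KVCache:
    """Append-only KV cache for one attention head during decoding."""

    def __init__(self):
        self.k, self.v = [], []

    def step(self, q_t, k_t, v_t):
        # store this step's K/V once; past steps are never recomputed
        self.k.append(k_t)
        self.v.append(v_t)
        K, V = np.stack(self.k), np.stack(self.v)
        # attend over everything cached so far
        scores = K @ q_t / np.sqrt(q_t.shape[0])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V
```

Without the cache, step t would redo t matrix products; with it, each step is O(1) new projection work plus one attention pass over the cache.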
Custom tokenizer trained on Assamese-script text. 3.1x compression vs character-level tokenization. Full Unicode support.
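The tokenizer's exact algorithm isn't stated here; assuming a BPE-style subword scheme, compression comes from repeatedly merging the most frequent adjacent pair into one token. A single merge step looks like this (toy example, not the trained vocabulary):

```python
def bpe_merge_step(tokens, pair):
    """One BPE merge: replace every adjacent occurrence of `pair` with
    its concatenation. Repeated merges turn single characters (or
    Unicode codepoints, e.g. Assamese letters) into subword tokens,
    which is what yields the compression over character-level."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```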
Trained for free on Kaggle's 2x T4 GPUs. 7 training techniques: label smoothing, z-loss, EMA weight averaging, stochastic depth, curriculum learning, token dropout, GQA. Running live right now.
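Two of those techniques combine naturally in the loss function. A hedged sketch (function name, smoothing value, and z-loss weight are illustrative, not the training config): label smoothing spreads a little target mass over all classes, and z-loss penalizes the squared log-partition to keep logit magnitudes from drifting.

```python
import numpy as np

def smoothed_ce_with_zloss(logits, target, smoothing=0.1, z_weight=1e-4):
    """Cross-entropy with label smoothing plus a z-loss regularizer."""
    # log-partition (logsumexp), computed stably
    m = logits.max()
    logz = np.log(np.sum(np.exp(logits - m))) + m
    log_probs = logits - logz
    n = len(logits)
    # label smoothing: (1 - smoothing) mass on the label, rest uniform
    soft = np.full(n, smoothing / n)
    soft[target] += 1.0 - smoothing
    ce = -np.sum(soft * log_probs)
    # z-loss nudges log Z toward 0, keeping logits in a healthy range
    return ce + z_weight * logz ** 2
```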
Next, with Startup India funding: an NVIDIA DGX Spark (128 GB unified memory), roughly 40x more compute. Production-grade Northeast India AI, same zero-cost deployment.