[Incomplete] A handbook of concepts and terms for language-conditioned RL


Category : Notes


Recurring Models and Papers:

MCIL: Multi-context imitation learning

Multi-context imitation learning (MCIL) is a framework for generalizing across heterogeneous task and goal descriptions. In a nutshell: collect unstructured trajectories (called ‘play’ in the paper). From the end of each trajectory, take either an image of the goal state or a language description of the task; these images/texts are called contexts. A context-conditioned policy $\pi_{\theta}(a_t|s_t,z)$ is then trained with the MCIL objective, where $z$ is the context vector. A separate encoder is used for each context type (i.e., image context or text context, etc.) to produce the context vector $z$.
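A minimal sketch of this setup, assuming PyTorch; the encoder/policy architectures, dimensions, and the simple behavior-cloning (MSE) loss below are illustrative stand-ins, not the paper’s exact MCIL objective:

```python
import torch
import torch.nn as nn

class GoalImageEncoder(nn.Module):
    """Maps a goal image to a shared context vector z (hypothetical architecture)."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, z_dim),
        )
    def forward(self, goal_image):
        return self.net(goal_image)

class TextEncoder(nn.Module):
    """Maps a (pre-embedded) language instruction to the same context space."""
    def __init__(self, text_dim=384, z_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(text_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
    def forward(self, text_embedding):
        return self.net(text_embedding)

class ContextConditionedPolicy(nn.Module):
    """pi_theta(a_t | s_t, z): one policy shared across all context modalities."""
    def __init__(self, state_dim=10, z_dim=32, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + z_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

def mcil_style_loss(policy, encoders, batches):
    """Average an imitation (behavior-cloning) loss over context modalities.
    `batches` maps modality name -> (context, states, expert_actions)."""
    losses = []
    for name, (context, states, actions) in batches.items():
        z = encoders[name](context)      # modality-specific encoder
        pred = policy(states, z)         # shared context-conditioned policy
        losses.append(nn.functional.mse_loss(pred, actions))
    return torch.stack(losses).mean()

# Example usage with random data for two context modalities:
encoders = {"goal_image": GoalImageEncoder(), "language": TextEncoder()}
policy = ContextConditionedPolicy()
batches = {
    "goal_image": (torch.randn(8, 3, 64, 64), torch.randn(8, 10), torch.randn(8, 4)),
    "language":   (torch.randn(8, 384),       torch.randn(8, 10), torch.randn(8, 4)),
}
loss = mcil_style_loss(policy, encoders, batches)
```

The key design choice is that the encoders are modality-specific while the policy is shared, so goal images and language instructions both condition the same behavior.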

HULC: Hierarchical Universal Language Conditioned Policies

Some terms

  • Natural language conditioned policy $\pi_{\theta}(a_t|s_t,l)$ : outputs action $a_t \in \mathcal{A}$ conditioned on the current state $s_t \in \mathcal{S}$ and a free-form language instruction $l \in \mathcal{L}$.
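A minimal standalone sketch of such a policy, assuming the instruction $l$ arrives already embedded (e.g., by a frozen sentence encoder); the class name, dimensions, and architecture below are hypothetical:

```python
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    """pi_theta(a_t | s_t, l): action from current state and an embedded instruction."""
    def __init__(self, state_dim=10, lang_dim=384, action_dim=4):
        super().__init__()
        self.lang_proj = nn.Linear(lang_dim, 32)   # project instruction embedding
        self.net = nn.Sequential(
            nn.Linear(state_dim + 32, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
    def forward(self, state, instruction_embedding):
        z = self.lang_proj(instruction_embedding)
        return self.net(torch.cat([state, z], dim=-1))

policy = LanguageConditionedPolicy()
state = torch.randn(1, 10)          # s_t
instruction = torch.randn(1, 384)   # stand-in for the embedded instruction l
action = policy(state, instruction) # a_t
```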