Part 7: How to run inference on Microsoft Phi-2 Language Model on Google Colab

Rohit Raj
3 min read · Dec 30, 2023

Microsoft released the Phi-2 language model on 12 December 2023. Much of the field is racing to train the largest possible language model, but one of the biggest use cases for language models is likely to be as an on-device assistant on mobile phones. In December, Google launched the Gemini Nano model, which runs on Pixel 8 Pro phones. Only a few days later, Microsoft released Phi-2, which is similar in size to Gemini Nano but performs better on the benchmarks Microsoft reports.

Comparison between Phi-2 model and Gemini nano model

Not only does Phi-2 hold its own against Gemini Nano, it also performs not much worse than much larger open-source language models.

Comparison between Phi-2 model and Larger open-source models

Notably, on coding tasks Phi-2 does better than even the 70B-parameter Llama-2 model.

Inference on Phi-2 language model

To run inference on Phi-2, follow the steps below. You can run the code in Google Colab. You can also run it on your local laptop or PC, but you will need to install an appropriate CUDA driver for your GPU.

1. First, open a new Colab notebook and set the runtime type to GPU.
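
Before going further, it can help to confirm that the notebook actually sees a GPU. The sketch below is one possible check, assuming the `nvidia-smi` tool is on the path (as it normally is on GPU runtimes); `gpu_summary` is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess


def gpu_summary():
    """Return nvidia-smi's name/memory line(s), or None if no GPU is visible."""
    # nvidia-smi is only present when an NVIDIA driver is installed
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected: check Runtime > Change runtime type")
```

If this prints the "No GPU detected" message, the later call to torch.set_default_device("cuda") is likely to fail.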

2. Install the einops library

pip install einops
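
As a quick sanity check after installing, the following sketch reports which of the packages used in this tutorial cannot be found. The `missing_packages` helper and the `REQUIRED` list are illustrative names introduced here; the check only looks the packages up and does not import them.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages this tutorial relies on (torch and transformers usually ship with Colab)
REQUIRED = ["torch", "transformers", "einops"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```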

3. Import the required libraries

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

4. Instantiate the model

torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

This downloads the Phi-2 model weights from the Hugging Face Hub the first time it runs.
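
To get a rough idea of whether the model fits on a given GPU, you can estimate the memory taken by the weights alone. The sketch below assumes the 2.7 billion parameter count given in Microsoft's announcement and assumes the weights load in 16-bit precision; it ignores activations and other overhead, so treat the numbers as a lower bound.

```python
# Bytes needed to store one parameter in each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

# Parameter count as stated in Microsoft's Phi-2 announcement (assumption for this estimate)
PHI2_PARAMS = 2.7e9


def weight_memory_gib(num_params, dtype="float16"):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: about {weight_memory_gib(PHI2_PARAMS, dtype):.1f} GiB")
```

At 16-bit precision this works out to roughly 5 GiB of weights.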

5. Define the following function for querying the model

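def response(query):
    # Tokenize the prompt into PyTorch tensors
    inputs = tokenizer(query, return_tensors="pt", return_attention_mask=False)
    # Generate up to 200 tokens in total (the prompt counts toward this limit)
    outputs = model.generate(**inputs, max_length=200)
    # Decode the first (and only) generated sequence back into text
    text = tokenizer.batch_decode(outputs)[0]
    return text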
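
The decoded text begins with the original query and may end with the tokenizer's end-of-text marker, because the output is decoded without skipping special tokens. A minimal post-processing sketch is shown below; `clean_response` is a hypothetical helper, and the marker string is assumed to be `<|endoftext|>`.

```python
# End-of-text marker assumed for the Phi-2 tokenizer
EOS_MARKER = "<|endoftext|>"


def clean_response(decoded, query):
    """Drop the echoed query and anything from the end-of-text marker onward."""
    text = decoded
    # The generated sequence starts with the prompt itself
    if text.startswith(query):
        text = text[len(query):]
    # Keep only what comes before the first end-of-text marker
    text = text.split(EOS_MARKER, 1)[0]
    return text.strip()


example = "What is 2 + 2?\nThe answer is 4.<|endoftext|>"
print(clean_response(example, "What is 2 + 2?"))
```

In practice you would call it as clean_response(response(query), query).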

I tried a couple of queries on the model.
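
How the query is phrased matters. The Phi-2 model card describes a question-answering prompt of the form "Instruct: <prompt>" followed by "Output:" on a new line. The helper below is a small sketch of that format; the function name `build_qa_prompt` is made up for this example.

```python
def build_qa_prompt(question):
    """Wrap a question in the Instruct/Output format described on the Phi-2 model card."""
    return f"Instruct: {question.strip()}\nOutput:"


prompt = build_qa_prompt("Explain what a language model is in one sentence.")
print(prompt)
```

The resulting string can then be passed to the response function defined in step 5.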

The performance of the Phi-2 model is nowhere near that of the free version of ChatGPT. But if you want to run an LLM locally on a mobile phone or an inexpensive PC or laptop, it is hard to find anything better than Phi-2.

Source

  1. https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

Written by Rohit Raj

Studied at IIT Madras and IIM Indore. Love Data Science
