Part 7: How to run inference on Microsoft Phi-2 Language Model on Google Colab
Microsoft released the Phi-2 language model on 12 December 2023. While everyone is racing to train the largest language model, one of the biggest use cases for language models is as an assistant on mobile phones. In December Google launched the Gemini Nano model, which runs on the Pixel 8 Pro. Only a few days later, Microsoft released the Phi-2 language model, which is similar in size to Gemini Nano but has better performance.
Comparison between the Phi-2 model and the Gemini Nano model

Not only does the Phi-2 language model hold its own against Gemini Nano, it also performs only slightly worse than much larger open-source language models.
Comparison between the Phi-2 model and larger open-source models

Notably, for coding the Phi-2 language model outperforms even the Llama-2 70B parameter model.
Inference on Phi-2 language model
To run inference on the Phi-2 language model, follow the steps below. You can run the code in Google Colab. You can also run it on your local laptop or PC, but you will have to install the appropriate CUDA driver for your GPU.
1. First, open a new Colab notebook and choose GPU as the runtime.
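To confirm that a GPU is actually attached, you can run nvidia-smi in a cell; it should list a GPU (on the free tier, typically a T4):
!nvidia-smi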
2. Install the einops library:
!pip install einops
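Colab already ships with torch and transformers. If you run this on a local machine instead, a fresh environment would need something like:
pip install torch transformers einops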
3. Import the required libraries:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
4. Instantiate the model:
torch.set_default_device("cuda")  # create tensors on the GPU by default

# trust_remote_code is needed because Phi-2 ships its own model code on the Hub
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
This will download the Phi-2 model weights from the Hugging Face Hub.
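As a quick sanity check, you can print the parameter count; Phi-2 has roughly 2.7 billion parameters:
print(f"Parameters: {model.num_parameters():,}")  # should be roughly 2.7 billion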
5. Define the following function for querying the model:
def response(query):
    # tokenize the prompt (no attention mask needed for a single unpadded sequence)
    inputs = tokenizer(query, return_tensors="pt", return_attention_mask=False)
    # generate up to 200 tokens, prompt included
    outputs = model.generate(**inputs, max_length=200)
    # decode the generated token ids back into text
    text = tokenizer.batch_decode(outputs)[0]
    return text
I tried a couple of queries on the model.
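For example, a call like this (the prompt here is just an illustration, not one of my original queries):
print(response("Write a Python function that checks whether a number is prime."))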


The performance of the Phi-2 model is nowhere close to even the free version of ChatGPT. But if you want to run an LLM locally on a mobile device or an inexpensive PC or laptop, you cannot find anything better than the Phi-2 language model.