Create your own ChatGPT extension

A few days ago I came across a LinkedIn post by Prashant Yadav in which he built a Chrome extension that uses ChatGPT. I was fascinated by it and curious to know how it works. After reading his blog post, the extension looked simple enough to build, so I gave it a try, and it turns out Chrome extensions really are quite easy to make. Here is my attempt.

I could have copied the code from the blog and called it done, but that makes little sense without understanding how anything works. So I started from scratch and learnt how Chrome extensions work, what debouncing means, why Chrome doesn't allow inline scripts in extensions, what a Content Security Policy is, and more.

How does the extension work?

Whenever we type a command of the form kai:"Anything here"; the extension sends the text to ChatGPT and displays the result in place of the command, as shown below

How do we create a Chrome Extension?

Creating a Chrome extension is much like creating a web application, except that it requires a manifest.json, a file that contains all the information that defines the extension. A few fields in the manifest are required for any extension to run:

{
    // Required Fields
    "manifest_version": 3,
    "name": "My Super Cool Extension!!",
    "version": "1.0.1",

    // Recommended
    "action": {...},
    "default_locale": "en",
    "description": "A plain text description",
    "icons": {...},
}
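
As a rough illustration of what "required" means here, the sketch below checks a manifest object for the three required fields. The helper name missingManifestFields is made up for this example; Chrome performs its own validation when it loads the extension.

```javascript
// Hypothetical helper: lists which required manifest fields are absent.
const REQUIRED_FIELDS = ["manifest_version", "name", "version"];

function missingManifestFields(manifest) {
  return REQUIRED_FIELDS.filter(
    (field) => manifest[field] === undefined || manifest[field] === ""
  );
}

const draft = { name: "My Super Cool Extension!!", version: "1.0.1" };
console.log(missingManifestFields(draft)); // manifest_version is reported as missing
```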

So let's create our manifest.json shown below

{
    "name" : "Keep AI",
    "description" :"Use the power of ChatGPT to write your notes",
    "author":"Subhani Syed",
    "manifest_version":3,
    "version":"0.0.1",
    "permissions":["activeTab"],
    "action":{
        "default_popup":"popup.html"
    },
    "content_scripts":[
        {
            "matches":["<all_urls>"],
            "run_at":"document_end",
            "js":["content.js"],
            "all_frames":true
        }
    ]
}

Let's understand some of the main properties here

manifest_version An integer specifying the version of the manifest file format our package requires. It is suggested to use Manifest V3 for any new extension.

permissions Lists the permissions the extension needs. Here we need to access the current tab and read its data, so we use "activeTab"

default_popup The default HTML page shown when we click on the extension icon

content_scripts These contain the files that are run in the context of web pages and have access to the DOM

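run_at This indicates when to inject the JavaScript file. The manifest key is spelled run_at, with an underscore. Here we use document_end, which injects the script once the DOM has finished loading

js The JavaScript files to inject. The entire logic of our extension lives here, in content.js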

The PopUp Window...

popup.html - This is the popup window which is seen when you press the extension icon

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Keep AI</title>
  </head>
  <body>
    <h1>Use ChatGPT to write your notes and stories..</h1>
  </body>
</html>

Before moving forward, we need to understand an important concept in JavaScript known as debouncing

What is Debouncing?

If you've ever had to deal with fast-paced user input in JavaScript, you may have come across the term "debouncing". Debouncing is a technique used to optimize performance by reducing the number of function calls triggered by an event.

In practice, it prevents a function from being called multiple times within a short period. It's particularly useful for events like scroll, resize, and keypress, where a user's interaction may trigger the event many times in quick succession.

For example, imagine you have a search bar that triggers an API call on every keystroke. Without debouncing, each keystroke would trigger a separate API call, resulting in unnecessary network traffic and slower performance.

Debouncing works by adding a delay between the trigger event and the function call. When the event is triggered, a timer is set. If the event is triggered again before the timer has expired, the original timer is cancelled and a new one is set. This process repeats until the timer expires, at which point the function is called.

Let's see how we implement it in JavaScript

function debounce(func, delay) {
    let timeout;
    return function () {
        const context = this;
        const args = arguments;
        clearTimeout(timeout);
        timeout = setTimeout(() => func.apply(context, args), delay);
    };
}
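
To see the timer reset in action, here is a small sketch that runs the same logic against a fake clock, so every step can be traced without waiting. FakeClock and debounceWith are hypothetical helpers written for this illustration only; the real extension uses the browser's own timers.

```javascript
// Minimal fake clock (hypothetical helper, not a browser API).
class FakeClock {
  constructor() {
    this.now = 0;
    this.timers = new Map();
    this.nextId = 1;
  }
  setTimeout(fn, delay) {
    const id = this.nextId++;
    this.timers.set(id, { fn, due: this.now + delay });
    return id;
  }
  clearTimeout(id) {
    this.timers.delete(id);
  }
  // Advance time and run every timer that has become due.
  tick(ms) {
    this.now += ms;
    for (const [id, timer] of [...this.timers]) {
      if (timer.due <= this.now) {
        this.timers.delete(id);
        timer.fn();
      }
    }
  }
}

// Same debounce logic as above, but with the timer functions passed in.
function debounceWith(clock, func, delay) {
  let timeout;
  return function (...args) {
    clock.clearTimeout(timeout);
    timeout = clock.setTimeout(() => func.apply(this, args), delay);
  };
}

const clock = new FakeClock();
const calls = [];
const record = debounceWith(clock, (key) => calls.push(key), 1000);

record("a");      // t = 0: timer set for t = 1000
clock.tick(500);  // t = 500: nothing fires yet
record("b");      // timer cancelled, new one set for t = 1500
clock.tick(500);  // t = 1000: the first timer no longer exists
record("c");      // timer reset again, now due at t = 2000
clock.tick(1000); // t = 2000: the last timer finally fires

console.log(calls); // only the final call, "c", went through
```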

To learn more about debouncing, check out the video by Akshay Saini.

Where is the Logic for the extension?

content.js This file contains all the functionality: scraping the text from the page, checking whether it's in the required format, making the API call, and more.

// Helper function to debounce
function debounce(func, delay) {
    let timeout;
    return function () {
        const context = this;
        const args = arguments;
        clearTimeout(timeout);
        timeout = setTimeout(() => func.apply(context, args), delay);
    };
}

// Helper function to check for "kai:<anything between>; pattern"
const getTextParsed = (text) => {
    const parsed = /kai:(.*?)\;/gi.exec(text);
    return parsed ? parsed[1] : "";
};

// Helper function to get text content from the nodes
const getTextContentFromDOMElements = (nodes, textarea = false) => {
    if (!nodes || nodes.length === 0) {
      return null;
    }

    for (let node of nodes) {
      // Textareas expose their content via .value, other elements via .textContent
      const value = textarea ? node.value : node.textContent;
      if (value) {
        const text = getTextParsed(value);
        // Return the first node that contains a command; otherwise keep looking
        if (text) return [node, text];
      }
    }
    return null;
};

// Make API call
const callAPI = async (node,value) => {
    try{
        const headers = new Headers();
        headers.append("Content-Type","application/json");
        headers.append("Authorization", `Bearer ${YOUR_API_KEY}`);
        const payload = JSON.stringify({
            model:"text-davinci-003", // since retired by OpenAI; substitute a currently available completions model
            prompt:value,
            max_tokens: 2048,
            temperature: 0,
            top_p: 1,
            n: 1,
            stream: false,
            logprobs: null,
        });
        const options = {
            method: "POST",
            headers: headers,
            body:payload,
            redirect:"follow",
        };
        const response = await fetch("https://api.openai.com/v1/completions",options);
        if(!response.ok){
            throw new Error(`OpenAI API responded with status ${response.status}`);
        }
        const data = await response.json();
        const {choices} = data;
        // Strip leading and trailing whitespace from the generated text
        const text = choices[0].text.trim();
        node.innerText = text;
    }catch(err){
        console.error("Error occurred while calling the OpenAI API",err);
    }

}

// Helper function to check the text entered and call the API
function getText() {
    const elements = document.querySelectorAll('[contenteditable="true"]');
    const parsedValue = getTextContentFromDOMElements(elements);
    if(parsedValue){
        const[node,value] = parsedValue;
        callAPI(node,value);
    }

}

const debouncedGetText = debounce(getText, 1000);

window.addEventListener("keypress", debouncedGetText);

Let's summarize the above code

debounce - Helper function to debounce a given function with a delay

getTextParsed - A helper function that takes a string and returns a parsed string that matches the kai:<anything between>; pattern. This pattern is used to extract relevant data from the string.
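
A few sample inputs make the pattern's behaviour easier to see. This sketch repeats an equivalent version of the regular expression so that it runs on its own; the sample strings are made up.

```javascript
// Equivalent to the pattern in content.js, repeated so this runs stand-alone.
const getTextParsed = (text) => {
  const parsed = /kai:(.*?);/i.exec(text);
  return parsed ? parsed[1] : "";
};

console.log(getTextParsed("kai:write a haiku about rain;")); // "write a haiku about rain"
console.log(getTextParsed("kai:still typing"));              // "" (no closing semicolon yet)
console.log(getTextParsed("Notes KAI:first; kai:second;"));  // "first" (lazy, case-insensitive match)
```

Because the capture is lazy, only the text up to the first semicolon is used, so a prompt cannot itself contain a semicolon.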

getTextContentFromDOMElements - This function takes an array of nodes and a boolean parameter to indicate if the node is a textarea. It then loops through the array of nodes and gets the text content from the node. If the text content matches the kai:<anything between>; pattern, it returns an array with the node and the parsed text.
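
The textarea flag only changes which property is read from each node. Here is a simplified, stand-alone sketch of the same idea, using plain objects as stand-ins for DOM nodes; findCommand is a name made up for this example.

```javascript
const getTextParsed = (text) => {
  const parsed = /kai:(.*?);/i.exec(text);
  return parsed ? parsed[1] : "";
};

// Simplified sketch: returns [node, command] for the first node holding a command.
function findCommand(nodes, textarea = false) {
  for (const node of nodes || []) {
    // Textareas keep their content in .value, other elements in .textContent
    const value = textarea ? node.value : node.textContent;
    const text = value ? getTextParsed(value) : "";
    if (text) return [node, text];
  }
  return null;
}

// Plain objects standing in for DOM nodes
const editable = [{ textContent: "plain note" }, { textContent: "kai:summarise this;" }];
const textareas = [{ value: "kai:draft an email;" }];

console.log(findCommand(editable)[1]);        // "summarise this"
console.log(findCommand(textareas, true)[1]); // "draft an email"
console.log(findCommand([{ textContent: "nothing here" }])); // null
```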

callAPI - It is responsible for making a call to the OpenAI API with a specified payload and headers. It takes two parameters, a node, and a value. The function creates a new Headers object with the necessary headers and a JSON string of the payload. It then makes a POST request to the OpenAI API endpoint with the headers and payload. Upon receiving the response, it extracts the generated text from the response and updates the text content of the node.
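
One way to make callAPI easier to reason about is to separate building the request from reading the response. The sketch below does this without touching the network; buildCompletionRequest and extractCompletionText are hypothetical helper names, and the key "sk-example" is a placeholder.

```javascript
// Hypothetical helpers: names and structure are assumptions for illustration.
function buildCompletionRequest(prompt, apiKey) {
  return {
    url: "https://api.openai.com/v1/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "text-davinci-003", // model name taken from the article
        prompt,
        max_tokens: 2048,
        temperature: 0,
      }),
    },
  };
}

// Returns the trimmed text of the first choice, or null if there is none.
function extractCompletionText(data) {
  const first = data && Array.isArray(data.choices) ? data.choices[0] : undefined;
  return first && typeof first.text === "string" ? first.text.trim() : null;
}

const { url, options } = buildCompletionRequest("Say hello", "sk-example");
console.log(url, JSON.parse(options.body).prompt);
console.log(extractCompletionText({ choices: [{ text: "\n\nHello there!" }] })); // "Hello there!"
console.log(extractCompletionText({ error: { message: "Invalid API key" } }));   // null
```

Inside callAPI, the two pieces would be joined by a fetch(url, options) call, with the node updated only when extractCompletionText returns a string.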

getText - It gets all "contenteditable" elements on the page and extracts the relevant data using the getTextContentFromDOMElements function. If the relevant data is found, it calls the callAPI function with the extracted data.

debouncedGetText function is a debounced version of the getText function and is called whenever a keypress event occurs in the window.

Hurray! We have now created our own Chrome extension that uses ChatGPT. But how do we use it?

To use the extension, follow the steps below

  1. Open Google Chrome and navigate to the extensions page by entering "chrome://extensions" in the address bar or by going to Settings > Extensions

  2. On the extensions page, enable Developer Mode by toggling the switch in the top right corner.

  3. Click the "Load unpacked" button in the top left corner of the screen.

  4. Select the directory where we created our files: manifest.json, popup.html and content.js

  5. Chrome will now load the extension and add it to the list of installed extensions on the extensions page.

Now you can try the extension, but it won't work. Hmm, that's odd. Well, that's because we haven't provided an API key for the API calls.

Create your API key on the OpenAI website.

Never share your API key with anyone and never expose it in a public repository

Now replace the whole ${YOUR_API_KEY} placeholder in content.js (including the ${ and }) with the generated key, or define a YOUR_API_KEY string constant at the top of the file, and reload the extension
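
Hard-coding the key is fine for a local experiment. If you would rather keep it out of the source files, one possible approach is to save it in chrome.storage.local and read it at runtime. The sketch below assumes the "storage" permission has been added to manifest.json and that the key was saved earlier under the made-up name openaiApiKey; loadApiKey and fakeStorage are hypothetical helpers.

```javascript
// Reads the key from a storage area and hands it to onKey (null if none is saved).
// storageArea is expected to look like chrome.storage.local.
function loadApiKey(storageArea, onKey) {
  storageArea.get(["openaiApiKey"], (items) => {
    onKey(items.openaiApiKey || null);
  });
}

// In-memory stand-in so the sketch also runs outside Chrome.
const fakeStorage = {
  data: { openaiApiKey: "sk-example" },
  get(keys, callback) {
    const items = {};
    for (const key of keys) {
      if (key in this.data) items[key] = this.data[key];
    }
    callback(items);
  },
};

const inExtension = typeof chrome !== "undefined" && chrome.storage;
const storageArea = inExtension ? chrome.storage.local : fakeStorage;

loadApiKey(storageArea, (key) => {
  console.log(key ? "API key loaded" : "No API key saved yet");
});
```

The key still lives on your own machine, so this only keeps it out of the code you commit and share.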

Congratulations! You have created your own Chrome extension that uses ChatGPT.

You can find the entire source code on GitHub - link

Thanks for reading. If you learnt something new or interesting, please give the article a thumbs up, and I would highly appreciate it if you shared it with your friends.