Using an LLM to Generate JSON Locally with llama.cpp

Summary:

Intro
Code
Schema Details
How To Use

Intro

A few days back, I tried running a small language model locally to generate JSON without the usual filler text, similar to the JSON mode recently released by OpenAI. I was pleasantly surprised by the grammar constraint feature of llama.cpp, which made it possible to experiment with smaller models even on my laptop's CPU. This article shares some working code snippets that you can copy, run, and modify.

Please note that the code requires a local GGUF model and the llama-cpp-python library. If you have not downloaded a model or installed the library yet, see the How To Use section for instructions.

That's enough context; let's get started.

Code

# Importing the Llama and LlamaGrammar classes from the llama_cpp library
from llama_cpp import Llama, LlamaGrammar

# Initializing a Llama client with the specified language model
client = Llama(
    model_path="path/to/gguf/model.gguf", # Replace with your own GGUF model file
)

# Defining a prompt for the language model
prompt = """
Describe an orange using a JSON file
"""

# Defining the following custom grammar schema; see the Schema Details section for details:
# {
#   "string_field": String,
#   "number_field": Number,
#   "boolean_field": Boolean
# }

schema = r'''
root ::= (
  "{" newline
    doublespace "\"string_field\":" space string "," newline
    doublespace "\"number_field\":" space number "," newline
    doublespace "\"boolean_field\":" space boolean newline
  "}"
)
newline ::= "\n"
doublespace ::= "  "
space ::= " "
number ::= [0-9]+   "."?   [0-9]*
string ::= "\""   ([^"]*)   "\""
boolean ::= "true" | "false"
'''

# Creating a LlamaGrammar object with schema string
# Set verbose=False to not print the grammar, set to True for debugging
grammar = LlamaGrammar.from_string(grammar=schema, verbose=False)

# Processing the prompt using the Llama client to generate a response
answer = client(
    prompt,
    grammar=grammar, # Add the grammar constraint with the LlamaGrammar object
    max_tokens=128, # Leave room for the whole object; the library's default limit may be too small
    temperature=0.0, # Set temperature to 0 for deterministic (non-random) output
)

# Printing the response generated by the Llama client
print(answer["choices"][0]["text"])

Here is the response from a fine-tuned TinyLlama model, about 1B parameters in size with Q2_K quantization:

{
  "string_field": "orange",
  "number_field": 10,
  "boolean_field": true
}
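
Because the grammar fixes the shape of the response, the text can usually be handed straight to a JSON parser. The sketch below uses the response above as a stand-in for answer["choices"][0]["text"]; parse_response is a hypothetical helper, not part of llama-cpp-python. Note that the grammar is slightly looser than JSON (for example, its number rule accepts a trailing decimal point), so the parse is wrapped in a try/except.

```python
import json

# Stand-in for answer["choices"][0]["text"], copied from the response above.
raw_text = """{
  "string_field": "orange",
  "number_field": 10,
  "boolean_field": true
}"""


def parse_response(text):
    """Return the parsed dict, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


parsed = parse_response(raw_text)
print(parsed)

# "10." satisfies the grammar's number rule but is not valid JSON.
print(parse_response('{"number_field": 10.}'))
```

The first call returns a regular Python dict; the second returns None, which is the case a stricter number rule would rule out at generation time.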

If we comment out the line grammar=grammar, we get a lot of unwanted filler text instead (the response stops mid-line, presumably at the token limit):

```json
{
    "orange": {
        "color": "red",
        "size": 10,
        "weight": 20
    }
}
```

```javascript
const oranges = require('./oranges.json');

console.log(oranges);
```

### Deserialize an orange using a JSON file

Describe an orange using a JSON file

```json
{
    "orange": {
        "color": "red",
        "size": 10,
        "

Schema Details

Most of the code above is just calling functions from the llama-cpp-python library, and the library's official documentation is much better than any explanation I can provide here. As a result, this article focuses more on introducing the schema format.

I write my schemas in the grammar format that llama.cpp accepts, which it calls GBNF. It is a mixture of Context-Free Grammar (CFG) notation and Regular Expression syntax, both of which are widely used to define the syntax of programming languages and data formats. It is a compact way to describe exactly which strings are allowed.

The format consists of symbols and rules. Let us look at a very simple example of a rule:

binary ::= "0" | "1"

In the above rule, we defined a symbol binary that is either "0" or "1".

Now that we have the basics down, we can look at more complicated rules. In the grammar format, lines starting with # are comments.

[]: Brackets define a character class, which matches a single character from the set listed inside.

# The letter "B"
letter_B ::= [B]

(): Parentheses group elements together. A rule may only span multiple lines inside parentheses.

# The string "AB"
letters ::= [A] "B"

# Still the string "AB"
letters_with_newlines ::= ( [A]
"B"
)

# This would yield an error
letters ::= [A]
"B"
^: The caret character means "not" when used at the start of a character class.

# Any single character that is NOT "A"
letter_not_A ::= [^A]

?: The question mark makes the preceding element optional.

# Either "A" or the empty string
optional_A ::= [A]?

*: The asterisk means zero or more occurrences of the preceding element.

# Any number of "A", e.g. "", "A", "AA", ...
zero_or_more_A ::= [A]*

+: The plus sign means one or more occurrences of the preceding element.

# One or more occurrences of "A", e.g. "A", "AA", "AAA", ...
one_or_more_A ::= [A]+

-: The hyphen describes a range of characters inside a character class. For example, [0-9] matches any digit.

# Any single lowercase letter
lowercase_letter ::= [a-z]

# Any number of English letters, upper or lowercase
bunch_of_letters ::= [a-zA-Z]*

Combining these definitions, we can now read the rules used in the schema:

# A single space, and two spaces for indentation
space ::= " "
doublespace ::= "  "

# Newline character
newline ::= "\n"

# Either "true" or "false"
boolean ::= "true" | "false"

# One or more digits, an optional decimal point, then zero or more digits
number ::= [0-9]+  "."?  [0-9]*

# Any number of characters other than a double quote, sandwiched between two double quotes
string ::= "\""  ([^"]*)  "\""
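
To get a feel for what these rules accept, they can be approximated with Python regular expressions. This is only an illustration under the assumption that the regexes below mirror the rules closely enough; llama.cpp parses the grammar itself and does not use Python's re module, and the matches helper is hypothetical.

```python
import re

# Regex approximations of the number, string, and boolean rules.
# Assumption: these mirror the grammar rules; they are not used by llama.cpp.
NUMBER = re.compile(r'[0-9]+\.?[0-9]*')
STRING = re.compile(r'"[^"]*"')
BOOLEAN = re.compile(r'true|false')


def matches(rule, text):
    """True if the whole text is accepted by the rule."""
    return rule.fullmatch(text) is not None


for sample in ["10", "3.14", "10.", "007", ".5", "-1"]:
    print(f"number {sample!r}: {matches(NUMBER, sample)}")

for sample in ['"orange"', '""', '"say "hi""']:
    print(f"string {sample!r}: {matches(STRING, sample)}")
```

The number rule accepts "10." and "007" but rejects ".5" and "-1", and the string rule has no way to include a double quote inside the text. Whether that is acceptable depends on the data you expect the model to produce.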

Lastly, we look at the first rule of the schema, where the symbol root is defined. root is a special symbol that specifies the format of the complete output. All other definitions are supplementary and exist to make the definition of root shorter and easier to understand.

Here is the root rule again, with one segment per line:

root ::= (
  "{" newline
    doublespace "\"string_field\":" space string "," newline
    doublespace "\"number_field\":" space number "," newline
    doublespace "\"boolean_field\":" space boolean newline
  "}"
)

Now compare it with the JSON shape it describes:

{
  "string_field": String,
  "number_field": Number,
  "boolean_field": Boolean
}
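
The mapping from the JSON shape to the root rule is mechanical, so it can be generated. The sketch below is a minimal illustration for flat objects only; build_root and SHARED_RULES are hypothetical names introduced here, not part of llama-cpp-python.

```python
# Hypothetical helper: builds the root rule for a flat object from
# (field name, rule name) pairs, in the same layout as the schema above.
def build_root(fields):
    lines = ['root ::= (', '  "{" newline']
    for i, (name, rule) in enumerate(fields):
        # Every field except the last is followed by a comma.
        comma = ' ","' if i < len(fields) - 1 else ''
        lines.append(f'    doublespace "\\"{name}\\":" space {rule}{comma} newline')
    lines += ['  "}"', ')']
    return "\n".join(lines)


# The supplementary rules, copied from the schema in the Code section.
SHARED_RULES = r'''
newline ::= "\n"
doublespace ::= "  "
space ::= " "
number ::= [0-9]+   "."?   [0-9]*
string ::= "\""   ([^"]*)   "\""
boolean ::= "true" | "false"
'''

fields = [
    ("string_field", "string"),
    ("number_field", "number"),
    ("boolean_field", "boolean"),
]
schema = build_root(fields) + "\n" + SHARED_RULES
print(schema)
```

The printed string can be passed to LlamaGrammar.from_string in the same way as the handwritten schema. Field names containing quotes or backslashes would need extra escaping, which this sketch does not attempt.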

Hopefully the side-by-side comparison makes the schema easier to read. Finally, here are some additional symbols that might be useful:

# Strict integer: No leading zeroes except for just "0"
strict_integer ::= "0" | [1-9][0-9]*

# Strict float: Mandatory decimal point, no trailing zeroes except for ".0"
strict_float ::= strict_integer "." ("0" | [0-9]*[1-9])

# List structure: We use string as an example, but any other symbol can be plugged in
list_of_strings ::= "[]" | (
  "[" newline
    doublespace string ("," newline
    doublespace string )* newline
  "]"
)
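
The two strict number rules can be sanity-checked the same way as any character-class rule, by translating them into Python regular expressions. This is a sketch under the assumption that the regexes below are faithful translations; accepts is a hypothetical helper and llama.cpp itself does not use Python's re module.

```python
import re

# Regex translations of strict_integer and strict_float.
# Assumption: these mirror the grammar rules above one-to-one.
STRICT_INTEGER = r'(?:0|[1-9][0-9]*)'
STRICT_FLOAT = STRICT_INTEGER + r'\.(?:0|[0-9]*[1-9])'


def accepts(pattern, text):
    """True if the whole text is accepted by the pattern."""
    return re.fullmatch(pattern, text) is not None


for text in ["0", "42", "007", "1.0", "1.5", "0.05", "1.50", "1."]:
    print(text, accepts(STRICT_INTEGER, text), accepts(STRICT_FLOAT, text))
```

Here "007", "1.50", and "1." are rejected by both rules, while "0" and "42" pass as integers and "1.0", "1.5", and "0.05" pass as floats.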

How To Use

Note: If any link has expired, please contact me through the About page and I will update it ASAP.

Install Python (If Necessary):
- Search for "Install Python with Anaconda" and follow the instructions.
- If this is your first time installing Python, you may also need to install pip and, on macOS, Homebrew. As with Python, you can search for their names to find instructions.
Download Model:
- Currently, the best place to download GGUF models is Hugging Face.
- For a small model, try TinyLlama and download the smallest GGUF model.
  - GGUF models are the files beginning with tinyllama....
Install llama-cpp-python:
- Visit the llama-cpp-python GitHub page and follow the installation instructions (typically pip install llama-cpp-python).
Run the Code:
- Now you can paste the code into either a Python file or a Jupyter Notebook cell.
- Remember to replace the GGUF model path with your local model path.
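
Before running the main script, it can help to confirm that both prerequisites from the steps above are in place. The sketch below is a minimal check using only the standard library; preflight is a hypothetical helper, and the path is the same placeholder used in the Code section.

```python
import importlib.util
from pathlib import Path


def preflight(model_path):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    # The llama-cpp-python package is imported under the name llama_cpp.
    if importlib.util.find_spec("llama_cpp") is None:
        problems.append("llama-cpp-python is not installed")
    path = Path(model_path)
    if not path.is_file():
        problems.append(f"model file not found: {path}")
    elif path.suffix.lower() != ".gguf":
        problems.append(f"expected a .gguf file, got: {path.name}")
    return problems


# Placeholder path; replace with your own GGUF model file.
for problem in preflight("path/to/gguf/model.gguf"):
    print("-", problem)
```

If nothing is printed, the library can be imported and the model file exists, so the code from the Code section should at least be able to start loading the model.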