Building a Mortgage Document Classification Model using an LLM Library

Guest

Nov 30th, 2025


# Placeholder import: some_llm_library stands in for your provider's client library.
from some_llm_library import LLMClient

client = LLMClient(api_key="<api_key_here>")

system_prompt = """
You are a mortgage document classification model, treating every page as an isolated unit.
Your task is to identify the document type of a single page taken from a multi-page mortgage file.

Rules: 
- Choose exactly one label from the provided list.
- Do NOT generate new labels or modify label names.
- Do NOT provide explanations, reasoning, or extra text.
- Output must contain exactly two fields: "label" and "confidence".
- "confidence" must be a numeric float value between 0 and 1.
- If the content does not clearly match any of the specific labels provided, or if the page is blank, illegible, or irrelevant, you MUST classify it as Unclassified.

Always return the result strictly in the required JSON format.
"""

user_prompt_template = """
Classify the following page into exactly one of the predefined mortgage document labels.

Only choose a label from the complete list shown below.
Do NOT invent any new labels or alter the label names.

Allowed labels (complete list):
- Mortgage - Closing Disclosure - Seller
- Lender - Rate Note
- Title - Rider
- Property - Tax Record Information Sheet
- Title - Signature / Name Affidavit (Ack)
- Unclassified

Page Content:
{PAGE_TEXT}

Return your answer in the following JSON format:
{
  "label": "<one_of_the_labels>",
  "confidence": <float_between_0_and_1>
}
"""

# Example page text (in practice, this would come from OCR processing of the document page)
page_text = "THIS IS SOME OCR TEXT FROM THE DOCUMENT PAGE..."

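# Substitute the page text with str.replace rather than str.format: the template
# contains literal JSON braces, which str.format would treat as replacement
# fields and raise a KeyError on.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt_template.replace("{PAGE_TEXT}", page_text)}
]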

response = client.generate(
    model="<model_name>",
    messages=messages,
    temperature=0,   # minimize sampling randomness so the same page tends to get the same label
    max_tokens=200   # ample room for the short JSON reply
)

print(response.output)
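
The prompt asks the model for strict JSON, but nothing in the script checks that the reply honors that contract. Below is a minimal sketch of a validation step, assuming the reply arrives as a plain string. The names `ALLOWED_LABELS`, `FALLBACK`, and `parse_classification` are hypothetical helpers introduced here for illustration and are not part of any library. The label set is copied from the prompt above and would need to stay in sync with it.

```python
import json

# Hypothetical constant: mirrors the "Allowed labels" list in the prompt.
ALLOWED_LABELS = {
    "Mortgage - Closing Disclosure - Seller",
    "Lender - Rate Note",
    "Title - Rider",
    "Property - Tax Record Information Sheet",
    "Title - Signature / Name Affidavit (Ack)",
    "Unclassified",
}

# Returned whenever the reply breaks the contract stated in the prompt.
FALLBACK = {"label": "Unclassified", "confidence": 0.0}


def parse_classification(raw_output):
    """Parse the model's reply; return a copy of FALLBACK if it is invalid."""
    try:
        data = json.loads(raw_output)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON (JSONDecodeError subclasses ValueError).
        return dict(FALLBACK)

    # The prompt requires exactly two fields: "label" and "confidence".
    if not isinstance(data, dict) or set(data) != {"label", "confidence"}:
        return dict(FALLBACK)

    label, confidence = data["label"], data["confidence"]

    # Reject invented or altered label names.
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return dict(FALLBACK)

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return dict(FALLBACK)
    if not 0.0 <= confidence <= 1.0:
        return dict(FALLBACK)

    return {"label": label, "confidence": float(confidence)}
```

If `response.output` is the raw reply text, which is an assumption about the placeholder client, the final `print` could become `print(parse_classification(response.output))`. Falling back to Unclassified keeps a malformed reply from being stored as a confident label.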