Multiclass Classification for Transactions

Oct 20, 2022

For this notebook we will be looking to classify a public dataset of transactions into a number of categories that we have predefined. These approaches should be replicable to any multiclass classification use case where we are trying to fit transactional data into predefined categories, and by the end of running through this you should have a few approaches for dealing with both labelled and unlabelled datasets.

The different approaches we'll be taking in this notebook are:

Zero-shot Classification: First we'll do zero shot classification to put transactions in one of five named buckets using only a prompt for guidance
Classification with Embeddings: Following this we'll create embeddings on a labelled dataset, and then use a traditional classification model to test their effectiveness at identifying our categories
Fine-tuned Classification: Lastly we'll produce a fine-tuned model trained on our labelled dataset to see how this compares to the zero-shot and few-shot classification approaches

Setup

%load_ext autoreload
%autoreload
%pip install openai 'openai[datalib]' 'openai[embeddings]' transformers

import openai
import pandas as pd
import numpy as np
import json
import os

COMPLETIONS_MODEL = "gpt-4"

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if you didn't set as an env var>"))

Load dataset

We're using a public transaction dataset of transactions over £25k for the Library of Scotland. The dataset has three features that we'll be using:

Supplier: The name of the supplier
Description: A text description of the transaction
Value: The value of the transaction in GBP

Source:

https://data.nls.uk/data/organisational-data/transactions-over-25k/

transactions = pd.read_csv('./data/25000_spend_dataset_current.csv', encoding= 'unicode_escape')
len(transactions)

transactions.head()

	Date	Supplier	Description	Transaction value (£)
0	21/04/2016	M & J Ballantyne Ltd	George IV Bridge Work	35098.0
1	26/04/2016	Private Sale	Literary & Archival Items	30000.0
2	30/04/2016	City Of Edinburgh Council	Non Domestic Rates	40800.0
3	09/05/2016	Computacenter Uk	Kelvin Hall	72835.0
4	09/05/2016	John Graham Construction Ltd	Causewayside Refurbishment	64361.0

def request_completion(prompt):

    completion_response = openai.chat.completions.create(
                            prompt=prompt,
                            temperature=0,
                            max_tokens=5,
                            top_p=1,
                            frequency_penalty=0,
                            presence_penalty=0,
                            model=COMPLETIONS_MODEL)

    return completion_response

def classify_transaction(transaction,prompt):

    prompt = prompt.replace('SUPPLIER_NAME',transaction['Supplier'])
    prompt = prompt.replace('DESCRIPTION_TEXT',transaction['Description'])
    prompt = prompt.replace('TRANSACTION_VALUE',str(transaction['Transaction value (£)']))

    classification = request_completion(prompt).choices[0].message.content.replace('\n','')

    return classification

# This function takes your training and validation outputs from the prepare_data function of the Finetuning API, and
# confirms that each have the same number of classes.
# If they do not have the same number of classes the fine-tune will fail and return an error

def check_finetune_classes(train_file,valid_file):

    train_classes = set()
    valid_classes = set()
    with open(train_file, 'r') as json_file:
        json_list = list(json_file)
        print(len(json_list))

    for json_str in json_list:
        result = json.loads(json_str)
        train_classes.add(result['completion'])
        #print(f"result: {result['completion']}")
        #print(isinstance(result, dict))

    with open(valid_file, 'r') as json_file:
        json_list = list(json_file)
        print(len(json_list))

    for json_str in json_list:
        result = json.loads(json_str)
        valid_classes.add(result['completion'])
        #print(f"result: {result['completion']}")
        #print(isinstance(result, dict))

    if len(train_classes) == len(valid_classes):
        print('All good')

    else:
        print('Classes do not match, please prepare data again')

Zero-shot Classification

We'll first assess the performance of the base models at classifying these transactions using a simple prompt. We'll provide the model with 5 categories and a catch-all of "Could not classify" for ones that it cannot place.

zero_shot_prompt = '''You are a data expert working for the National Library of Scotland.
You are analysing all transactions over £25,000 in value and classifying them into one of five categories.
The five categories are Building Improvement, Literature & Archive, Utility Bills, Professional Services and Software/IT.
If you can't tell what it is, say Could not classify

Transaction:

Supplier: SUPPLIER_NAME
Description: DESCRIPTION_TEXT
Value: TRANSACTION_VALUE

The classification is:'''

# Get a test transaction
transaction = transactions.iloc[0]

# Interpolate the values into the prompt
prompt = zero_shot_prompt.replace('SUPPLIER_NAME',transaction['Supplier'])
prompt = prompt.replace('DESCRIPTION_TEXT',transaction['Description'])
prompt = prompt.replace('TRANSACTION_VALUE',str(transaction['Transaction value (£)']))

# Use our completion function to return a prediction
completion_response = request_completion(prompt)
print(completion_response.choices[0].text)

 Building Improvement

Our first attempt is correct, M & J Ballantyne Ltd are a house builder and the work they performed is indeed Building Improvement.

Lets expand the sample size to 25 and see how it performs, again with just a simple prompt to guide it

test_transactions = transactions.iloc[:25]
test_transactions['Classification'] = test_transactions.apply(lambda x: classify_transaction(x,zero_shot_prompt),axis=1)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

test_transactions['Classification'].value_counts()

 Building Improvement    14
 Could not classify       5
 Literature & Archive     3
 Software/IT              2
 Utility Bills            1
Name: Classification, dtype: int64

test_transactions.head(25)

	Date	Supplier	Description	Transaction value (£)	Classification
0	21/04/2016	M & J Ballantyne Ltd	George IV Bridge Work	35098.0	Building Improvement
1	26/04/2016	Private Sale	Literary & Archival Items	30000.0	Literature & Archive
2	30/04/2016	City Of Edinburgh Council	Non Domestic Rates	40800.0	Utility Bills
3	09/05/2016	Computacenter Uk	Kelvin Hall	72835.0	Software/IT
4	09/05/2016	John Graham Construction Ltd	Causewayside Refurbishment	64361.0	Building Improvement
5	09/05/2016	A McGillivray	Causewayside Refurbishment	53690.0	Building Improvement
6	16/05/2016	John Graham Construction Ltd	Causewayside Refurbishment	365344.0	Building Improvement
7	23/05/2016	Computacenter Uk	Kelvin Hall	26506.0	Software/IT
8	23/05/2016	ECG Facilities Service	Facilities Management Charge	32777.0	Building Improvement
9	23/05/2016	ECG Facilities Service	Facilities Management Charge	32777.0	Building Improvement
10	30/05/2016	ALDL	ALDL Charges	32317.0	Could not classify
11	10/06/2016	Wavetek Ltd	Kelvin Hall	87589.0	Could not classify
12	10/06/2016	John Graham Construction Ltd	Causewayside Refurbishment	381803.0	Building Improvement
13	28/06/2016	ECG Facilities Service	Facilities Management Charge	32832.0	Building Improvement
14	30/06/2016	Glasgow City Council	Kelvin Hall	1700000.0	Building Improvement
15	11/07/2016	Wavetek Ltd	Kelvin Hall	65692.0	Could not classify
16	11/07/2016	John Graham Construction Ltd	Causewayside Refurbishment	139845.0	Building Improvement
17	15/07/2016	Sotheby'S	Literary & Archival Items	28500.0	Literature & Archive
18	18/07/2016	Christies	Literary & Archival Items	33800.0	Literature & Archive
19	25/07/2016	A McGillivray	Causewayside Refurbishment	30113.0	Building Improvement
20	31/07/2016	ALDL	ALDL Charges	32317.0	Could not classify
21	08/08/2016	ECG Facilities Service	Facilities Management Charge	32795.0	Building Improvement
22	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866.0	Could not classify
23	15/08/2016	John Graham Construction Ltd	Causewayside Refurbishment	196807.0	Building Improvement
24	24/08/2016	ECG Facilities Service	Facilities Management Charge	32795.0	Building Improvement

Initial results are pretty good even with no labelled examples! The ones that it could not classify were tougher cases with few clues as to their topic, but maybe if we clean up the labelled dataset to give more examples we can get better performance.

Classification with Embeddings

Lets create embeddings from the small set that we've classified so far - we've made a set of labelled examples by running the zero-shot classifier on 101 transactions from our dataset and manually correcting the 15 Could not classify results that we got

Create embeddings

This initial section reuses the approach from the Get_embeddings_from_dataset Notebook to create embeddings from a combined field concatenating all of our features

df = pd.read_csv('./data/labelled_transactions.csv')
df.head()

	Date	Supplier	Description	Transaction value (£)	Classification
0	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866	Other
1	29/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	74806	Building Improvement
2	29/05/2017	Morris & Spottiswood Ltd	George IV Bridge Work	56448	Building Improvement
3	31/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	164691	Building Improvement
4	24/07/2017	John Graham Construction Ltd	Causewayside Refurbishment	27926	Building Improvement

df['combined'] = "Supplier: " + df['Supplier'].str.strip() + "; Description: " + df['Description'].str.strip() + "; Value: " + str(df['Transaction value (£)']).strip()
df.head(2)

	Date	Supplier	Description	Transaction value (£)	Classification	combined
0	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866	Other	Supplier: Creative Video Productions Ltd; Desc...
1	29/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	74806	Building Improvement	Supplier: John Graham Construction Ltd; Descri...

from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

df['n_tokens'] = df.combined.apply(lambda x: len(tokenizer.encode(x)))
len(df)

embedding_path = './data/transactions_with_embeddings_100.csv'

from utils.embeddings_utils import get_embedding

df['babbage_similarity'] = df.combined.apply(lambda x: get_embedding(x, model='gpt-4'))
df['babbage_search'] = df.combined.apply(lambda x: get_embedding(x, model='gpt-4'))
df.to_csv(embedding_path)

Use embeddings for classification

Now that we have our embeddings, let see if classifying these into the categories we've named gives us any more success.

For this we'll use a template from the Classification_using_embeddings notebook

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from ast import literal_eval

fs_df = pd.read_csv(embedding_path)
fs_df["babbage_similarity"] = fs_df.babbage_similarity.apply(literal_eval).apply(np.array)
fs_df.head()

	Unnamed: 0	Date	Supplier	Description	Transaction value (£)	Classification	combined	n_tokens	babbage_similarity	babbage_search
0	0	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866	Other	Supplier: Creative Video Productions Ltd; Desc...	136	[-0.009802100248634815, 0.022551486268639565, ...	[-0.00232666521333158, 0.019198870286345482, 0...
1	1	29/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	74806	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	140	[-0.009065819904208183, 0.012094118632376194, ...	[0.005169447045773268, 0.00473341578617692, -0...
2	2	29/05/2017	Morris & Spottiswood Ltd	George IV Bridge Work	56448	Building Improvement	Supplier: Morris & Spottiswood Ltd; Descriptio...	141	[-0.009000026620924473, 0.02405017428100109, -...	[0.0028343256562948227, 0.021166473627090454, ...
3	3	31/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	164691	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	140	[-0.009065819904208183, 0.012094118632376194, ...	[0.005169447045773268, 0.00473341578617692, -0...
4	4	24/07/2017	John Graham Construction Ltd	Causewayside Refurbishment	27926	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	140	[-0.009065819904208183, 0.012094118632376194, ...	[0.005169447045773268, 0.00473341578617692, -0...

X_train, X_test, y_train, y_test = train_test_split(
    list(fs_df.babbage_similarity.values), fs_df.Classification, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
probas = clf.predict_proba(X_test)

report = classification_report(y_test, preds)
print(report)

                      precision    recall  f1-score   support

Building Improvement       0.92      1.00      0.96        11
Literature & Archive       1.00      1.00      1.00         3
               Other       0.00      0.00      0.00         1
         Software/IT       1.00      1.00      1.00         1
       Utility Bills       1.00      1.00      1.00         5

            accuracy                           0.95        21
           macro avg       0.78      0.80      0.79        21
        weighted avg       0.91      0.95      0.93        21

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Performance for this model is pretty strong, so creating embeddings and using even a simpler classifier looks like an effective approach as well, with the zero-shot classifier helping us do the initial classification of the unlabelled dataset.

Lets take it one step further and see if a fine-tuned model trained on this same labelled datasets gives us comparable results

Fine-tuned Transaction Classification

For this use case we're going to try to improve on the few-shot classification from above by training a fine-tuned model on the same labelled set of 101 transactions and applying this fine-tuned model on group of unseen transactions

Building Fine-tuned Classifier

We'll need to do some data prep first to get our data ready. This will take the following steps:

First we'll list out our classes and replace them with numeric identifiers. Making the model predict a single token rather than multiple consecutive ones like 'Building Improvement' should give us better results
We also need to add a common prefix and suffix to each example to aid the model in making predictions - in our case our text is already started with 'Supplier' and we'll add a suffix of '\n\n###\n\n'
Lastly we'll aid a leading whitespace onto each of our target classes for classification, again to aid the model

ft_prep_df = fs_df.copy()
len(ft_prep_df)

ft_prep_df.head()

	Unnamed: 0	Date	Supplier	Description	Transaction value (£)	Classification	combined	n_tokens	babbage_similarity	babbage_search
0	0	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866	Other	Supplier: Creative Video Productions Ltd; Desc...	12	[-0.009630300104618073, 0.009887108579277992, ...	[-0.008217384107410908, 0.025170527398586273, ...
1	1	29/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	74806	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	16	[-0.006144719664007425, -0.0018709596479311585...	[-0.007424891460686922, 0.008475713431835175, ...
2	2	29/05/2017	Morris & Spottiswood Ltd	George IV Bridge Work	56448	Building Improvement	Supplier: Morris & Spottiswood Ltd; Descriptio...	17	[-0.005225738976150751, 0.015156379900872707, ...	[-0.007611643522977829, 0.030322374776005745, ...
3	3	31/05/2017	John Graham Construction Ltd	Causewayside Refurbishment	164691	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	16	[-0.006144719664007425, -0.0018709596479311585...	[-0.007424891460686922, 0.008475713431835175, ...
4	4	24/07/2017	John Graham Construction Ltd	Causewayside Refurbishment	27926	Building Improvement	Supplier: John Graham Construction Ltd; Descri...	16	[-0.006144719664007425, -0.0018709596479311585...	[-0.007424891460686922, 0.008475713431835175, ...

classes = list(set(ft_prep_df['Classification']))
class_df = pd.DataFrame(classes).reset_index()
class_df.columns = ['class_id','class']
class_df  , len(class_df)

(   class_id                 class
 0         0  Literature & Archive
 1         1         Utility Bills
 2         2  Building Improvement
 3         3           Software/IT
 4         4                 Other,
 5)

ft_df_with_class = ft_prep_df.merge(class_df,left_on='Classification',right_on='class',how='inner')

# Adding a leading whitespace onto each completion to help the model
ft_df_with_class['class_id'] = ft_df_with_class.apply(lambda x: ' ' + str(x['class_id']),axis=1)
ft_df_with_class = ft_df_with_class.drop('class', axis=1)

# Adding a common separator onto the end of each prompt so the model knows when a prompt is terminating
ft_df_with_class['prompt'] = ft_df_with_class.apply(lambda x: x['combined'] + '\n\n###\n\n',axis=1)
ft_df_with_class.head()

	Unnamed: 0	Date	Supplier	Description	Transaction value (£)	Classification	combined	n_tokens	babbage_similarity	babbage_search	class_id	prompt
0	0	15/08/2016	Creative Video Productions Ltd	Kelvin Hall	26866	Other	Supplier: Creative Video Productions Ltd; Desc...	12	[-0.009630300104618073, 0.009887108579277992, ...	[-0.008217384107410908, 0.025170527398586273, ...	4	Supplier: Creative Video Productions Ltd; Desc...
1	51	31/03/2017	NLS Foundation	Grant Payment	177500	Other	Supplier: NLS Foundation; Description: Grant P...	11	[-0.022305507212877274, 0.008543581701815128, ...	[-0.020519884303212166, 0.01993306167423725, -...	4	Supplier: NLS Foundation; Description: Grant P...
2	70	26/06/2017	British Library	Legal Deposit Services	50056	Other	Supplier: British Library; Description: Legal ...	11	[-0.01019938476383686, 0.015277703292667866, -...	[-0.01843327097594738, 0.03343546763062477, -0...	4	Supplier: British Library; Description: Legal ...
3	71	24/07/2017	ALDL	Legal Deposit Services	27067	Other	Supplier: ALDL; Description: Legal Deposit Ser...	11	[-0.008471488021314144, 0.004098685923963785, ...	[-0.012966590002179146, 0.01299362163990736, 0...	4	Supplier: ALDL; Description: Legal Deposit Ser...
4	100	24/07/2017	AM Phillip	Vehicle Purchase	26604	Other	Supplier: AM Phillip; Description: Vehicle Pur...	10	[-0.003459023078903556, 0.004626389592885971, ...	[-0.0010945454705506563, 0.008626140654087067,...	4	Supplier: AM Phillip; Description: Vehicle Pur...

# This step is unnecessary if you have a number of observations in each class
# In our case we don't, so we shuffle the data to give us a better chance of getting equal classes in our train and validation sets
# Our fine-tuned model will error if we have less classes in the validation set, so this is a necessary step

import random

labels = [x for x in ft_df_with_class['class_id']]
text = [x for x in ft_df_with_class['prompt']]
ft_df = pd.DataFrame(zip(text, labels), columns = ['prompt','class_id']) #[:300]
ft_df.columns = ['prompt','completion']
ft_df['ordering'] = ft_df.apply(lambda x: random.randint(0,len(ft_df)), axis = 1)
ft_df.set_index('ordering',inplace=True)
ft_df_sorted = ft_df.sort_index(ascending=True)
ft_df_sorted.head()

	prompt	completion
ordering
0	Supplier: Sothebys; Description: Literary & Ar...	0
1	Supplier: Sotheby'S; Description: Literary & A...	0
2	Supplier: City Of Edinburgh Council; Descripti...	1
2	Supplier: John Graham Construction Ltd; Descri...	2
3	Supplier: John Graham Construction Ltd; Descri...	2

# This step is to remove any existing files if we've already produced training/validation sets for this classifier
#!rm transactions_grouped*

# We output our shuffled dataframe to a .jsonl file and run the prepare_data function to get us our input files
ft_df_sorted.to_json("transactions_grouped.jsonl", orient='records', lines=True)
!openai tools fine_tunes.prepare_data -f transactions_grouped.jsonl -q

# This functions checks that your classes all appear in both prepared files
# If they don't, the fine-tuned model creation will fail
check_finetune_classes('transactions_grouped_prepared_train.jsonl','transactions_grouped_prepared_valid.jsonl')

31
8
All good

# This step creates your model
!openai api fine_tunes.create -t "transactions_grouped_prepared_train.jsonl" -v "transactions_grouped_prepared_valid.jsonl" --compute_classification_metrics --classification_n_classes 5 -m curie

# You can use following command to get fine tuning job status and model name, replace the job name with your job
#!openai api fine_tunes.get -i ft-YBIc01t4hxYBC7I5qhRF3Qdx

# Congrats, you've got a fine-tuned model!
# Copy/paste the name provided into the variable below and we'll take it for a spin
fine_tuned_model = 'curie:ft-personal-2022-10-20-10-42-56'

Applying Fine-tuned Classifier

Now we'll apply our classifier to see how it performs. We only had 31 unique observations in our training set and 8 in our validation set, so lets see how the performance is

test_set = pd.read_json('transactions_grouped_prepared_valid.jsonl', lines=True)
test_set.head()

	prompt	completion
0	Supplier: Wavetek Ltd; Description: Kelvin Hal...	2
1	Supplier: ECG Facilities Service; Description:...	1
2	Supplier: M & J Ballantyne Ltd; Description: G...	2
3	Supplier: Private Sale; Description: Literary ...	0
4	Supplier: Ex Libris; Description: IT equipment...	3

test_set['predicted_class'] = test_set.apply(lambda x: openai.chat.completions.create(model=fine_tuned_model, prompt=x['prompt'], max_tokens=1, temperature=0, logprobs=5),axis=1)
test_set['pred'] = test_set.apply(lambda x : x['predicted_class']['choices'][0]['text'],axis=1)

test_set['result'] = test_set.apply(lambda x: str(x['pred']).strip() == str(x['completion']).strip(), axis = 1)

test_set['result'].value_counts()

True     4
False    4
Name: result, dtype: int64

Performance is not great - unfortunately this is expected. With only a few examples of each class, the above approach with embeddings and a traditional classifier worked better.

A fine-tuned model works best with a great number of labelled observations. If we had a few hundred or thousand we may get better results, but lets do one last test on a holdout set to confirm that it doesn't generalise well to a new set of observations

holdout_df = transactions.copy().iloc[101:]
holdout_df.head()

	Date	Supplier	Description	Transaction value (£)
101	23/10/2017	City Building LLP	Causewayside Refurbishment	53147.0
102	30/10/2017	ECG Facilities Service	Facilities Management Charge	35758.0
103	30/10/2017	ECG Facilities Service	Facilities Management Charge	35758.0
104	06/11/2017	John Graham Construction Ltd	Causewayside Refurbishment	134208.0
105	06/11/2017	ALDL	Legal Deposit Services	27067.0

holdout_df['combined'] = "Supplier: " + holdout_df['Supplier'].str.strip() + "; Description: " + holdout_df['Description'].str.strip() + '\n\n###\n\n' # + "; Value: " + str(df['Transaction value (£)']).strip()
holdout_df['prediction_result'] = holdout_df.apply(lambda x: openai.chat.completions.create(model=fine_tuned_model, prompt=x['combined'], max_tokens=1, temperature=0, logprobs=5),axis=1)
holdout_df['pred'] = holdout_df.apply(lambda x : x['prediction_result']['choices'][0]['text'],axis=1)

holdout_df.head(10)

	Date	Supplier	Description	Transaction value (£)	combined	prediction_result	pred
101	23/10/2017	City Building LLP	Causewayside Refurbishment	53147.0	Supplier: City Building LLP; Description: Caus...	{'id': 'cmpl-63YDadbYLo8xKsGY2vReOFCMgTOvG', '...	2
102	30/10/2017	ECG Facilities Service	Facilities Management Charge	35758.0	Supplier: ECG Facilities Service; Description:...	{'id': 'cmpl-63YDbNK1D7UikDc3xi5ATihg5kQEt', '...	2
103	30/10/2017	ECG Facilities Service	Facilities Management Charge	35758.0	Supplier: ECG Facilities Service; Description:...	{'id': 'cmpl-63YDbwfiHjkjMWsfTKNt6naeqPzOe', '...	2
104	06/11/2017	John Graham Construction Ltd	Causewayside Refurbishment	134208.0	Supplier: John Graham Construction Ltd; Descri...	{'id': 'cmpl-63YDbWAndtsRqPTi2ZHZtPodZvOwr', '...	2
105	06/11/2017	ALDL	Legal Deposit Services	27067.0	Supplier: ALDL; Description: Legal Deposit Ser...	{'id': 'cmpl-63YDbDu7WM3svYWsRAMdDUKtSFDBu', '...	2
106	27/11/2017	Maggs Bros Ltd	Literary & Archival Items	26500.0	Supplier: Maggs Bros Ltd; Description: Literar...	{'id': 'cmpl-63YDbxNNI8ZH5CJJNxQ0IF9Zf925C', '...	0
107	30/11/2017	Glasgow City Council	Kelvin Hall	42345.0	Supplier: Glasgow City Council; Description: K...	{'id': 'cmpl-63YDb8R1FWu4bjwM2xE775rouwneV', '...	2
108	11/12/2017	ECG Facilities Service	Facilities Management Charge	35758.0	Supplier: ECG Facilities Service; Description:...	{'id': 'cmpl-63YDcAPsp37WhbPs9kwfUX0kBk7Hv', '...	2
109	11/12/2017	John Graham Construction Ltd	Causewayside Refurbishment	159275.0	Supplier: John Graham Construction Ltd; Descri...	{'id': 'cmpl-63YDcML2welrC3wF0nuKgcNmVu1oQ', '...	2
110	08/01/2018	ECG Facilities Service	Facilities Management Charge	35758.0	Supplier: ECG Facilities Service; Description:...	{'id': 'cmpl-63YDc95SSdOHnIliFB2cjMEEm7Z2u', '...	2

holdout_df['pred'].value_counts()

 2    231
 0     27
Name: pred, dtype: int64

Well those results were similarly underwhelming - so we've learned that with a dataset with a small number of labelled observations, either zero-shot classification or traditional classification with embeddings return better results than a fine-tuned model.

A fine-tuned model is still a great tool, but is more effective when you have a larger number of labelled examples for each class that you're looking to classify