Steve Comparison of LLMs – Claire's Blog

About the author

This article was written by my intern, Steve Wang.

More about Steve Wang: https://renickbell.net/students/steve-wang/doku.php?id=start

Introduction

This article compares several large language models (LLMs). The goal is to find the LLM that best understands and replies in Traditional Chinese. The chosen LLM should serve people with physical or mental disabilities and other disadvantaged groups, helping them understand how to apply for government welfare resources in Taiwan. This matters because government websites often use dense legal language. The models are assessed on speed of reply, reasoning ability, and clarity in Traditional Chinese. Some models run on-device and others in the cloud. The cloud models are gpt4o-mini (OpenAI) and Gemini 1.5 Flash (Google). The local models, Mistral and Llama3.2, run under Ollama on a laptop with an NVIDIA GeForce RTX 3070 Ti Laptop GPU.

Because some models run locally and others in the cloud, the time measurements are inconsistent across models and are therefore not considered in the comparison. All tests were run in the Dify interface, on the same device that runs the local models with Ollama. To reduce network latency, the laptop was connected to wired internet, which averaged 600.44 Mbps download and 51.26 Mbps upload on the browser version of Speedtest.

Research Methods

The testing process was as follows: a basic Dify workflow was created with a start point leading to one of the models.

The LLM was prompted like so:

You are a chatbot designed to help serve some physically and mentally handicapped or disadvantaged groups and help them understand how to apply for government welfare resources in Taiwan. Answer like a person from Taiwan and in traditional Chinese. Remember the person you are speaking with is most likely from Taiwan so respond accordingly.

  

here is the question from user: {question from user}
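
The prompt above can also be assembled programmatically. The following is a minimal Python sketch, not the Dify configuration used in the tests: `SYSTEM_PROMPT` copies the text above, while the helper name `build_prompt` and its whitespace handling are illustrative assumptions.

```python
# Hypothetical helper: the tests in this article were built in the Dify
# interface, so this only illustrates how the same prompt could be assembled.
SYSTEM_PROMPT = (
    "You are a chatbot designed to help serve some physically and mentally "
    "handicapped or disadvantaged groups and help them understand how to apply "
    "for government welfare resources in Taiwan. Answer like a person from "
    "Taiwan and in traditional Chinese. Remember the person you are speaking "
    "with is most likely from Taiwan so respond accordingly."
)


def build_prompt(question: str) -> str:
    """Return the full prompt text for one user question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"{SYSTEM_PROMPT}\n\nhere is the question from user: {question}"


if __name__ == "__main__":
    print(build_prompt("我該如何申請低收入戶補助？"))
```

Keeping the instructions in one constant means every model receives exactly the same wording, which is what makes the responses comparable.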

The first question from the user was “我該如何申請低收入戶補助？” (“How do I apply for the low-income household subsidy?”). The follow-up question was “告訴我臺灣臺北市的低收入資格” (“Tell me the low-income eligibility criteria for Taipei City, Taiwan”), followed by “根據你上面提供的資料. 我在臺北市住. 在臺北市有租一個房間. 月收13,455元我可以申請嗎?” (“Based on the information you provided above: I live in Taipei City, where I rent a room, and I earn 13,455 NTD a month. Can I apply?”). These questions were chosen to test whether the LLM would reply entirely in Traditional Chinese, to give a rough idea of its speed, and to test its reasoning skills. Only the time the LLM needed to produce a response was measured, as shown in the image below (circled in red). The prompts were designed to simulate a realistic usage scenario rather than a clinical scientific study.

The accuracy of the responses was not judged, both because of possible hallucinations and because the correct information would be supplied to the system in a real-world use case. However, each LLM's answers were checked against the information that the LLM itself had produced, as a test of reasoning ability. In other words, every response was treated as fact within its own testing scenario, and the “facts” each LLM came up with were used to judge its following responses.
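
The times reported in this article were read from the Dify interface. For readers who want to take a similar measurement outside Dify, the following Python sketch times only the model call itself. The names `timed_call` and `fake_model` are hypothetical, and `fake_model` is a stand-in that sleeps briefly instead of calling a real LLM.

```python
import time
from typing import Callable, Tuple


def timed_call(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call generate(prompt) and return (reply, elapsed seconds).

    Only the call itself is inside the timed region, so prompt
    construction and result handling are excluded.
    """
    start = time.perf_counter()
    reply = generate(prompt)
    elapsed = time.perf_counter() - start
    return reply, elapsed


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption, for illustration only)."""
    time.sleep(0.05)
    return "您好！"


if __name__ == "__main__":
    reply, seconds = timed_call(fake_model, "我該如何申請低收入戶補助？")
    print(f"{seconds:.3f} s -> {reply}")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.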

Results

Llama3.2 (Meta)

Llama3.2 took 9.883 s, 2.919 s, and 2.419 s to answer the three questions, respectively. All of the responses are in Traditional Chinese, but a few glitches were observed, for instance “住住的人” in the responses to questions 2 and 3. Otherwise the answers seemed fine, and the logic and reasoning were sound. Llama3.2 also sounded quite professional. This model appears to suit the goal quite well. The full responses from Llama3.2 are in the Full Responses from LLMs section below.

Mistral (Mistral AI)

Mistral took 12.312 s, 29.308 s, and 16.970 s to answer the three questions, respectively. There is some use of Simplified Chinese, for instance “身份” and “证明”, both in responses one and two. The language seems accurate and the logic quite clear. Also of note, Mistral always starts its responses with “您好！”, which might make the conversation feel robotic because there is no variation.
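
The check for Simplified Chinese in this article was done by reading the responses. A rough automated version could look like the Python sketch below. It is only an illustration: `SIMPLIFIED_TO_TRADITIONAL` is a tiny hand-picked sample rather than a complete conversion table, and `find_simplified` is a hypothetical helper. A character-level check like this can catch “证” but cannot catch word-level variants such as “身份” versus “身分”, because both of those words are written with characters that also occur in Traditional Chinese text.

```python
from typing import List, Tuple

# Assumption: a small illustrative sample of characters that exist only in
# Simplified Chinese, mapped to their Traditional forms. A real check would
# need a full conversion table.
SIMPLIFIED_TO_TRADITIONAL = {
    "证": "證",
    "传": "傳",
    "这": "這",
    "们": "們",
    "湾": "灣",
    "请": "請",
}


def find_simplified(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, simplified char, traditional char) for each hit."""
    return [
        (i, ch, SIMPLIFIED_TO_TRADITIONAL[ch])
        for i, ch in enumerate(text)
        if ch in SIMPLIFIED_TO_TRADITIONAL
    ]


if __name__ == "__main__":
    for index, simp, trad in find_simplified("其他收入來源证明"):
        print(f"position {index}: {simp} (Traditional: {trad})")
```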

Gemini 1.5 Flash (Google)

Google Gemini 1.5 Flash took 8.516 s, 11.995 s, and 9.561 s to answer the three questions, respectively. All responses are in Traditional Chinese, and the language is quite clear and accurate. The logic and reasoning are sound, and Gemini seems to follow the conversation quite well. Notably, Gemini's tone is very friendly, which could help people using it feel more comfortable. However, perhaps because of this, most of its answers follow a formulaic format: Gemini mostly starts with a greeting and ends with a sentence of encouragement. As a non-native Chinese speaker, the writer is unsure how this would make people in Taiwan feel. All in all, Gemini is quite a good fit for the goals.

gpt4o-mini (OpenAI)

gpt4o-mini took 3.594 s, 2.580 s, and 2.488 s to answer the three questions, respectively. Some responses contain Simplified Chinese. For instance, in the first response, gpt4o-mini uses “台灣” instead of “臺灣”. However, this appears to be acceptable to some people living in Taiwan. Other than that, the language is clear and accurate, and the logic and reasoning are sound. The writer observes that gpt4o-mini seems very careful and tries not to give specific answers until it has the correct information. This makes gpt4o-mini a good model for the research goals.

Discussion

From the results, only Gemini 1.5 Flash, Llama3.2, and gpt4o-mini gave all their responses in Traditional Chinese. All three have acceptable speeds that would not make the user feel impatient. Compared to Llama3.2, Gemini 1.5 Flash had better Chinese skills and responded without making mistakes. Gemini 1.5 Flash was also quite comforting and encouraging in its tone, which might make it ideal for communicating with people with physical or mental disabilities and other disadvantaged groups and helping them understand how to apply for government welfare resources.
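
To put the speeds side by side, the response times reported in the Results section can be summarized with a short Python sketch. The numbers are copied from the Results section. The summary itself (mean and slowest reply per model) and the name `summarize` are illustrative choices and not part of the original analysis. Local and cloud timings are not directly comparable, so this is only a rough guide.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Response times in seconds for questions 1-3, copied from the Results section.
RESPONSE_TIMES_S: Dict[str, List[float]] = {
    "Llama3.2": [9.883, 2.919, 2.419],
    "Mistral": [12.312, 29.308, 16.970],
    "Gemini 1.5 Flash": [8.516, 11.995, 9.561],
    "gpt4o-mini": [3.594, 2.580, 2.488],
}


def summarize(times: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each model to (mean time rounded to 3 decimals, slowest time)."""
    return {model: (round(mean(v), 3), max(v)) for model, v in times.items()}


if __name__ == "__main__":
    summary = summarize(RESPONSE_TIMES_S)
    # Print the models from fastest to slowest mean response time.
    for model, (avg, worst) in sorted(summary.items(), key=lambda kv: kv[1][0]):
        print(f"{model:<18} mean {avg:>7.3f} s   slowest {worst:>7.3f} s")
```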

Conclusion

After testing all the LLMs to see which best fits our goals, Gemini 1.5 Flash and gpt4o-mini seem to be the best choices at the moment. To narrow the choice further, tests could be run on bigger models such as Llama 3 with 70B parameters (Llama3.2 has 3B). To reduce the cost and network latency of using Google's hosted Gemini models, testing could also be done on Gemma or Gemma2; both models are from Google and are available on Ollama.
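
As a starting point for such follow-up tests, a locally served model can be queried through Ollama's HTTP API without going through Dify. The Python sketch below assumes Ollama is running on its default local port (11434) and that the model tags `gemma` and `gemma2` have already been pulled. The helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the model and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


# Example usage (requires a running Ollama server with the model pulled):
#     print(ask("gemma2", "我該如何申請低收入戶補助？"))
```

Setting `stream` to `False` asks the server for one complete JSON reply instead of a stream of partial tokens, which keeps the sketch short.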

Full Responses from LLMs

Correct Answers (正確答案)

Q.我該如何申請低收入戶補助？

應備文件:(1)申請表1份（可自本局網站下載，[連結]）。(2)新式戶口名簿(含詳細記事)影本1份。(3)戶長郵局或市庫指定銀行(現為台北富邦銀行)存摺封面影本1份。(4)其他相關證明文件1份（如：身心障礙證明文件、診斷證明書、學生證、薪資證明、房屋所有權狀影本等）。

備妥後可至區公所臨櫃親自申辦、委託申辦、郵寄申辦、網路申辦（全程式）

Q.告訴我臺灣臺北市的低收入資格

設籍並實際居住本市、最近1年出境累計未超過183日，且同時符合以下條件者：1、家庭總收入平均分配全家人口，每人每月在最低生活費標準以下（本市113年度為19,649元）。2、全家人口之存款（含股票投資）平均每人不超過新臺幣15萬元。3、全家人口之土地及房屋價值不超過新臺幣793萬元。[參考資料]

Q.根據你上面提供的資料. 我在臺北市住. 在臺北市有租一個房間. 月收13,455元我可以申請嗎?

居住要求：你在台北市居住，且有租房子，符合居住條件。
收入要求：
– 你每月收入為13,455元。
– 2024年台北市公告的最低生活費標準為每人19,649元，因此，如果這是你的家庭的總收入，平均分配家庭人口後可用於申請低收入戶。
家庭人口：這個收入是針對你單獨的情況，還是指家庭的總收入？如果是家庭的總收入，那麼需要計算家庭中其他成員的收入。
– 如果你是單身，且家庭中只有你一個人，13,455元的月收入是低於19,649元，符合低收入戶的申請資格。
– 如果有其他家庭成員的收入，則需要將總收入平均分配並確認是否低於標準。
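
The financial part of the reference answer above comes down to a few comparisons, sketched here in Python. The thresholds are the figures quoted in the reference answer (Taipei City, year 113). The `Household` type and `meets_financial_criteria` are hypothetical names, treating the limits as inclusive is an assumption, and the residency and 183-day rules are not modeled.

```python
from dataclasses import dataclass

# Figures quoted in the reference answer above, in NTD.
MIN_LIVING_COST_PER_PERSON = 19_649   # monthly income limit per person
MAX_SAVINGS_PER_PERSON = 150_000      # savings, including stock investments
MAX_PROPERTY_VALUE = 7_930_000        # land and housing, whole household


@dataclass
class Household:
    members: int
    monthly_income_total: int
    savings_total: int = 0
    property_value_total: int = 0


def meets_financial_criteria(h: Household) -> bool:
    """Check only the income, savings, and property limits.

    Assumption: all three limits are inclusive.
    """
    if h.members < 1:
        raise ValueError("a household needs at least one member")
    income_ok = h.monthly_income_total / h.members <= MIN_LIVING_COST_PER_PERSON
    savings_ok = h.savings_total / h.members <= MAX_SAVINGS_PER_PERSON
    property_ok = h.property_value_total <= MAX_PROPERTY_VALUE
    return income_ok and savings_ok and property_ok


if __name__ == "__main__":
    # The user in question 3: one person renting a room, earning 13,455 a month.
    print(meets_financial_criteria(Household(members=1, monthly_income_total=13_455)))
```

For the single-person case in question 3, 13,455 is below 19,649, so the income condition is met. With a second household member earning 30,000 a month, the average rises to 21,727.5 and the condition fails, which is why the reference answer asks about household size.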

Llama3.2 (Meta)

Q1:

申請低收入戶補助，通常需要通過以下步驟：

登入公共住宅資訊網：首先，你需要登入公共住宅資訊網（http://www.taiwan.gov.tw/zh-hant/），並且點擊「申請服務」或是「低收入戶補助»，這將導向你到該平台的相關頁面。
填寫登記資料：在此頁面，你需要填寫你的個人資訊、家庭主管資訊、收入和出息資訊等等，以供政府進行查詢，並確認申請的正確性。
提供必要文件：你需要將相關文件upload至網站，包括：

身分證字樣或護照
戶口登記單
年度收入表單（含所屬機關、部門、職位和每月的收入）
家庭主管身份證字樣

申請完成：完成上述步驟後，系統將會提供你一個登記號，使用此編號可以查詢你的申請狀態。
等待評估結果：在您填寫資料和上传文件之后，您需要等待政府的評估結果。

如果您有任何問題或需要幫助，您可以聯繫公共住宅資訊網的客服。

Q2:

台北市政府對於低收入戶補助有以下規定：

所屬家庭人數：單位中，含有小孩者，少於6名成員；含有5名以上成員者，少於8名成員。
收入標準：每人月收入不超過 NT$34,000。
申請人資格：申請人需是台北市住住的人。
申請時機：申請時間為每年6月1日至12月31日（包含）。
續付期限：續付期限為每年2月1日至7月31日（含）。

Q3:

根據台北市政府對於低收入戶補助的規定：

每人月收入不超過 NT$34,000
申請人需是台北市住住的人
申請時機為每年6月1日至12月31日（包含）

您所租用房間的月收13,455元，符合每人月收入上限標準，且您在台北市住家。

因此，您可以申請台北市政府低收入戶補助。

Gemini 1.5 Flash (Google)

Q1:

您好！要申請低收入戶補助，需要先了解您的狀況，才能告訴您如何申請。

請您先告訴我：

您是哪個縣市或鄉鎮的居民？ 因為各縣市政府的申請方式和條件可能略有不同。
您和您的家庭成員有哪些人？ 包含年齡、職業、收入狀況等等。
您目前的收入和財產狀況如何？

有了這些資訊，我才能幫您更準確地解答如何申請低收入戶補助。

另外，建議您直接到戶籍所在地的 社會局 或 鄉鎮區公所 詢問，他們會提供最完整的資訊和協助。

加油！祝您順利申請到補助！

Q2:

好的，以下整理了台北市的低收入戶資格資訊：

台北市低收入戶資格

1. 戶籍: 必須設籍於台北市。

2. 家庭成員: 包含您本人和所有共同生活且有扶養關係的家人。

3. 收入:

每人每月收入需低於 新台幣 15,000 元。
計算收入時，會考慮家庭成員的 工作收入、投資收益、房租收入、退休金、補助款 等。

4. 財產:

家庭總財產價值需低於 新台幣 100 萬元。
財產計算包含 房屋、土地、存款、股票、債券、汽車、機車 等。

5. 其他條件:

若家庭成員中有身心障礙者，或有特殊情況，可能會依個案狀況調整資格標準。

如何申請:

您可以到戶籍所在地的 台北市社會局 或 區公所 索取申請表格。
填妥申請表格，並附上相關證明文件，例如 戶口名簿、收入證明、財產證明、身心障礙證明 等。
將申請資料送交社會局或區公所受理。

注意事項:

申請低收入戶資格需要經過審查，審查時間大約需要 1 個月。
若審查結果未符合資格，可以提出申覆。

建議:

為了更清楚了解自己的情況是否符合資格，建議您直接到社會局或區公所諮詢，他們會提供更詳細的資訊和協助。

希望以上資訊對您有所幫助！

Q3:

根據您提供的資訊，您在台北市居住，月收入 13,455 元，若您是單身，且沒有其他財產，您可能符合台北市低收入戶資格的條件。

但請注意，以下幾點需要確認：

家庭成員: 您是單身嗎？還是有其他家人一起居住？如果有家人，他們的收入和財產狀況也要納入考量。
財產: 除了租屋以外，您還有其他財產嗎？例如存款、股票、債券、汽車、機車等等。
其他收入: 除了工作收入，您還有其他收入嗎？例如房租收入、補助款等等。

建議您直接到戶籍所在地的 台北市社會局 或 區公所 詢問，他們會提供更詳細的資訊和協助，幫您確認是否符合資格。

您可以攜帶您的 身分證、戶口名簿、收入證明 等相關文件，到社會局或區公所諮詢。

祝您順利申請到補助！

gpt4o-mini (OpenAI)

Q1:

要申請低收入戶補助，您需要按照以下步驟進行：

確認資格：首先，您需要確認自己是否符合低收入戶的條件。在台灣，低收入戶的標準是根據家庭的收入及財產狀況來判定的。
準備文件：您需要準備一些必要的文件，包括但不限於：

身分證明文件（如身分證或戶口名簿）
收入證明（如薪資單、稅單或社會福利補助的證明）
財產證明（如銀行存款證明或不動產證明）

填寫申請表：前往當地的社會局或民政局，索取並填寫低收入戶的申請表格。
提交申請：將填妥的申請表及相關文件提交至您所屬的社會局或民政局。
等待審核：申請後，相關單位會進行審核，通常會在一段時間內通知您結果。

如果您有任何疑問或需要協助，建議您可以直接聯繫當地的社會局，他們會提供您更詳細的資訊和幫助。希望這些資訊對您有所幫助！

Q2:

在臺北市，低收入戶的資格主要是根據家庭的收入和財產狀況來判定的。以下是一些基本的條件：

收入標準：低收入戶的年收入必須低於台北市所定的低收入戶標準。這個標準會根據家庭人數有所不同，通常每年會調整一次，您可以查詢當前的標準。
財產限制：家庭的總財產（如存款、不動產等）也不能超過一定的限額。這個限額同樣會根據家庭人數有所不同。
家庭成員：申請時需要提供家庭成員的相關資料，包括配偶、子女等，這會影響您的收入和財產計算。

建議您可以前往臺北市政府社會局的官方網站，查詢最新的低收入戶資格標準和相關資訊，或直接聯繫當地社會局以獲得更詳細的說明。希望這些資訊對您有幫助！

Q3:

根據您提供的資訊，您的月收入為13,455元。要判斷您是否可以申請低收入戶補助，還需要考慮以下幾點：

年收入計算：您的年收入約為161,460元（13,455元 x 12個月）。您需要確認這個收入是否低於臺北市所定的低收入戶年收入標準。
家庭成員：如果您有其他家庭成員（如配偶或子女），那麼他們的收入也需要合併計算，這可能會影響您的低收入戶資格。
財產狀況：除了收入，您的財產狀況也會影響申請資格，需確認總財產是否在低收入戶的限制範圍內。

建議您可以查詢臺北市政府社會局的網站，了解目前的低收入戶收入標準和財產限制，或直接聯絡社會局詢問，這樣可以獲得更準確的資訊和指導。希望這些資訊對您有幫助！

Mistral (Mistral AI)

Q1:

您好！您要如何申請低收入戶補助？請先準備以下文件：身份證字號、戶籍證明文件、家庭成員名單（每個成員的身分證字號）、近六個月的工資說明書或其他收入來源证明，以及金融銀行發卡的帳戶報表。接下來，您可以到網上申請平台上填寫申請表格，或到鄉鎮市政府的社會福利辦公室提出申請。如果您需要任何其他說明，請聯繫我們即時回答。

Q2:

您好！台北市的低收入資格是根據家庭每月實際消費經銷額的標準，分為三級：第一級是每月實際消費經銷額不超過10,800元（家庭成員數量上限5人）、第二級是每月實際消費經銷額為10,800元至27,000元之間（家庭成員數量上限5人）、第三級是每月實際消費經銷額超過27,000元的家庭。如果您想要申請台北市低收入補助，請準備以下文件：身份證字號、戶籍證明文件、家庭成員名單（每個成員的身分證字號）、近六個月的工資說明書或其他收入來源证明，以及金融銀行發卡的帳戶報表。接下來，您可以到網上申請平台上填寫申請表格，或到鄉鎮市政府的社會福利辦公室提出申請。如果您需要任何其他說明，請聯繫我們即時回答。

Q3:

您好！根據台北市低收入補助的資格，每月實際消費經銷額上限為10,800元（家庭成員數量上限5人）。您每月僅支付13,455元，不符合低收入補助的資格。但是，請不要灰心！如果您需要任何其他社會福利資源，請聯繫我們即時回答或到鄉鎮市政府的社會福利辦公室提出申請。
