<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/rss.xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Anh Tran</title>
    <link>https://anhtran.education</link>
    <description>Anh Tran. Software engineer, NLP researcher, educator. Bay Area.</description>
    <language>en</language>
    <atom:link href="https://anhtran.education/rss.xml" rel="self" type="application/rss+xml" />
    <lastBuildDate>Wed, 22 Apr 2026 08:35:44 GMT</lastBuildDate>
    <item>
      <title>Hello, and why I&apos;m publishing here</title>
      <link>https://anhtran.education/writing/hello</link>
      <guid isPermaLink="true">https://anhtran.education/writing/hello</guid>
      <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>
      <description>A new home for research summaries, beginner AI tutorials, and half-formed thoughts.</description>
      <content:encoded><![CDATA[<p>I'm setting this site up as a single place to share three kinds of writing:</p>
<ol>
<li><strong>Paper Notes</strong>: brief summaries of our own research, plus occasional notes on papers I've found useful.</li>
<li><strong>Intro to AI</strong>: beginner-friendly AI and programming walkthroughs, drawing from my Laney course but also written fresh for readers new to the field.</li>
<li><strong>Occasional notes</strong>: short pieces on AI, coding, teaching, or whatever else catches my attention.</li>
</ol>
<p>The plan is one post every two weeks. I'd rather ship steadily than burn out in a month.</p>
<p>If you subscribe to the <a href="/rss.xml">RSS feed</a>, you'll get everything in one place. No separate lists.</p>
]]></content:encoded>
      <category>post</category>
      <category>meta</category>
      <category>teaching</category>
    </item>
    <item>
      <title>Multimodal disaster-tweet classification: what I found</title>
      <link>https://anhtran.education/writing/multimodal-disaster-tweets</link>
      <guid isPermaLink="true">https://anhtran.education/writing/multimodal-disaster-tweets</guid>
      <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
      <description>The ASONAM 2025 paper, without the eight pages of method.</description>
      <content:encoded><![CDATA[<h2>Summary</h2>
<p>Fine-tuned open-source multimodal LLMs can beat both the strongest proprietary zero-shot model we tested (GPT-4o) and the prior best fine-tuned CLIP baselines on disaster-related tweet classification. On CrisisMMD, LLaMA 3.2 11B with LoRA reaches state-of-the-art F1 by training <strong>less than 1% of the model's ~11B parameters</strong>. A fine-tuned 3B text-only model also outperforms the previous best on both tasks, which suggests a genuinely deployable option for edge devices.</p>
<h2>What we did</h2>
<p>We evaluated three multimodal LLMs (GPT-4o, GPT-4o mini, LLaMA 3.2 11B) on the CrisisMMD dataset under zero-shot, one-shot, and (for LLaMA) LoRA fine-tuned settings. The two tasks are:</p>
<ul>
<li><strong>Informativeness</strong>: is this tweet usefully crisis-related, or not?</li>
<li><strong>Humanitarian category</strong>: what kind of information is in it (affected individuals, infrastructure damage, rescue efforts, etc.)?</li>
</ul>
<p>CrisisMMD spans seven real 2017 events, including the California wildfires, Hurricanes Harvey, Irma, and Maria, and the Mexico earthquake.</p>
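<p>As a rough sketch of what the zero-shot setup looks like (my illustration, not the paper's actual prompt; the label lists below are a shortened, hypothetical subset of CrisisMMD's taxonomy):</p>

```python
# Illustrative zero-shot classification prompt for the two CrisisMMD tasks.
# These label lists are an abbreviated, hypothetical subset of the real taxonomy.
INFORMATIVENESS_LABELS = ["informative", "not_informative"]
HUMANITARIAN_LABELS = [
    "affected_individuals",
    "infrastructure_and_utility_damage",
    "rescue_volunteering_or_donation_effort",
    "other_relevant_information",
    "not_humanitarian",
]

def build_prompt(tweet_text: str, labels: list[str]) -> str:
    """Build a single zero-shot prompt asking the model for exactly one label."""
    options = ", ".join(labels)
    return (
        "You are triaging disaster-related tweets.\n"
        f"Tweet: {tweet_text}\n"
        f"Respond with exactly one label from: {options}."
    )

prompt = build_prompt(
    "Power lines down across the neighborhood after the quake.",
    INFORMATIVENESS_LABELS,
)
```

<p>In the multimodal setting the tweet's image is attached alongside this text; the fine-tuned runs replace prompting with LoRA training on the labeled pairs.</p>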
<h2>Main findings</h2>
<ol>
<li><strong>Zero-shot multimodal LLMs are already solid, and GPT-4o mini wins the cost/quality tradeoff.</strong> Across most zero-shot settings, GPT-4o mini beat GPT-4o while being significantly cheaper. With prompt engineering, GPT-4o's informativeness F1 on text+image rose from 76.90 (as reported in prior work) to 87.71.</li>
<li><strong>One-shot and five-shot prompting did not consistently help.</strong> For LLaMA 3.2 11B specifically, one-shot with multiple images actually <em>hurt</em> performance. That aligns with a known limitation the Meta LLaMA team has flagged: the 11B vision model is not reliable with multiple images at inference time.</li>
<li><strong>LoRA fine-tuning is the hero result.</strong> LLaMA 3.2 11B + LoRA reached F1 94.77 on informativeness (text+image) and 91.62 on humanitarian (text+image), surpassing both the CLIP baseline (93.13 and 90.04) and every zero-shot model tested. LoRA touched less than 1% of the ~11B parameters.</li>
<li><strong>Small text-only models can punch above their weight.</strong> LLaMA 3.2 3B (text-only) fine-tuned with LoRA beat the CLIP baseline on both tasks (91.73 vs 85.99 on informativeness, 83.66 vs 80.70 on humanitarian). Notably, full fine-tuning of the 1B model gave <em>worse</em> results than its LoRA-tuned counterpart.</li>
</ol>
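<p>For readers new to LoRA: the "less than 1% of parameters" numbers fall straight out of the low-rank construction. Here is a minimal numpy sketch of the idea (mine, not the paper's code; the dimensions and rank are illustrative):</p>

```python
import numpy as np

# LoRA in one picture: freeze the base weight W (d_out x d_in) and learn a
# low-rank update delta_W = B @ A, with A (r x d_in), B (d_out x r), r small.
def lora_forward(x, W, A, B, alpha=16, r=8):
    """Forward pass through a frozen weight plus a scaled low-rank update."""
    return x @ (W + (alpha / r) * (B @ A)).T

d_in = d_out = 2048   # illustrative; real projection sizes vary by model
r = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable
B = np.zeros((d_out, r))                # trainable; zero init => no change at step 0

trainable = A.size + B.size
total = W.size + trainable
print(f"trainable fraction: {trainable / total:.3%}")  # → trainable fraction: 0.775%
```

<p>Because B starts at zero, the adapted model is exactly the base model before training, and only A and B ever receive gradients. Applied across a transformer's attention projections, that per-layer fraction is how an 11B model ends up with well under 1% of its parameters trainable.</p>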
<h2>Why it matters</h2>
<p>Disaster-response teams need to triage social-media posts fast and cheaply during an event. The combination of (a) open-source models, (b) parameter-efficient fine-tuning that runs on modest hardware, and (c) accuracy that matches or exceeds the previous best is a deployable package, not just a benchmark delta. For agencies that can't route every tweet through a third-party API, a LoRA-tuned LLaMA running on their own infrastructure is a practical path.</p>
<h2>My take</h2>
<p>The result that stuck with me isn't the 11B headline number. It's that a 3B text-only model, fine-tuned with LoRA, beats the fine-tuned CLIP baseline. Edge-device inference for crisis triage stops being aspirational when the model weights fit in a few GB and the accuracy holds. The obvious next question is how robust these fine-tuned models are to disaster types <em>not</em> in CrisisMMD's seven events, because the real test of a deployable system is the one it hasn't seen before.</p>
<h2>Read the paper</h2>
<p><strong>Multimodal Disaster-Related Tweet Classification with Parameter-Efficient Fine-Tuning of Large Language Models.</strong> Guo, Tran, Xiao, Li, Caragea. ASONAM 2025 Proceedings, Springer, pp. 413-428.</p>
<p><a href="https://web.ntpu.edu.tw/~myday/doc/ASONAM2025/ASONAM2025_Proceedings/pdf/papers/1334_084.pdf">pdf</a> · <a href="https://github.com/deeplearning-lab-csueb/Fine-tune-Multimodal-LLM-for-CrisisMMD">code</a></p>
]]></content:encoded>
      <category>paper-summary</category>
      <category>nlp</category>
      <category>llm</category>
      <category>multimodal</category>
      <category>research</category>
    </item>
  </channel>
</rss>