Commit beaba6a

fix img format
1 parent 433d7e2 commit beaba6a

File tree

1 file changed: +8 -4 lines
  • src/assets/publications/yao2024deft


src/assets/publications/yao2024deft/deft.md

+8 -4
```diff
@@ -1,6 +1,7 @@
 
 <div align="center">
-<img src="./deft.jpeg" alt="logo" width="200"></img>
+<img src="./deft.jpeg" width="200"
+/>
 </div>
 
 --------------------------------------------------------------------------------
@@ -14,6 +15,9 @@ We propose DeFT, an IO-aware attention algorithm for efficient tree-structured i
 Large language models (LLMs) are increasingly employed for complex tasks that process multiple generation calls in a tree structure with shared prefixes of tokens, including few-shot prompting, multi-step reasoning, and speculative decoding. However, existing inference systems for tree-based applications are inefficient due to improper partitioning of queries and KV cache during attention calculation. This leads to two main issues: (1) a lack of memory access (IO) reuse for the KV cache of shared prefixes, and (2) poor load balancing. As a result, there is redundant KV cache IO between GPU global memory and shared memory, along with low GPU utilization. To address these challenges, we propose DeFT (Decoding with Flash Tree-Attention), a hardware-efficient attention algorithm with prefix-aware and load-balanced KV cache partitions. DeFT reduces the number of read/write operations on the KV cache during attention calculation through KV-Guided Grouping, a method that avoids repeatedly loading the KV cache of shared prefixes in attention computation. Additionally, we propose Flattened Tree KV Splitting, a mechanism that ensures even distribution of the KV cache across partitions with little computation redundancy, enhancing GPU utilization during attention computations. By reducing 73-99% of KV cache IO and nearly 100% of the IO for partial results during attention calculation, DeFT achieves up to 2.23x/3.59x speedup in end-to-end/attention latency across three practical tree-based workloads compared to state-of-the-art attention algorithms.
 
 ## DeFT Overview
-<div align="center">
-<img src="./DeFT_overview.jpg" alt="overview" width="95%"></img>
-</div>
+
+
+
+<img src="./DeFT_overview.jpg" style="zoom:50%;"
+/>
+
```
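The abstract quoted in the diff above hinges on two mechanisms: KV-Guided Grouping, which partitions attention work by KV segment so that a shared prefix is loaded once for all queries that share it, and an exact merge of per-segment partial results. Below is a minimal NumPy sketch of that idea under stated assumptions: the two-level tree, the segment sizes, and the helper names (`partial_attn`, `merge`) are illustrative inventions, not DeFT's actual GPU kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Toy decoding tree (illustrative sizes): one KV segment for the shared
# prefix, one small KV segment per branch, one decoding query per branch.
prefix_K = rng.standard_normal((128, d))
prefix_V = rng.standard_normal((128, d))
branch_KV = [(rng.standard_normal((16, d)), rng.standard_normal((16, d)))
             for _ in range(3)]
Q = rng.standard_normal((3, d))

def partial_attn(q, K, V):
    """Partial attention over one KV segment, kept in log-sum-exp form."""
    s = K @ q / np.sqrt(d)                  # scores against this segment
    m = s.max()
    e = np.exp(s - m)
    return m, e.sum(), e @ V                # (max, denominator, unnormalized out)

def merge(a, b):
    """Flash-style exact combination of two partials via max rescaling."""
    m = max(a[0], b[0])
    ca, cb = np.exp(a[0] - m), np.exp(b[0] - m)
    return m, ca * a[1] + cb * b[1], ca * a[2] + cb * b[2]

# KV-guided grouping, as the abstract describes it: all queries visit the
# shared prefix in one grouped pass, so its K/V are traversed once rather
# than once per branch.
Sp = Q @ prefix_K.T / np.sqrt(d)            # (3, 128) scores vs. the prefix
mp = Sp.max(axis=1)
ep = np.exp(Sp - mp[:, None])
prefix_partials = [(mp[i], ep[i].sum(), ep[i] @ prefix_V)
                   for i in range(len(Q))]

# Each branch then merges its prefix partial with its own suffix partial.
outputs = []
for pp, q, (Kb, Vb) in zip(prefix_partials, Q, branch_KV):
    m, z, o = merge(pp, partial_attn(q, Kb, Vb))
    outputs.append(o / z)                   # exact softmax over prefix + branch KV
```

The log-sum-exp merge is what keeps such split-then-merge schemes exact; per the abstract, Flattened Tree KV Splitting would further chunk these segments into evenly sized partitions to balance load across GPU compute units.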