    How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp
    June 2, 2026 | 3 Mins Read
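The Section D listing that follows uses several names it does not define itself: `DEV`, `APEX_OK`, `HAS_AMP_C`, `HAS_FLN`, `FusedAdam` and `FusedLayerNorm`. The part of the script that sets them up is not shown here, so the block below is only a minimal sketch of how such guards might look, not the author's code. The helper `try_import` is hypothetical, and tying the two flags to Apex's compiled extensions `amp_C` and `fused_layer_norm_cuda` is an assumption about what the flags mean.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it is missing or fails to load."""
    try:
        return importlib.import_module(name)
    except Exception:  # ImportError, or a CUDA/ABI mismatch in a compiled extension
        return None


# torch must be imported before the Apex extensions, which link against it.
torch = try_import("torch")
DEV = "cuda" if (torch is not None and torch.cuda.is_available()) else "cpu"

# Assumption: APEX_OK means "the apex package imports at all".
APEX_OK = try_import("apex") is not None

# Assumption: the fused kernels exist only when Apex was built with its
# CUDA extensions, so each flag probes the matching compiled module.
HAS_AMP_C = APEX_OK and try_import("amp_C") is not None
HAS_FLN = APEX_OK and try_import("fused_layer_norm_cuda") is not None

if HAS_AMP_C:
    from apex.optimizers import FusedAdam
if HAS_FLN:
    from apex.normalization import FusedLayerNorm

print(f"device={DEV} apex={APEX_OK} amp_C={HAS_AMP_C} fused_layer_norm={HAS_FLN}")
```

With guards like these the benchmark can fall back to `torch.nn.LayerNorm` and `torch.optim.AdamW` when Apex is absent. Note that the listing calls `torch.cuda.synchronize()` unconditionally, so it still needs a CUDA device to run as written.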
    # Assumes torch, time, DEV, APEX_OK, HAS_AMP_C, HAS_FLN, FusedAdam and
    # FusedLayerNorm were set up in earlier sections of the script.
    print("\n### SECTION D: end-to-end Transformer (vanilla fp32 vs Apex fused + AMP) ###")
    VOCAB, D, NHEAD, LAYERS, SEQ, BATCH, STEPS = 2000, 256, 4, 4, 128, 32, 60


    class Block(torch.nn.Module):
        """Pre-norm Transformer block: self-attention, then a 4x-wide MLP."""

        def __init__(self, d, nhead, norm_cls):
            super().__init__()
            self.attn = torch.nn.MultiheadAttention(d, nhead, batch_first=True)
            self.ff = torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                                          torch.nn.Linear(4 * d, d))
            self.n1, self.n2 = norm_cls(d), norm_cls(d)

        def forward(self, x):
            h = self.n1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.ff(self.n2(x))


    class TinyTransformer(torch.nn.Module):
        """Embedding -> LAYERS blocks -> final norm -> vocabulary logits."""

        def __init__(self, norm_cls):
            super().__init__()
            self.emb = torch.nn.Embedding(VOCAB, D)
            self.blocks = torch.nn.ModuleList([Block(D, NHEAD, norm_cls) for _ in range(LAYERS)])
            self.norm = norm_cls(D)
            self.head = torch.nn.Linear(D, VOCAB)

        def forward(self, idx):
            x = self.emb(idx)
            for b in self.blocks:
                x = b(x)
            return self.head(self.norm(x))


    # Fixed random token batch; the targets are the inputs shifted by one position.
    g = torch.Generator(device="cpu").manual_seed(0)
    data = torch.randint(0, VOCAB, (BATCH, SEQ + 1), generator=g).to(DEV)
    inp, tgt = data[:, :-1], data[:, 1:]
    lossfn = torch.nn.CrossEntropyLoss()


    def run_training(use_apex):
        torch.manual_seed(0)
        # Use the fused kernels only when they were actually built.
        norm_cls = (FusedLayerNorm if (use_apex and HAS_FLN and APEX_OK) else torch.nn.LayerNorm)
        model = TinyTransformer(norm_cls).to(DEV)
        if use_apex and HAS_AMP_C and APEX_OK:
            optimizer = FusedAdam(model.parameters(), lr=3e-4)
        else:
            optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
        # With enabled=False the scaler and autocast are no-ops (plain fp32).
        scaler = torch.amp.GradScaler("cuda", enabled=use_apex)

        def one_step():
            optimizer.zero_grad(set_to_none=True)
            with torch.amp.autocast("cuda", dtype=torch.float16, enabled=use_apex):
                logits = model(inp)
                loss = lossfn(logits.reshape(-1, VOCAB), tgt.reshape(-1))
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            return loss

        # Warm up, then synchronize so the timer covers only the measured steps.
        for _ in range(5):
            one_step()
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(STEPS):
            loss = one_step()
        torch.cuda.synchronize()
        dt = time.perf_counter() - t0
        # Returns: final loss, throughput in tokens per second, elapsed seconds.
        return loss.item(), (STEPS * BATCH * SEQ) / dt, dt


    loss_v, tps_v, dt_v = run_training(use_apex=False)
    print(f" vanilla (fp32, nn.LayerNorm, AdamW) : "
          f"{dt_v:5.2f}s | {tps_v:9.0f} tok/s | final loss {loss_v:.3f}")
    if APEX_OK and (HAS_AMP_C or HAS_FLN):
        loss_a, tps_a, dt_a = run_training(use_apex=True)
        print(f" apex (fp16, FusedLayerNorm, FusedAdam) : "
              f"{dt_a:5.2f}s | {tps_a:9.0f} tok/s | final loss {loss_a:.3f}")
        print(f" ----> speedup: {tps_a / tps_v:0.2f}x throughput")
    else:
        print(" apex path SKIPPED (no fused kernels built)")

    print("\n" + "=" * 78)
    print("DONE. Key takeaways:")
    print(" - FusedAdam/FusedLayerNorm/FusedRMSNorm are the still-relevant Apex pieces;")
    print("   speedups grow with model size & parameter count (tiny demo understates it).")
    print(" - apex.amp is deprecated -> prefer torch.amp.autocast + torch.amp.GradScaler.")
    print(" - FusedAdam composes cleanly with native torch.amp (Section D).")
    print(" - On real workloads, also try a larger model and bf16 autocast (no scaler needed).")
    print("=" * 78)
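The last takeaway suggests bf16 autocast without a gradient scaler. bf16 keeps the same exponent range as fp32, so the loss scaling that fp16 needs to avoid gradient underflow is generally unnecessary. The block below is a minimal sketch of that variant, not part of the original benchmark. The small MLP, the random data and the CPU fallback are stand-ins chosen so it runs without a GPU.

```python
import torch

# Prefer a GPU with native bf16 support; otherwise fall back to CPU autocast.
device = "cuda" if (torch.cuda.is_available() and torch.cuda.is_bf16_supported()) else "cpu"

torch.manual_seed(0)
# Stand-in model and data (assumption: any fp32 model is wrapped the same way).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.GELU(), torch.nn.Linear(128, 10)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
lossfn = torch.nn.CrossEntropyLoss()

x = torch.randn(32, 64, device=device)
y = torch.randint(0, 10, (32,), device=device)


def one_step():
    optimizer.zero_grad(set_to_none=True)
    # Parameters stay in fp32; autocast runs eligible ops in bf16.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(x)
        loss = lossfn(logits.float(), y)  # compute the loss in fp32
    # No GradScaler: backward and step are called directly.
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [one_step() for _ in range(20)]
print(f"device={device} first loss {losses[0]:.3f} last loss {losses[-1]:.3f}")
```

Compared with the fp16 path in Section D, the only changes are the autocast dtype and the removal of the `scaler.scale`, `scaler.step` and `scaler.update` calls. Whether bf16 is faster than fp16 depends on the GPU generation, so it is worth timing both.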
    Fintech Fetch Editorial Team