publications

2025

  1. Metadata Conditioning Accelerates Language Model Pre-training
    Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, and 1 more author
    Preprint, 2025

2024

  1. What is in Your Safe Data? Identifying Benign Data that Breaks Safety
    Luxi He*, Mengzhou Xia*, and Peter Henderson
    Conference on Language Modeling (COLM); ICLR Workshop on Data Problems for Foundation Models (Best Paper), 2024
  2. Fantastic Copyrighted Beasts and How (Not) to Generate Them
    Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, and 5 more authors
    ICLR 2025, ICML GenLaw (Spotlight), 2024
  3. CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
    Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, and 8 more authors
    NeurIPS Datasets & Benchmarks, 2024

2023

  1. Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions
    Hao Wang, Luxi He, Rui Gao, and Flavio Calmon
    NeurIPS (Spotlight), 2023