Synthetic Data for Self-Evolving AI
Wednesday, October 2, 2024, 11:00 am-12:00 pm
Abstract

Data is the new oil for training large AI models. However, the "oil" generated by humans may run out someday, or grow much more slowly than the rate at which AI consumes it. Moreover, human-created data are less controllable in terms of quality, opinions, format, style, etc., and may introduce biases or privacy concerns when used for model training. Can we leverage the power of generative AI to automatically create synthetic training data in a more efficient, controllable, and safe manner? Does continual learning on synthetic data lead to self-evolving AI? What risks should we expect? In this talk, I will present our recent work investigating whether and how synthetic data can be generated to improve large language models (LLMs) and computer vision models, especially when the real data is imperfect. These works include Mosaic-IT (compositional data augmentation for instruction tuning), Selective Reflection Tuning (data generation via student-teacher interactions), DEBATunE (data generation by LLM debate), and Diffusion Curriculum (generative curriculum learning of images). These projects are led by my Ph.D. students Ming Li and Yijun Liang.

Bio

Dr. Tianyi Zhou is a tenure-track assistant professor of Computer Science at the University of Maryland, College Park (UMD). He received his Ph.D. from the University of Washington and worked as a research scientist at Google before joining UMD. His research interests include machine learning, natural language processing, and multi-modal generative AI. His team has published >100 papers at ML venues (NeurIPS, ICML, ICLR), NLP venues (ACL, EMNLP, NAACL), and in journals such as IEEE TPAMI/TIP/TNNLS/TKDE, with >6700 citations. His recent research topics are (1) human-AI hybrid intelligence (humans help AI, AI helps humans, human-AI teaming); (2) controllable multi-modal generative AI via post-training and prompting; (3) synthetic data, self-evolving AI, and auto-benchmarking; (4) neurosymbolic world models and System-2 embodied agents.

This talk is organized by Naomi Feldman.