Danny To Eun Kim
김도은
toeunkim{at}cmu{dot}edu

(View on a wider screen to see the pictures!)

I'm a first-year PhD student at the CMU Language Technologies Institute (LTI), advised by Prof. Fernando Diaz. My research interests span Natural Language Processing (NLP) and Information Retrieval (IR), with a recent emphasis on retrieval-enhanced machine learning (REML) and known-item retrieval. I like to draw inspiration from a broader context, including areas like psychology, cognition, and neuroscience.

Prior to CMU, while I was in the UK, I was a graduate researcher at the Web Intelligence Group, University College London (UCL), where I received an M.Eng. in Computer Science in 2022. I worked with Prof. Emine Yilmaz and Prof. Aldo Lipani on Conversational AI and User Simulation. I was also a member of both the Knowledge Graphs and NLP interest groups at The Alan Turing Institute.

During the summer of 2022, I interned at Raft as a Machine Learning Engineer, automating paperwork in the freight forwarding industry using OCR and NLP.
In May 2021, I joined team Condita as a main developer to compete in the first Amazon Alexa Prize TaskBot Challenge, where I spent a significant amount of time constructing a real-time, multi-modal, knowledge-intensive, and interactive conversational assistant; our team made it to the quarterfinals.
During my undergraduate degree, I worked with Prof. Marianna Obrist at the UCL (Human-Computer) Interaction Centre, finding ways to cluster text-based stories by their authors' smell experiences.

I take pride in being one of the early members of the UCL Artificial Intelligence Society. During my three years of active involvement, I had the privilege of founding its first Machine Learning tutorial series, which has become an annual tradition.

Check out my blog page or pictures for more!

CV  /  Google Scholar  /  Blog  /  Twitter  /  GitHub

Table of Contents  

Research  /  Publications  /  Presentations & Workshops  /  Teaching & Mentoring

Research

Current Research:

  • Retrieval-Enhanced Machine Learning (REML) or RAG
  • Tip-of-the-tongue (known-item) retrieval

Theses


Publications
A Multi-Task Based Neural Model to Simulate Users in Goal-Oriented Dialogue Systems
To Eun Kim, Aldo Lipani
SIGIR, 2022
paper / poster / code / bibtex

A conversational user simulator that uses multi-task learning to 1) generate user-side utterances, 2) predict the user's next action, and 3) predict the user's satisfaction level. It achieves state-of-the-art results in satisfaction and action prediction on the USS dataset.

Condita: A State Machine Like Architecture for Multi-Modal Task Bots
Jerome Ramos*, To Eun Kim*, Z. Shi, X. Fu, F. Ye, Y. Feng, Aldo Lipani
* denotes equal contribution.
Alexa Prize TaskBot Challenge Proceedings, 2022
paper / bibtex

We present the COoking-aNd-DIy-TAsk-based (Condita) task-oriented dialogue system for the 2021 Alexa Prize TaskBot Challenge. Condita provides an engaging multi-modal agent that assists users in cooking and home improvement tasks, creating a memorable and enjoyable experience for users. We discuss Condita's state-machine-like architecture and analyze the various conversational strategies we implemented, which allowed us to achieve excellent performance throughout the competition.

Attention-based Ingredient Phrase Parser
Z. Shi, P. Ni, M. Wang, To Eun Kim, Aldo Lipani
ESANN, 2022
paper / arXiv / code / bibtex

Spin-off research from the Alexa Prize TaskBot Challenge. Assisting users with cooking is one of the tasks that intelligent assistants are expected to solve, where ingredients and their corresponding attributes, such as name, unit, and quantity, should be provided to users precisely and promptly. To provide an engaging and successful conversational service to users for cooking tasks, we propose a new ingredient parsing model.


Preprints
A Comprehensive Survey on Generative Diffusion Models for Structured Data
Heejoon Koo, To Eun Kim
arXiv / bibtex

There is still a lack of literature and its reviews on structured data modelling via diffusion models, compared to other data modalities such as visual and textual data. To address this gap, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of the score-based diffusion model theory, subsequently proceeding to the technical descriptions of the majority of pioneering works that used structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions.

When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Z. Shi*, J. Ramos*, To Eun Kim, X. Wang, H. Rahmani, Aldo Lipani
* denotes equal contribution.
arXiv / bibtex

In the NeurIPS 2022 IGLU Challenge NLP Task, we address two key research questions: 1) when should the agent ask for clarification, and 2) what clarification questions should it ask. In this report, we briefly introduce our methods for the classification and ranking tasks. For the classification task, our model achieves an F1 score of 0.757, which placed 3rd on the leaderboard. For the ranking task, our model achieves a Mean Reciprocal Rank of about 0.38 by extending the traditional ranking model. Lastly, we discuss various neural approaches for the ranking task and future directions.


Presentations & Workshops
A Multi-Task Based Neural Model to Simulate Users in Goal-Oriented Dialogue Systems
Presented at the SCAI: Search-Oriented Conversational AI Workshop
15th July, 2022
Recording / Slides

Teaching & Mentoring
Course Co-organiser
  • Intensive Python Programming Course (for MSc students' dissertation)
Teaching Assistant (MSc courses)
  • CEGE0096: Geospatial Programming (Fall 2022)
  • CEGE0004: Machine Learning for Data Science (Spring 2023)
  • COMP0071: Software Engineering (Spring 2023)
  • COMP0189: Applied Artificial Intelligence (Spring 2023)
Transition Mentor
  • Helping 1st year CS students with programming (C, Java, Python, Haskell)
UCL Artificial Intelligence Society
Founder, Maintainer, and Lecturer of the Machine Learning Tutorial Series

You can get to know me better from my CV.


Website source code is adapted from here.