Yewei Song (宋 業偉)

PhD student in Computer Science

About Me

Hi, my name is Yewei, and I am now a Doctoral Researcher at the University of Luxembourg, supervised by Jacques Klein. I have been a programming enthusiast since childhood. My academic research is mainly in the field of Natural Language Processing. My usual technology stack includes Python, Java, JavaScript, and others. Within NLP, my focus areas are Large Language Models and Software Engineering.

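Hello, my name is Yewei. I am currently a doctoral researcher at the University of Luxembourg, supervised by Jacques Klein. My academic research is mainly in the field of Natural Language Processing. My usual technology stack includes Python, Java, JavaScript, and others. Within NLP, I focus mainly on Large Language Models and Software Engineering.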

Hello, my name is Yewei, and I am now a doctoral researcher at the University of Luxembourg, under the supervision of Jacques Klein.

Major Publications

Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance

https://arxiv.org/abs/2404.08817

Submitted to ACL Rolling Review (ARR), February cycle.

This paper reassesses recent code similarity evaluation metrics, focusing on Abstract Syntax Tree (AST) edit distance across diverse programming languages. It demonstrates that AST edit distance captures intricate code structures and proposes an adaptable metric, TSED, that outperforms traditional sequence similarity metrics and prompt-based GPT similarity scores.
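The idea of turning an AST edit distance into a bounded similarity score can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a score of the form max(1 - δ / max(|T1|, |T2|), 0), where δ is a tree edit distance and |T| is a node count; it parses Python only, using the standard `ast` module, rather than many languages; it labels nodes by type, so identifiers and literal values are ignored; and, for brevity, it swaps in a simpler top-down (Selkow-style) tree edit distance, which can overestimate the general tree edit distance (e.g. Zhang-Shasha). The helper names `to_tree`, `size`, `top_down_distance`, and `tsed` are hypothetical.

```python
import ast


def to_tree(node):
    """Convert a Python AST node into a (label, children) tuple labelled by node type."""
    return (type(node).__name__, tuple(to_tree(c) for c in ast.iter_child_nodes(node)))


def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(child) for child in tree[1])


def top_down_distance(a, b):
    """Selkow-style top-down tree edit distance with unit costs.

    Roots are always matched (cost 1 if labels differ); child sequences are aligned
    like a string edit distance, where deleting or inserting a child costs the size
    of its whole subtree and matching two children recurses.
    """
    cost = 0 if a[0] == b[0] else 1
    ca, cb = a[1], b[1]
    m, n = len(ca), len(cb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(ca[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(ca[i - 1]),  # delete a child subtree
                dp[i][j - 1] + size(cb[j - 1]),  # insert a child subtree
                dp[i - 1][j - 1] + top_down_distance(ca[i - 1], cb[j - 1]),  # match children
            )
    return cost + dp[m][n]


def tsed(code_a, code_b):
    """TSED-style score in [0, 1]: 1 - distance / larger tree size, clipped at 0."""
    ta, tb = to_tree(ast.parse(code_a)), to_tree(ast.parse(code_b))
    delta = top_down_distance(ta, tb)
    return max(1 - delta / max(size(ta), size(tb)), 0.0)
```

For instance, `tsed("a + b", "a - b")` returns 0.875: both snippets produce eight-node trees that differ only in the operator node, so δ = 1.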

Enhancing Text-to-SQL Translation for Financial System Design

https://arxiv.org/abs/2312.14725

Accepted by ICSE 2024 (SEIP track).

In this paper, we investigate the role of Large Language Models (LLMs) in enhancing Text-to-SQL performance. We benchmark their efficacy, propose novel similarity metrics for SQL queries, and demonstrate their applicability in financial-domain use cases, thereby facilitating the automation of natural language interaction with relational databases and advancing the practical adoption of Text-to-SQL systems.

Letz Translate: Low-Resource Machine Translation for Luxembourgish

https://ieeexplore.ieee.org/document/10236754

Accepted by ICNLP-2023.

In this paper, we propose a method for improving machine translation accuracy in low-resource language settings by distilling knowledge from large multilingual models and incorporating high-resource languages related to the target language. Evaluated on Luxembourgish, the approach runs over 30% faster than state-of-the-art models with only a 4% decrease in accuracy.

Text Logical Scoring Based on Knowledge Graph

http://etamin.github.io/pdf/Master_thesis.pdf

Master's thesis at the University of Birmingham (UoB), supervised by P. J. Hancox.

This was the research project for my Master's thesis. It proposes using the scale of the knowledge graph built from an essay as a criterion for judging the essay's logic, and summarizes methods for constructing a knowledge graph from an essay as well as the problems that may arise during scoring.
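The summary leaves open which measures of "scale" are used. As a hypothetical sketch only, assuming (subject, relation, object) triples have already been extracted from the essay, the snippet below computes three simple scale measures: the number of distinct entities, the number of distinct triples, and the size of the largest connected group of entities when edges are treated as undirected. The choice of measures and the `graph_scale` name are illustrative assumptions, not taken from the thesis.

```python
from collections import defaultdict


def graph_scale(triples):
    """Return (num_entities, num_triples, largest_component_size) for a list of
    (subject, relation, object) triples. Hypothetical scale measures for illustration."""
    adj = defaultdict(set)  # undirected adjacency between entities
    edges = set()
    for s, r, o in triples:
        adj[s].add(o)
        adj[o].add(s)
        edges.add((s, r, o))

    # Depth-first search to find the largest connected component of entities.
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            component += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        largest = max(largest, component)
    return len(adj), len(edges), largest
```

A richer, better-connected graph yields larger values on all three measures, which is the intuition behind treating graph scale as a proxy for how logically developed an essay is.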

Work Experience

USTC iFlytek

https://www.iflytek.com/en/

Senior Software Engineer

June 2017 - August 2019

A well-known Chinese intelligent voice company

  • Designed an analysis system in PySpark (Python) based on K-Medoids clustering and a Hungarian-algorithm distance model
  • Designed a text data comparison system based on Word2vec
  • Designed a collaborative document transfer system (Java)
  • Designed an easy-to-set-up load testing system in Node.js
  • Led a DevOps and grayscale release team
  • System load and risk analysis (Nginx & HAProxy)
  • Code review and static checking (Java)
  • ASR model training and testing
  • Agile team management; new-employee training lecturer
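The list names the ingredients of the PySpark analysis system but not how they were combined. As a hypothetical sketch only (plain Python rather than PySpark, with invented helper names), one common way to pair the two ideas is to treat each item as a set of feature vectors, define the distance between two items as the minimum total cost of a one-to-one matching between their vectors (the assignment problem that the Hungarian algorithm solves), and cluster items with k-medoids, which needs only pairwise distances. To stay dependency-free, the matching is solved here by brute force over permutations, which reaches the same optimum as the Hungarian algorithm but is practical only for very small sets; for real data, `scipy.optimize.linear_sum_assignment` is the usual choice. The clustering uses the simple alternating (Voronoi-iteration) k-medoids update rather than a full PAM swap search.

```python
import math
from itertools import permutations


def assignment_distance(a, b):
    """Minimum total Euclidean cost of matching every point in the smaller set to a
    distinct point in the larger set (brute force; same optimum as the Hungarian algorithm)."""
    if len(a) > len(b):
        a, b = b, a
    best = math.inf
    for perm in permutations(range(len(b)), len(a)):
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


def k_medoids(items, k, dist, max_iter=100):
    """Alternating k-medoids: returns (medoid indices, nearest-medoid index per item)."""
    n = len(items)
    # Precompute the full pairwise distance matrix once.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # deterministic initialisation: the first k items
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: d[i][m])].append(i)
        # Update step: the new medoid minimises total distance within its cluster;
        # an empty cluster keeps its old medoid.
        new_medoids = [
            min(members, key=lambda c: sum(d[c][o] for o in members)) if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    # Two well-separated groups of two-point "items".
    items = [
        [(0, 0), (1, 0)],
        [(10, 10), (11, 10)],
        [(0, 0.1), (1, 0.1)],
        [(10, 10.1), (11, 10.1)],
    ]
    medoids, labels = k_medoids(items, 2, assignment_distance)
    print(labels)
```

Because k-medoids only ever compares items through `dist`, any set-to-set distance can be plugged in without changing the clustering code.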

China UnionPay

http://www.unionpayintl.com/en/

Assistant Software Engineer

July 2016 - June 2017

UnionPay is the largest card payment organization in the world

Worked on a cross-platform (AIX and x86-64) QR-code payment system based on Spring and MySQL. Designed agile tooling (an automated deployment tool) and an integrated deployment solution.

Education

University of Luxembourg

PhD in Computer Science

2022 - present

The University of Luxembourg was founded in 2003 by combining four existing education and research institutes: the Centre universitaire, Institut supérieur d'études et de recherches pédagogiques, Institut supérieur de technologie, and Institut d'études éducatives et sociales. The university is the only public university in Luxembourg.

As a PhD student at SnT, I mainly research Natural Language Processing tasks.

Osaka University

PhD in Computer Science

2021 - 2022

Osaka University is a public research university located in Osaka Prefecture, Japan. It was one of the Imperial Universities of Japan, is one of the Designated National Universities, and was selected by the Japanese government as a Top Type university in the Top Global University Project.

As a PhD student in the Intelligence and Sensing Lab, I mainly researched video question-answering systems and other NLP-based fusion tasks. I transferred to the University of Luxembourg in March 2022.

University of Birmingham

MSc Advanced Computer Science

2019 - 2020

The University of Birmingham is a public research university located in Edgbaston, Birmingham, United Kingdom. It received its royal charter in 1900 as a successor to Queen's College, Birmingham and Mason Science College, making it the first English civic or 'red brick' university to receive its own royal charter.

The MSc Advanced Computer Science at UoB is a half-research programme that includes mini research projects in two areas within one year. I chose the following courses:

  • Neural Computation
  • Machine Learning
  • Intelligent Data Analysis
  • Mobile & Ubiquitous Computing
  • Human Computer Interaction

Nanjing University of Aeronautics and Astronautics

BEng Computer Science and Technology

2012 - 2016

Nanjing University of Aeronautics and Astronautics (NUAA) is an elite Chinese Ministry of Education university with Double First Class status in certain disciplines.

During four years of undergraduate study, I systematically studied the main theoretical courses of computer science. I participated in the ACM-ICPC and won a bronze prize, and also took part in other domestic programming competitions. Before my undergraduate study, I won a second prize in the National Olympiad in Informatics in Provinces. My thesis topic was “Design of ERP system based on relational database”.

A Little More About Me

Alongside computer science, some of my other interests and hobbies are:

  • Travel
  • Photography
  • History of Europe/Japan
  • Video games
  • Anime
  • Foreign language learning (Japanese/Luxembourgish/French)

My Flightdiary.net profile