ml4se
Machine Learning for Software Engineering
Bug analysis in Jupyter notebook projects: An empirical study

The paper presents a systematic, large-scale empirical study of the bugs and challenges that Jupyter practitioners face. The authors mined 14,740 commits from 105 open-source GitHub projects containing Jupyter notebook code and analyzed 30,416 Stack Overflow posts to gain insight into the bugs practitioners encounter when developing Jupyter notebook projects.

• RQ1. What types of bugs are most frequent?
• RQ2. What are the root causes of bugs?
• RQ3. What are the frequent impacts of bugs?
• RQ4. What challenges do data scientists face in practice with Jupyter projects?
UL2: Unifying Language Learning Paradigms (Google)

A novel language pre-training paradigm called Unified Language Learner (UL2) frames different objective functions for training language models as denoising tasks, where the model has to recover missing sub-sequences of a given input. During pre-training it uses a novel mixture-of-denoisers that samples from a varied set of such objectives, each with different configurations.
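The span-corruption objectives UL2 mixes over can be sketched in a few lines. This is a toy illustration under assumed conventions (the function name `corrupt_spans`, the sentinel format, and the mixture values are illustrative, not from the paper):

```python
import random

def corrupt_spans(tokens, span_length, corruption_rate, seed=0):
    """Replace random spans of `tokens` with sentinel markers, returning
    the corrupted input and the denoising targets the model must recover."""
    rng = random.Random(seed)
    n_corrupt = max(1, int(len(tokens) * corruption_rate))
    n_spans = max(1, n_corrupt // span_length)
    starts = sorted(rng.sample(range(len(tokens) - span_length), n_spans))
    corrupted, targets, cursor = [], [], 0
    for i, s in enumerate(starts):
        if s < cursor:  # skip overlapping spans
            continue
        corrupted += tokens[cursor:s] + [f"<extra_id_{i}>"]
        targets += [f"<extra_id_{i}>"] + tokens[s:s + span_length]
        cursor = s + span_length
    corrupted += tokens[cursor:]
    return corrupted, targets

# The mixture-of-denoisers samples a configuration per example, roughly:
# R-denoising (short spans, low rate) vs. X-denoising (long spans or
# high rate); values below are illustrative.
mixture = [
    {"span_length": 3, "corruption_rate": 0.15},   # R-style
    {"span_length": 12, "corruption_rate": 0.5},   # X-style
]
tokens = [f"t{i}" for i in range(32)]
cfg = random.choice(mixture)
inp, tgt = corrupt_spans(tokens, **cfg)
```

S-denoising (a prefix language-modeling objective) would be a third entry in the mixture; it corrupts a suffix rather than interior spans.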
What is it like to program with artificial intelligence (Microsoft)

The authors explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. They find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences both in the technical possibilities as well as the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges.
Forest: Structural Code Editing with Multiple Cursors (ETH)

Forest allows performing a single action simultaneously in multiple program locations, thus supporting complex refactorings.
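The core bookkeeping behind a multi-cursor edit is applying one replacement at several source offsets without the earlier edits invalidating the later offsets. A minimal sketch (Forest itself operates on syntax trees; `apply_multi_edit` is an illustrative name for the offset handling only):

```python
def apply_multi_edit(source, spans, replacement):
    """Replace every (start, end) span in `source` with `replacement`.
    Applying edits from right to left keeps earlier offsets valid."""
    for start, end in sorted(spans, reverse=True):
        source = source[:start] + replacement + source[end:]
    return source

code = "total = n + n * n"
# rename every occurrence of `n` (character offsets 8, 12, 16)
spans = [(8, 9), (12, 13), (16, 17)]
print(apply_multi_edit(code, spans, "count"))
# → total = count + count * count
```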
Minerva: Solving Quantitative Reasoning Problems with Language Models (Google)

Minerva is a large language model pretrained on general natural language data and further trained on technical content. The main novelty of the paper is a large training dataset that juxtaposes natural language with the correct use of formal mathematical language, such as equations and diagrams. The data is collected from the arXiv preprint server and from web pages.
Code as Policies: Language Model Programs for Embodied Control (Google)

Large language models trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings. These models can be re-purposed to write robot policy code, given natural language commands.

Project website: https://code-as-policies.github.io/
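The loop can be sketched as: an LLM turns a natural-language command into Python policy code, which is then executed against a small robot API. In this sketch the model call is stubbed and the primitive names are illustrative:

```python
def fake_llm(command):
    """Stand-in for a code-completion model prompted with the robot API."""
    return ('for name in ["red block", "blue block"]:\n'
            '    pick_and_place(name, "tray")')

log = []

def pick_and_place(obj, target):
    """Toy robot primitive exposed to the generated policy code."""
    log.append((obj, target))

policy_code = fake_llm("put all the blocks in the tray")
exec(policy_code, {"pick_and_place": pick_and_place})
print(log)  # → [('red block', 'tray'), ('blue block', 'tray')]
```

Exposing only a whitelisted namespace to `exec` is what lets the generated program call robot primitives while staying sandboxed from the rest of the process.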
Fixing Dockerfile Smells: An Empirical Study

RQ1: How do developers fix Dockerfile smells?
RQ2: Which Dockerfile smells are developers willing to address?
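As an illustration of the kind of smell such studies catalogue, here is a minimal checker for one commonly flagged smell: installing apt packages without pinning a version. The function name and regex are illustrative, not from the paper:

```python
import re

def has_unpinned_apt_install(dockerfile_line):
    """Flag RUN lines that apt-get install packages without pinning
    a version (pkg=version), a commonly catalogued Dockerfile smell."""
    if not re.search(r"apt-get\s+install", dockerfile_line):
        return False
    tail = re.split(r"apt-get\s+install", dockerfile_line, maxsplit=1)[1]
    packages = [w for w in tail.split() if not w.startswith("-")]
    return any("=" not in pkg for pkg in packages)

print(has_unpinned_apt_install("RUN apt-get install -y curl"))         # → True
print(has_unpinned_apt_install("RUN apt-get install -y curl=7.68.0"))  # → False
```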
Microsoft sued for open-source piracy through GitHub Copilot

Programmer and lawyer Matthew Butterick has sued Microsoft, GitHub, and OpenAI, alleging that GitHub's Copilot violates the terms of open-source licenses and infringes the rights of programmers.

Apart from the license violations, Butterick also alleges that the development feature violates the following:
- GitHub's terms of service and privacy policies,
- DMCA 1202, which forbids the removal of copyright-management information,
- the California Consumer Privacy Act,
- and other laws giving rise to the related legal claims.

The complaint was filed in the U.S. District Court for the Northern District of California, seeking statutory damages of $9,000,000,000.
TOSS: Revisiting Code Search in a Two-Stage Paradigm (Microsoft)

The paper proposes a combination of two main DL-based approaches to code search — a fusion of bi-encoder and cross-encoder methods. The framework achieves state-of-the-art accuracy with an overall mean reciprocal ranking score of 0.763, compared to the best baseline result on the CodeSearchNet benchmark of 0.713.
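The two-stage idea is: a cheap bi-encoder first recalls a small candidate set by vector similarity, then an expensive cross-encoder rescores only those candidates. A toy sketch with stubbed embeddings and a stubbed cross-encoder (all data and names are illustrative):

```python
import numpy as np

def two_stage_search(query_vec, code_vecs, cross_score, k=2):
    """Stage 1: bi-encoder recall by dot product over all candidates.
    Stage 2: cross-encoder rescoring of only the top-k survivors."""
    recall_scores = code_vecs @ query_vec
    top_k = np.argsort(-recall_scores)[:k]
    reranked = sorted(top_k, key=lambda i: -cross_score(i))
    return [int(i) for i in reranked]

# toy corpus: 4 code snippets embedded in 3-d space
code_vecs = np.array([[1.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
# stand-in cross-encoder: pretend it prefers candidate 1 on closer reading
cross = lambda i: {0: 0.4, 1: 0.9}.get(i, 0.0)
print(two_stage_search(query, code_vecs, cross))  # → [1, 0]
```

The point of the split is cost: the dot product is precomputable over the whole corpus, while the cross-encoder, which jointly reads query and code, runs only k times per query.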
μBERT: Mutation Testing using Pre-Trained Language Models

μBERT is a mutation testing tool. It exploits CodeBERT to generate mutants. The proposed approach is compared with PiTest on fault detection and assertion inference.
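The generation step can be sketched as: mask one token of the program, ask a masked language model for replacements, and keep every prediction that differs from the original token. The model call is stubbed here; in the tool itself CodeBERT fills the mask:

```python
def fake_mlm(masked_code):
    """Stand-in for a fill-mask model's top predictions (illustrative)."""
    return ["+", "-", "*"]

def generate_mutants(tokens, position, predict=fake_mlm):
    """Mask `tokens[position]`, query the model, and build one mutant
    per prediction that is not the original token."""
    original = tokens[position]
    masked = tokens[:position] + ["<mask>"] + tokens[position + 1:]
    candidates = predict(" ".join(masked))
    return [tokens[:position] + [c] + tokens[position + 1:]
            for c in candidates if c != original]

mutants = generate_mutants(["return", "a", "+", "b"], position=2)
print(mutants)  # → [['return', 'a', '-', 'b'], ['return', 'a', '*', 'b']]
```

Filtering out predictions equal to the original token is what separates mutants from equivalent restatements of the program.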
The Illustrated Stable Diffusion

A gentle introduction to how Stable Diffusion works.