ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning
Pre-trained models such as CodeBERT and GraphCodeBERT are not robust to adversarial attacks: even a simple mutation operator (e.g., variable renaming) degrades their performance significantly. ContraBERT is proposed to address this problem by enhancing the robustness of these models via contrastive learning.
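To make the attack concrete, here is a minimal sketch of a variable-renaming mutation operator for Python, the kind of semantics-preserving perturbation the paper refers to (the function name and details are illustrative, not from the paper):

```python
import ast

def rename_variables(source: str) -> str:
    """Rename every assigned local variable to an uninformative name
    (v0, v1, ...), leaving builtins and unassigned names untouched.
    The program's semantics are preserved, but surface tokens change."""
    tree = ast.parse(source)
    # Collect only names that are assignment targets (Store context),
    # so calls like print() keep their original identifiers.
    assigned = {n.id for n in ast.walk(tree)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    mapping = {name: f"v{i}" for i, name in enumerate(sorted(assigned))}
    for n in ast.walk(tree):
        if isinstance(n, ast.Name) and n.id in mapping:
            n.id = mapping[n.id]
    return ast.unparse(tree)  # requires Python 3.9+
```

Feeding mutated-but-equivalent programs like these to a code model and comparing its predictions against the originals is one way to probe the robustness gap the paper describes.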
CodeScore: Evaluating Code Generation by Learning Code Execution
Prevailing code evaluation metrics (CEMs) can be categorized into
- match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) and
- execution-based CEMs (e.g., AvgPassRatio and Pass@k),
but both suffer from issues. The former only measure differences in surface form, regardless of the functional equivalence of programs, while the latter incur huge execution overheads: collecting expensive test cases, resolving tedious execution dependencies, and enormous execution time.
CodeScore is an efficient and effective CEM for code generation that estimates the test-case PassRatio of generated code without executing it.
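To make PassRatio concrete: computed by actual execution, it is simply the fraction of test cases a generated program passes; CodeScore learns to estimate this number without running anything. A minimal execution-based sketch (function and parameter names are illustrative, not CodeScore's API):

```python
def pass_ratio(program: str, test_cases: list[tuple[tuple, object]],
               entry_point: str = "solution") -> float:
    """Fraction of (args, expected_output) test cases the program passes.
    NOTE: exec() runs arbitrary code -- only ever do this in a sandbox."""
    namespace: dict = {}
    try:
        exec(program, namespace)
        func = namespace[entry_point]
    except Exception:
        return 0.0  # program does not even load
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime error counts as a failed test
    return passed / len(test_cases)

# A buggy "add" that is wrong on negative inputs passes 2 of 3 tests:
buggy = "def solution(a, b):\n    return abs(a) + abs(b)"
ratio = pass_ratio(buggy, [((1, 2), 3), ((-1, 2), 1), ((0, 0), 0)])
```

The overheads listed above (test collection, dependencies, runtime) all live inside this loop, which is exactly what CodeScore sidesteps by predicting the ratio directly.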
NaturalProver: Grounded Mathematical Proof Generation with Language Models
NaturalProver is a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding.
It is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
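As a caricature of the constrained-decoding idea (real constrained decoding steers generation token by token; this sketch only rescores finished candidates, and all names are illustrative):

```python
def constrained_select(candidates: list[str], required_refs: list[str]) -> str:
    """Among candidate proof steps (assumed ordered by model score),
    pick the one mentioning the most required background references;
    ties are broken in favor of the higher-scored candidate."""
    def coverage(text: str) -> int:
        return sum(ref in text for ref in required_refs)
    return max(candidates, key=coverage)

step = constrained_select(
    ["Hence the result follows trivially.",
     "By the definition of limit, for every eps > 0 there exists N..."],
    required_refs=["definition of limit"],
)
```

The point of enforcing reference presence is grounding: a step that names the theorem or definition it uses is easier to verify and less likely to be a fluent hallucination.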
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (Salesforce)
CodeRL is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, the code-generating LM is treated as an actor network, and a critic network is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor.
For the model backbones, the authors extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data.
github
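The actor-critic signal described above can be sketched as a REINFORCE-style loss in which the critic's estimated correctness probability (minus a baseline) scales the log-likelihood of the sampled program. This is a toy scalar version; names, the baseline value, and the per-program (rather than per-token) granularity are simplifying assumptions, not CodeRL's exact formulation:

```python
def actor_loss(token_logprobs: list[float], critic_score: float,
               baseline: float = 0.5) -> float:
    """REINFORCE-style loss for one sampled program.
    critic_score: critic's predicted probability the program is
    functionally correct; programs judged better than the baseline
    get their likelihood pushed up, worse ones pushed down."""
    advantage = critic_score - baseline
    return -advantage * sum(token_logprobs)
```

A program the critic likes (score 0.9) yields a negative gradient on its negative log-likelihood, i.e., the actor is reinforced toward generating it; a program the critic rejects is suppressed.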
Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft
CORAL is a novel graph-based machine learning model that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests, work items, etc.) and their relationships in modern source-code management systems. The authors train a graph convolutional network (GCN) on this graph to learn to recommend code reviewers for pull requests.
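A GCN on such a graph repeatedly mixes each node's features with its neighbors'. The following is a deliberately simplified single message-passing step on a toy socio-technical graph, with no learned weights or nonlinearity; it illustrates the mechanism, not CORAL's actual architecture:

```python
def gcn_layer(adj: dict[str, list[str]],
              features: dict[str, list[float]]) -> dict[str, list[float]]:
    """One mean-aggregation message-passing step: each node's new
    feature vector is the average of its own and its neighbors'
    feature vectors (a GCN layer stripped of weights and activation)."""
    new = {}
    for node, feat in features.items():
        neigh = [features[n] for n in adj.get(node, [])] + [feat]
        new[node] = [sum(vals) / len(neigh) for vals in zip(*neigh)]
    return new

# Toy graph: a developer connected to a pull request they touched.
adj = {"dev:alice": ["pr:42"], "pr:42": ["dev:alice"]}
feats = {"dev:alice": [1.0], "pr:42": [3.0]}
mixed = gcn_layer(adj, feats)
```

After a few such layers, a developer node's representation encodes its multi-hop neighborhood (files touched, related pull requests, collaborators), which is what makes scoring reviewer-to-PR affinity possible.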
Dataset Distillation: A Comprehensive Review
A comprehensive review of recent advances in dataset distillation (DD) and its applications.
PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning
PPOCoder is a new framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning, incorporating execution feedback into model optimization as an external source of knowledge. PPOCoder is transferable across different code generation tasks and PLs.
github
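PPOCoder's optimization rests on PPO's standard clipped surrogate objective, which caps how far a single update can move the policy. In scalar form, for one action:

```python
def ppo_clip_objective(ratio: float, advantage: float,
                       eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for a single action.
    ratio = pi_new(a|s) / pi_old(a|s); advantage comes from the
    reward signal (here, execution feedback on generated code)."""
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)  # clamp to [1-eps, 1+eps]
    return min(ratio * advantage, clipped_ratio * advantage)
```

In PPOCoder's setting the advantage is derived from execution feedback (e.g., whether generated code compiles and passes tests), while the clipping keeps the fine-tuned policy from drifting too far from the pretrained PL model in one step.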
Developing Hands-on Labs for Source Code Vulnerability Detection with AI
Vulnerability detection tutorials
Google announces Bard, an experimental conversational AI service, powered by LaMDA. "Today, we’re taking another step forward by opening it up to trusted testers".
ICSE 2023, Technical Track, Accepted Papers: https://conf.researchr.org/track/icse-2023/icse-2023-technical-track#event-overview
ICSE 2023, NIER - New Ideas and Emerging Results, Accepted Papers:
https://conf.researchr.org/track/icse-2023/icse-2023-NIER#event-overview
ICSE 2023, FoSE - Future of Software Engineering
https://conf.researchr.org/track/icse-2023/fose
The ICSE'23 Workshop on Cloud Intelligence / AIOps
https://cloudintelligenceworkshop.org/
SantaCoder: Don't reach the stars!
The BigCode project is an open scientific collaboration working on the responsible development of large language models for code. The authors train 1.1B-parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. They find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Their best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
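Near-duplicate filtering of the kind discussed above typically targets Jaccard similarity over token shingles (large pipelines approximate it with MinHash/LSH rather than exact all-pairs comparison; this brute-force sketch, with illustrative names and an assumed threshold, just shows the quantity being thresholded):

```python
def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity over k-token shingles -- the quantity that
    MinHash-based near-deduplication approximates."""
    def shingles(text: str) -> set[tuple[str, ...]]:
        toks = text.split()
        return {tuple(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def near_duplicates(docs: list[str], threshold: float = 0.7) -> list[tuple[int, int]]:
    """Exact O(n^2) pair check; real corpora require MinHash/LSH instead."""
    return [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
            if jaccard(docs[i], docs[j]) >= threshold]
```

The paper's finding is that pushing this threshold lower (i.e., dropping more borderline near-duplicates from the training set) improves downstream generation quality.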
Packing Unit Squares in Squares: A Survey and New Results
Let s(n) be the side of the smallest square into which we can pack n unit squares. The paper presents a history of this problem, and gives the best known upper and lower bounds for s(n) for n ≤ 100, including the best known packings.
Best known packings: https://erich-friedman.github.io/packing/squinsqu/
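For orientation, two trivial bounds frame the whole problem: area alone forces s(n) ≥ √n, and the axis-aligned grid packing gives s(n) ≤ ⌈√n⌉. The survey's contribution is the (much harder) territory between these bounds for non-square n:

```python
import math

def trivial_bounds(n: int) -> tuple[float, int]:
    """Trivial bounds on s(n), the side of the smallest square
    containing n unit squares:
      lower: area argument, s(n) >= sqrt(n)
      upper: grid packing,  s(n) <= ceil(sqrt(n))"""
    return math.sqrt(n), math.ceil(math.sqrt(n))

lo, hi = trivial_bounds(10)  # sqrt(10) ~ 3.162 vs grid bound 4
```

For perfect squares n = k² the two bounds coincide and s(n) = k; the interesting cases in the survey are the n where tilted packings beat the grid bound.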
Bing’s A.I. Chat: 'I Want to Be Alive'
In a two-hour conversation with our columnist, Microsoft’s new chatbot said it would like to be human, had a desire to be destructive and was in love with the person it was chatting with.
NY Times
Transformer models: an introduction and catalog
A comprehensive yet simple catalog and classification of the most popular Transformer models.
Table: https://docs.google.com/spreadsheets/d/1ltyrAB6BL29cOv2fSpNQnnq2vbX8UrHl47d7FkIf6t4/
Amazon’s Cloud Unit Partners With Startup Hugging Face as AI Deals Heat Up
Amazon.com Inc.’s cloud unit is expanding a partnership with artificial intelligence startup Hugging Face Inc., which is developing a ChatGPT rival, the latest move as the biggest technology firms line up allies in an attention-getting market for generative AI systems.
Bloomberg.com
Amazon Web Services will offer the startup's products to its customers and run its next large language model.