Microsoft is preparing to add OpenAI’s ChatGPT chatbot to its Bing search engine
OpenAI, the AI research shop backed by a $1 billion investment from Microsoft, publicly released ChatGPT for users to test in November. The chatbot’s ability to spout everything from cocktail recipes to authentic-seeming school essays has since catapulted it into the spotlight. While the AI service sometimes confidently offers incorrect information with a patina of authority, some analysts and experts have suggested its ability to summarize publicly available data can make it a credible alternative to Google search and a list of search-generated links.
Bloomberg.com
Microsoft Hopes OpenAI’s Chatbot Will Make Bing Smarter
ChatGPT’s accuracy will be key to timing of any rollout
Analysis of several major underground hacking communities shows the first instances of cybercriminals using OpenAI's tools to develop malicious software:
– creating an infostealer
– creating an encryption tool
– using ChatGPT to facilitate fraud activity
Check Point Research
OPWNAI : Cybercriminals Starting to Use ChatGPT - Check Point Research
Introduction At the end of November 2022, OpenAI released ChatGPT, the new interface for its Large Language Model (LLM), which instantly created a flurry of interest in AI and its possible uses. However, ChatGPT has also added some spice to the modern cyber…
The Art of LaTeX
Some common mistakes that are made by LaTeX practitioners (even in heavily cited papers)
On the Security Vulnerabilities of Text-to-SQL Models
The authors showed that the Text-to-SQL modules of two commercial black-box systems (Baidu-UNIT and the Codex-powered Ai2sql) can be manipulated into producing malicious code, potentially leading to data breaches and denial of service. This demonstrates the danger of NLP models being exploited as attack vectors in the wild. Moreover, experiments on four open-source frameworks verified that simple backdoor attacks can achieve a 100% success rate against Text-to-SQL systems with almost no impact on prediction performance.
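The attack surface is easy to see in miniature. The sketch below is a toy illustration (not the paper's actual attack or either vendor's pipeline): a naive text-to-SQL layer that copies user text into a query template can be steered into emitting destructive SQL, and a keyword filter is one crude defense. All names here are invented for illustration.

```python
import re

# Destructive SQL keywords a defensive layer might screen for.
DANGEROUS = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER|SHUTDOWN)\b", re.IGNORECASE)

def naive_text_to_sql(question: str) -> str:
    # Toy stand-in for a neural text-to-SQL model: it trusts the input
    # and copies fragments of it directly into the generated query.
    return f"SELECT name FROM users WHERE bio LIKE '%{question}%'"

def guarded_text_to_sql(question: str) -> str:
    sql = naive_text_to_sql(question)
    # Defense-in-depth: reject generated SQL containing destructive keywords.
    if DANGEROUS.search(sql):
        raise ValueError("potentially malicious SQL blocked")
    return sql

benign = guarded_text_to_sql("loves hiking")
```

A crafted question like `x'; DROP TABLE users; --` would pass through the naive layer but be rejected by the guard; the paper shows that real systems are far harder to sanitize than this.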
LineVul: A Transformer-based Line-Level Vulnerability Prediction
The authors propose a novel approach to detecting vulnerabilities in source code. The approach uses machine learning and works at line level.
Code
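The line-level idea can be sketched as follows: aggregate per-token relevance scores (e.g., derived from transformer attention) into a score per source line, then rank lines by suspiciousness. This is a hedged illustration of the general approach, not LineVul's actual scoring code, and the token scores below are made up.

```python
def rank_lines(lines, token_scores):
    """Rank source lines by summed token scores.

    token_scores: list of (line_index, score) pairs, one per token.
    Returns (ranking of line indices, per-line totals).
    """
    per_line = [0.0] * len(lines)
    for line_idx, score in token_scores:
        per_line[line_idx] += score
    # Highest-scoring lines first: these are the predicted vulnerable lines.
    ranking = sorted(range(len(lines)), key=lambda i: -per_line[i])
    return ranking, per_line

code = ["char buf[8];", "strcpy(buf, user_input);", "return 0;"]
scores = [(0, 0.2), (1, 0.9), (1, 0.7), (2, 0.1)]
order, totals = rank_lines(code, scores)
# The strcpy line accumulates the most relevance and ranks first.
```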
An Analysis of the Automatic Bug Fixing Performance of ChatGPT
The paper evaluates ChatGPT on QuixBugs, a standard bug-fixing benchmark. ChatGPT's bug-fixing performance is competitive with the common deep learning approaches CoCoNut and Codex and notably better than the results reported for standard program repair approaches. Unlike previous approaches, ChatGPT offers a dialogue system through which further information, e.g., the expected output for a given input or an observed error message, can be provided. With such hints, its success rate increases further, fixing 31 out of 40 bugs and outperforming the state of the art.
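The benchmark targets small algorithmic slips of this kind (a hedged reconstruction in the spirit of QuixBugs, not the exact benchmark file): counting set bits with Kernighan's trick, where a single wrong operator breaks the loop.

```python
def bitcount_buggy(n):
    count = 0
    while n:
        n ^= n - 1   # bug: XOR does not clear the lowest set bit,
        count += 1   # so the loop can run forever; do not call this
    return count

def bitcount_fixed(n):
    count = 0
    while n:
        n &= n - 1   # fix: AND with n-1 clears the lowest set bit
        count += 1
    return count
```

A dialogue hint such as "the call hangs for n = 1" is exactly the kind of error feedback the paper found boosts ChatGPT's repair rate.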
Which Features are Learned by CodeBert: An Empirical Study of the BERT-based Source Code Representation Learning
Researchers have recently applied BERT to source-code representation learning and reported promising results on several downstream tasks. In this paper, however, the authors show that current methods cannot effectively understand the logic of source code.
Metaverse Race
- Facebook changed its name to Meta to reposition itself and stay ahead of the curve, envisioning a utopian future in which billions of people inhabit immersive digital environments working, socializing, and gaming inside virtual and augmented worlds.
- Apple is working on an Advanced Virtual Reality gadget that it says could revolutionize the metaverse experience.
- Google is reported to be working on an innovative Augmented Reality device and may create a separate and unique metaverse platform.
- Microsoft has also joined the race and is creating a digital world called “Mesh”, aiming to incorporate virtual experiences into Microsoft Teams. On January 18, 2022, the company announced its intention to acquire the gaming company Activision Blizzard for over US$68 billion.
- Disney is creating its own metaverse to act as an extension of Disney films and its streaming service. Disney-patented metaverse technology for theme parks is expected to project 3D images of visitors.
- Huawei is partnering up with multiple companies to explore metaverse applications, including Perfect World (China), Beijing Shougang Park, and Tiny Island Productions (Singapore).
- Alibaba has applied for the trademarks “Ali Yuan Universe” and “Taobao Yuan Universe.”
Huawei BLOG
The Metaverse Race is On
The second post in this series looks at the concepts powering the metaverse, what the tech giants are doing in the space, potential market size, and current and future business models.
ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning
Pre-trained models such as CodeBERT and GraphCodeBERT are not robust to adversarial attacks: even a simple mutation operator (e.g., variable renaming) degrades their performance significantly. To address this problem, the authors propose ContraBERT.
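The mutation operator in question is easy to sketch. Below is a minimal, hedged illustration (not ContraBERT's actual augmentation code) of variable renaming as a semantics-preserving transformation, the kind a contrastive objective would treat as a positive pair of the original program.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    # \b word boundaries ensure only whole identifiers are replaced,
    # not substrings of longer names.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

original = "def add(total, x):\n    total = total + x\n    return total"
mutated = rename_variable(original, "total", "acc")
# The mutated program computes exactly the same function, yet the paper
# reports that such renamings alone significantly degrade CodeBERT-style
# embeddings, which is what the contrastive pre-training aims to fix.
```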
CodeScore: Evaluating Code Generation by Learning Code Execution
Prevailing code evaluation metrics (CEMs) fall into two categories:
- match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) and
- execution-based CEMs (e.g., AvgPassRatio and Pass@k),
but both suffer from issues. The former only measure differences in surface form, regardless of the functional equivalence of the code, while the latter incur huge execution overheads, including collecting expensive test cases, resolving tedious execution dependencies, and enormous execution time.
The authors propose CodeScore, an efficient and effective CEM for code generation that estimates the test-case PassRatio of generated code without executing it.
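For reference, the two execution-based metrics mentioned above can be computed as follows. pass@k uses the standard unbiased estimator popularized by the Codex paper: with n samples per problem of which c pass, pass@k = 1 − C(n−c, k)/C(n, k). AvgPassRatio is simply the mean fraction of test cases passed per generated program.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples, c of which pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def avg_pass_ratio(pass_counts, test_counts):
    """Mean fraction of test cases passed across generated programs."""
    return sum(p / t for p, t in zip(pass_counts, test_counts)) / len(pass_counts)
```

The execution overhead CodeScore avoids is hidden inside obtaining `c` and `pass_counts`, which normally requires actually running every generated program against every test.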
NaturalProver: Grounded Mathematical Proof Generation with Language Models
NaturalProver is a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding.
It is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
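The constraint-enforcement idea can be sketched at its simplest: among candidate next proof steps, keep only those that mention at least one required reference. This is a hedged, heavily simplified stand-in for constrained decoding (which operates at the token level during generation, not as post-hoc filtering), and the candidates and references below are invented.

```python
def filter_by_references(candidates, required_refs):
    """Keep only candidate proof steps that cite a required reference."""
    return [c for c in candidates if any(r in c for r in required_refs)]

candidates = [
    "By definition of continuity, f is bounded near x.",
    "The claim is obvious.",
    "Apply Theorem 2.1 to the sequence (a_n).",
]
kept = filter_by_references(candidates, ["Theorem 2.1", "definition of continuity"])
# Unsupported steps like "The claim is obvious." are dropped.
```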
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (Salesforce)
CodeRL is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, the code-generating LM is treated as an actor network, and a critic network is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor.
For the model backbones, the authors extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data.
github
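The actor-critic signal can be illustrated with a deliberately tiny sketch. Here the "critic" is replaced by actual unit-test execution (the paper trains a network to predict this so it can provide dense feedback), and the actor update is a toy preference accumulation over candidate programs rather than a real policy gradient. Everything below is invented for illustration.

```python
# Two candidate "programs" the actor might sample, keyed by source text.
candidates = {
    "return a + b": lambda a, b: a + b,   # functionally correct
    "return a - b": lambda a, b: a - b,   # buggy
}
tests = [((1, 2), 3), ((5, 5), 10)]

def reward(program):
    # Stand-in for the critic: fraction of unit tests the program passes.
    fn = candidates[program]
    return sum(fn(*args) == expected for args, expected in tests) / len(tests)

# Toy "actor": preferences shift toward higher-reward programs.
prefs = {p: 0.0 for p in candidates}
for _ in range(3):
    for p in candidates:
        prefs[p] += reward(p)

best = max(prefs, key=prefs.get)
```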
Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft
CORAL is a novel graph-based machine learning model that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests, work items, etc.) and their relationships in modern source code management systems. The authors train a graph convolutional network (GCN) on this graph to learn to recommend code reviewers for pull requests.
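The socio-technical-graph intuition can be sketched without any neural network: score candidate reviewers by how much their past files overlap with the files a pull request touches. This is a hedged, simplistic stand-in for the trained GCN in the paper; the developers and files are invented.

```python
# Developer -> set of files they have previously touched.
graph = {
    "alice": {"auth.py", "db.py"},
    "bob": {"ui.py"},
    "carol": {"db.py", "api.py"},
}

def recommend_reviewers(pr_files, graph):
    # Jaccard overlap between a developer's history and the PR's files.
    scores = {
        dev: len(files & pr_files) / len(files | pr_files)
        for dev, files in graph.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

ranked = recommend_reviewers({"db.py", "api.py"}, graph)
```

A GCN generalizes this by propagating information through many entity types (work items, repositories, pull requests) instead of a single hand-picked overlap score.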
Dataset Distillation: A Comprehensive Review
— A comprehensive review of recent advances in dataset distillation (DD) and its applications
PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning
PPOCoder is a new framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and incorporates execution feedback into model optimization as an external source of knowledge. PPOCoder is transferable across different code generation tasks and PLs.
github
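What "execution feedback as a reward signal" means in practice can be sketched as a grading function over generated source: compile failures score worst, runtime errors score less badly, and otherwise the reward is the fraction of tests passed. The graded levels and the `solve` entry-point convention below are illustrative assumptions, not PPOCoder's actual reward design, and the real framework feeds such a signal into PPO rather than stopping here.

```python
def execution_reward(src: str, tests) -> float:
    env = {}
    try:
        exec(compile(src, "<gen>", "exec"), env)   # syntax/compile check
    except SyntaxError:
        return -1.0                                # unparsable program
    fn = env.get("solve")
    if fn is None:
        return -0.6                                # missing entry point
    try:
        passed = sum(fn(*args) == out for args, out in tests)
    except Exception:
        return -0.3                                # crashed at runtime
    return passed / len(tests)                     # fraction of tests passed

tests = [((2,), 4), ((3,), 9)]
good = execution_reward("def solve(x):\n    return x * x", tests)
bad = execution_reward("def solve(x) return x", tests)
```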