Justin Gottschlich jgottschlich https://sites.google.com/view/gottschlich/home Principal AI Scientist & Founder/Director of Machine Programming Research, Intel Labs

IntelLabs/control-flag 998

A system to flag anomalous source code expressions by learning typical expressions from training data

IntelLabs/MICSAS 7

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

nirhasabnis/MICSAS 2

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

nirhasabnis/control-flag 1

A system to flag anomalous source code expressions by learning typical expressions from training data

jgottschlich/control-flag 0

A system to flag anomalous source code expressions by learning typical expressions from training data

jgottschlich/MICSAS 0

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

push event IntelLabs/control-flag

Niranjan Hasabnis

commit sha a12afc841204f9ee0bdbc47d41a721bd307d22ad

Handle git clone of non-existent repos

git clone command when given a non-existent repo asks for user credentials (not sure why). This creates problems for automated download/clone of repositories for ControlFlag. By adding "-c core.askPass=echo" option to git clone, we can bypass these prompts, and instead we will get a verbose error like below:

$ python3 download_repos.py -f failed.c100.txt -o training_repo_dir -m clone -p 1
Number of repos: 100
 19%|███████████████ | 19/100 [01:08<02:23, 1.77s/it]
remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
fatal: Authentication failed for 'https://github.com/craSH/socat/'


Justin Gottschlich

commit sha b31c4a5c71da88cbce80b0e56b1269b44d0c3099

Merge pull request #34 from IntelLabs/nhasabni/git_download_nonexistent_repo

Handle git clone of non-existent repos


push time in 2 months

PR merged IntelLabs/control-flag

Handle git clone of non-existent repos bug

This PR fixes issue #32.

When given a non-existent repo, the git clone command asks for user credentials (not sure why). This breaks the automated download/clone of repositories for ControlFlag. Adding the "-c core.askPass=echo" option to git clone bypasses these prompts, and we instead get a verbose error like the one below:

$ python3 download_repos.py -f failed.c100.txt -o training_repo_dir -m clone -p 1
Number of repos: 100
 19%|███████████████ | 19/100 [01:08<02:23,  1.77s/it]
remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
fatal: Authentication failed for 'https://github.com/craSH/socat/'
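
The option described above could be passed from a Python clone helper roughly as follows. This is only a minimal sketch: the function names `build_clone_command` and `clone_repo`, the timeout value, and the use of `subprocess` are assumptions for illustration, not the actual contents of download_repos.py.

```python
import subprocess
from typing import List, Tuple


def build_clone_command(repo_url: str, dest_dir: str) -> List[str]:
    """Build a git clone command that does not wait for typed credentials.

    Hypothetical helper; not taken from download_repos.py.
    """
    # "-c core.askPass=echo" tells git to run `echo` as its credential helper
    # program instead of prompting on the terminal, so a clone that needs
    # credentials fails with an error rather than blocking.
    return ["git", "clone", "-c", "core.askPass=echo", repo_url, dest_dir]


def clone_repo(repo_url: str, dest_dir: str, timeout_s: int = 600) -> Tuple[bool, str]:
    """Run the clone and return (succeeded, stderr text)."""
    try:
        result = subprocess.run(
            build_clone_command(repo_url, dest_dir),
            stdin=subprocess.DEVNULL,  # never read credentials from stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,         # assumed safety net for stuck clones
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return result.returncode == 0, result.stderr
```

A caller could log the returned stderr for failed repositories and continue with the rest of the list, which matches the behavior shown in the output above, where the run proceeds past the failing repo.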
+1 -1

0 comment

1 changed file

nhasabni

pr closed time in 2 months


issue comment IntelLabs/control-flag

Is it possible to mine c# pattern

Hi @89trillion-wuchengbo - I don't believe we've looked at C# yet, but we'd be delighted to add it to our list of future languages to support.

I'll mark this as an enhancement request and @nhasabni and I can start working on it at some point. :)

Best, The ControlFlag Team

89trillion-wuchengbo

comment created time in 2 months

issue comment IntelLabs/control-flag

[BUG] Authentication Error, Not Handled Correctly

Hi @Danc2050 - thanks so much for reporting this to us. @nhasabni and I will take a look at it immediately!

Best, The ControlFlag Team

Danc2050

comment created time in 2 months

issue closed IntelLabs/control-flag

I've tried it with ClickHouse and it did not find anything meaningful.

I've tested https://github.com/ClickHouse/ClickHouse (sources without submodules) with full training data. It has found only a few false positives (see below).

It means that either ClickHouse source code is too good (which I don't believe) or control-flag did not do the job well.

$ grep -B1 -A10 "Potential anomaly" *.log 
thread_0.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("+") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:212:4:(u.r[0]+1)
thread_0.log:Expression is Potential anomaly
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("+") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:0 and occurrences: 2
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("&") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 2559
thread_0.log-Did you mean:(parenthesized_expression (binary_expression (">") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 1340
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("<") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 1088
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("==") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:2 and occurrences: 6708
thread_0.log-
thread_0.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("+") (non_terminal_expression) (number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:212:4:(u.r[0]+1)
thread_0.log:Expression is Potential anomaly
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("+") (non_terminal_expression) (number_literal))) with editing cost:0 and occurrences: 24
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("<") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 466215
thread_0.log-Did you mean:(parenthesized_expression (binary_expression (">") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 229340
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("&") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 60040
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("%") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 2847
thread_0.log-
thread_0.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("+") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:213:4:(u.r[1]+1)
thread_0.log:Expression is Potential anomaly
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("+") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:0 and occurrences: 2
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("&") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 2559
thread_0.log-Did you mean:(parenthesized_expression (binary_expression (">") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 1340
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("<") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:1 and occurrences: 1088
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("==") (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(number_literal))) with editing cost:2 and occurrences: 6708
thread_0.log-
thread_0.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("+") (non_terminal_expression) (number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:213:4:(u.r[1]+1)
thread_0.log:Expression is Potential anomaly
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("+") (non_terminal_expression) (number_literal))) with editing cost:0 and occurrences: 24
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("<") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 466215
thread_0.log-Did you mean:(parenthesized_expression (binary_expression (">") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 229340
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("&") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 60040
thread_0.log-Did you mean:(parenthesized_expression (binary_expression ("%") (non_terminal_expression) (number_literal))) with editing cost:1 and occurrences: 2847
thread_0.log-
thread_0.log-Level:ONE Expression:(parenthesized_expression (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:222:4:(u.r[0])
thread_0.log-Expression is Okay
thread_0.log-Level:TWO Expression:(parenthesized_expression (subscript_expression (non_terminal_expression) (number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/atomic.h:222:4:(u.r[0])
thread_0.log-Expression is Okay
--
thread_1.log-Level:ONE Expression:(parenthesized_expression (field_expression (call_expression (field_expression (identifier)(field_identifier))(argument_list))(field_identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/ReplicatedMergeTreeSink.h:49:11:(context->getSettingsRef().deduplicate_blocks_in_dependent_materialized_views)
thread_1.log:Expression is Potential anomaly
thread_1.log-Did you mean:(parenthesized_expression (field_expression (call_expression (field_expression (identifier)(field_identifier))(argument_list))(field_identifier))) with editing cost:0 and occurrences: 3
thread_1.log-Did you mean:(parenthesized_expression (field_expression (field_expression (field_expression (identifier)(field_identifier))(field_identifier))(field_identifier))) with editing cost:2 and occurrences: 41060
thread_1.log-Did you mean:(parenthesized_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))) with editing cost:2 and occurrences: 20258
thread_1.log-Did you mean:(parenthesized_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(number_literal))(field_identifier))) with editing cost:2 and occurrences: 3340
thread_1.log-
thread_1.log-Level:TWO Expression:(parenthesized_expression (field_expression argument: (identifier) field: (field_identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/ReplicatedMergeTreeSink.h:49:11:(context->getSettingsRef().deduplicate_blocks_in_dependent_materialized_views)
thread_1.log-Expression is Okay
thread_1.log-[TID=139711595431680] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Storages/IStorage_fwd.h
thread_1.log-[TID=139711595431680] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Storages/SelectQueryDescription.h
thread_1.log-[TID=139711595431680] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Storages/RabbitMQ/StorageRabbitMQ.h
--
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("||") (call_expression (identifier)(argument_list (char_literal)(identifier)))(call_expression (identifier)(argument_list (char_literal)(identifier))))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/IO/readFloatText.h:384:7:(checkChar('e', in) || checkChar('E', in))
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("||") (call_expression (identifier)(argument_list (char_literal)(identifier)))(call_expression (identifier)(argument_list (char_literal)(identifier))))) with editing cost:0 and occurrences: 2
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("||") (call_expression (identifier)(argument_list (string_literal)(identifier)))(call_expression (identifier)(argument_list (string_literal)(identifier))))) with editing cost:2 and occurrences: 152
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("||") (non_terminal_expression) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/IO/readFloatText.h:384:7:(checkChar('e', in) || checkChar('E', in))
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (call_expression (field_expression (identifier)(field_identifier))(argument_list))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/IO/readFloatText.h:386:11:(in.eof())
thread_2.log-Expression is Okay
thread_2.log-Level:TWO Expression:(parenthesized_expression (call_expression)) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/IO/readFloatText.h:386:11:(in.eof())
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (pointer_expression (call_expression (field_expression (identifier)(field_identifier))(argument_list)))(char_literal))) not found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/IO/readFloatText.h:395:11:(*in.position() == '-')
--
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/RPNBuilder.h:96:11:(func->name == "not")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/RPNBuilder.h:96:11:(func->name == "not")
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("!=") (call_expression (field_expression (identifier)(field_identifier))(argument_list))(number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/RPNBuilder.h:98:15:(args.size() != 1)
thread_2.log-Expression is Okay
--
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/RPNBuilder.h:107:20:(func->name == "or")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/RPNBuilder.h:107:20:(func->name == "or")
thread_2.log-Expression is Okay
thread_2.log-[TID=139711587038976] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/MergeTreePartInfo.h
thread_2.log-[TID=139711587038976] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Storages/MergeTree/MergeTreeDataPartUUID.h
--
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:208:19:(func.name == "globalIn")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:208:19:(func.name == "globalIn")
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:210:24:(func.name == "globalNotIn")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:210:24:(func.name == "globalNotIn")
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:212:24:(func.name == "globalNullIn")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:212:24:(func.name == "globalNullIn")
thread_2.log-Expression is Okay
thread_2.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:214:24:(func.name == "globalNotNullIn")
thread_2.log:Expression is Potential anomaly
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(string_literal))) with editing cost:0 and occurrences: 5
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(null))) with editing cost:1 and occurrences: 148401
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(true))) with editing cost:1 and occurrences: 4627
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(char_literal))) with editing cost:1 and occurrences: 4157
thread_2.log-Did you mean:(parenthesized_expression (binary_expression ("==") (field_expression (identifier)(field_identifier))(identifier))) with editing cost:2 and occurrences: 576742
thread_2.log-
thread_2.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (non_terminal_expression) (string_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Interpreters/GlobalSubqueriesVisitor.h:214:24:(func.name == "globalNotNullIn")
thread_2.log-Expression is Okay
thread_2.log-[TID=139711587038976] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Interpreters/interpretSubquery.h
thread_2.log-[TID=139711587038976] Scanning File: /home/milovidov/work/ClickHouse_clean/src/Interpreters/RedundantFunctionsInOrderByVisitor.h
--
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("||") (binary_expression ("!=") (identifier)(number_literal))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/powl.c:232:5:(x != 0.0 || y == -INFINITY)
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("||") (binary_expression ("!=") (identifier)(number_literal))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) with editing cost:0 and occurrences: 2
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("||") (binary_expression ("==") (identifier)(number_literal))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) with editing cost:1 and occurrences: 280
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("||") (binary_expression (">=") (identifier)(number_literal))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) with editing cost:1 and occurrences: 71
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("||") (binary_expression (">") (identifier)(number_literal))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) with editing cost:2 and occurrences: 106
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("||") (binary_expression ("==") (identifier)(identifier))(binary_expression ("==") (identifier)(unary_expression ("-") (identifier))))) with editing cost:2 and occurrences: 100
thread_3.log-
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("||") (non_terminal_expression) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/powl.c:232:5:(x != 0.0 || y == -INFINITY)
thread_3.log-Expression is Okay
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression (">=") (identifier)(identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/base/glibc-compatibility/musl/powl.c:235:4:(x >= LDBL_MAX)
thread_3.log-Expression is Okay
--
thread_3.log-Level:ONE Expression:(parenthesized_expression (call_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))(argument_list))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/AggregateFunctions/AggregateFunctionSumMap.h:348:19:(elem.second[col].isNull())
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (call_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))(argument_list))) with editing cost:0 and occurrences: 2
thread_3.log-Did you mean:(parenthesized_expression (field_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))(field_identifier))) with editing cost:2 and occurrences: 1494
thread_3.log-Did you mean:(parenthesized_expression (call_expression (field_expression (field_expression (field_expression (identifier)(field_identifier))(field_identifier))(field_identifier))(argument_list))) with editing cost:2 and occurrences: 257
thread_3.log-Did you mean:(parenthesized_expression (subscript_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))(identifier))) with editing cost:2 and occurrences: 229
thread_3.log-Did you mean:(parenthesized_expression (subscript_expression (field_expression (subscript_expression (field_expression (identifier)(field_identifier))(identifier))(field_identifier))(number_literal))) with editing cost:2 and occurrences: 146
thread_3.log-
thread_3.log-Level:TWO Expression:(parenthesized_expression (call_expression)) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/AggregateFunctions/AggregateFunctionSumMap.h:348:19:(elem.second[col].isNull())
thread_3.log-Expression is Okay
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression (">") (binary_expression ("<") (unary_expression ("!") (field_expression (call_expression (field_expression (identifier)(field_identifier))(argument_list))(field_identifier)))(identifier))(parenthesized_expression (identifier)))) not found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/AggregateFunctions/AggregateFunctionSumMap.h:415:11:(!params_.front().tryGet<Array>(keys_to_keep_))
thread_3.log-Expression is Okay
--
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:319:7:(auto * nullable = checkAndGetColumn<ColumnNullable>(src))
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) with editing cost:0 and occurrences: 193
thread_3.log-Did you mean:(parenthesized_expression (binary_expression (">") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130297
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("<") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130156
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("&") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 96043
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("=") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 12586
thread_3.log-
thread_3.log-Level:ONE Expression:(parenthesized_expression (identifier)) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:342:7:(is_nullable)
thread_3.log-Expression is Okay
thread_3.log-Level:TWO Expression:(parenthesized_expression (identifier)) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:342:7:(is_nullable)
thread_3.log-Expression is Okay
--
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression (">") (identifier)(number_literal)))(identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:436:19:((nan_direction_hint > 0) != reverse)
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression (">") (identifier)(number_literal)))(identifier))) with editing cost:0 and occurrences: 1
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression ("&") (identifier)(number_literal)))(identifier))) with editing cost:1 and occurrences: 962
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression (">>") (identifier)(number_literal)))(identifier))) with editing cost:1 and occurrences: 375
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression ("+") (identifier)(number_literal)))(identifier))) with editing cost:1 and occurrences: 198
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("!=") (parenthesized_expression (binary_expression ("^") (identifier)(number_literal)))(identifier))) with editing cost:1 and occurrences: 60
thread_3.log-
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("!=") (non_terminal_expression) (identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:436:19:((nan_direction_hint > 0) != reverse)
thread_3.log-Expression is Okay
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("<=") (binary_expression ("-") (identifier)(identifier))(number_literal))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:446:19:(last - first <= 1)
thread_3.log-Expression is Okay
--
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:519:7:(auto * nullable_column = checkAndGetColumn<ColumnNullable>(src))
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) with editing cost:0 and occurrences: 193
thread_3.log-Did you mean:(parenthesized_expression (binary_expression (">") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130297
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("<") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130156
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("&") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 96043
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("=") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 12586
thread_3.log-
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("==") (identifier)(identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:527:7:(src_column == nullptr)
thread_3.log-Expression is Okay
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("==") (identifier) (identifier))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/Columns/ColumnUnique.h:527:7:(src_column == nullptr)
thread_3.log-Expression is Okay
--
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/DataTypes/DataTypesDecimal.h:64:7:(auto * decimal_type = checkDecimal<Decimal64>(data_type))
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) with editing cost:0 and occurrences: 193
thread_3.log-Did you mean:(parenthesized_expression (binary_expression (">") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130297
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("<") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130156
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("&") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 96043
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("=") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 12586
thread_3.log-
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("*") (identifier)(binary_expression ("=") (identifier)(binary_expression (">") (binary_expression ("<") (identifier)(identifier))(parenthesized_expression (identifier)))))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/DataTypes/DataTypesDecimal.h:66:7:(auto * decimal_type = checkDecimal<Decimal128>(data_type))
thread_3.log-Expression is Okay
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/DataTypes/DataTypesDecimal.h:66:7:(auto * decimal_type = checkDecimal<Decimal128>(data_type))
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) with editing cost:0 and occurrences: 193
thread_3.log-Did you mean:(parenthesized_expression (binary_expression (">") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130297
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("<") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130156
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("&") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 96043
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("=") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 12586
thread_3.log-
thread_3.log-Level:ONE Expression:(parenthesized_expression (binary_expression ("*") (identifier)(binary_expression ("=") (identifier)(binary_expression (">") (binary_expression ("<") (identifier)(identifier))(parenthesized_expression (identifier)))))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/DataTypes/DataTypesDecimal.h:68:7:(auto * decimal_type = checkDecimal<Decimal256>(data_type))
thread_3.log-Expression is Okay
thread_3.log-Level:TWO Expression:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) found in training dataset: Source file: /home/milovidov/work/ClickHouse_clean/src/DataTypes/DataTypesDecimal.h:68:7:(auto * decimal_type = checkDecimal<Decimal256>(data_type))
thread_3.log:Expression is Potential anomaly
thread_3.log-Did you mean:(parenthesized_expression (binary_expression ("*") (identifier) (non_terminal_expression))) with editing cost:0 and occurrences: 193
thread_3.log-Did you mean:(parenthesized_expression (binary_expression (">") (identifier) (non_terminal_expression))) with editing cost:1 and occurrences: 130297

closed time in 2 months

alexey-milovidov

issue commentIntelLabs/control-flag

I've tried it with ClickHouse and it did not find anything meaningful.

Hi @tavplubix - you're totally right in that the example does a pretty lousy job in explaining how the addressing model would work under the hood. To provide that level of detail I'd probably have to show a more complete implementation.

That said, I had one of these kinds of deadlock scenarios in one of my first multi-threaded transactional memory systems. It was a nightmare because I couldn't reproduce it, only my customers could (the system was TBoost.STM). The problem with TBoost.STM was that the deadlock didn't occur unless you ran it north of 10M transactions with precisely the right number of threads and only on certain systems. The only way I found out about it is my customers reported it to me (I could never even reproduce it until I built Concurrent Predicates).

What I'm really trying to get at is that, at least in my case with TBoost.STM, the original b would be reassigned to b1's address once b1 is re-allocated. I did a pretty lousy job explaining that; sorry about that! :) So the idea is, you don't actually have three mutexes, you only end up with b1 and a, but when a looks at b, it sees the old address. b's address is automatically updated under the hood to b = b1 when b1 is allocated, and now the locking order gets inverted. Sorry about the terrible example, but this is the scenario I was trying to get across.

All said and done, if you feel comfortable that your system is safe, please just ignore me. :) I just worry there might be the potential for a TBoost.STM-like deadlock (1 in 10,000,000). Probably the part that scares me the most is it's entirely possible your code is perfectly acceptable today, but maybe in the future someone violates the assumptions we rely on to make this code safe and then suddenly ... we're in trouble. That would be a super bummer. :(

Whatever the case, really appreciate the discussion and your openness to explore this!

Best, The ControlFlag Team

alexey-milovidov

comment created time in 2 months

startedjoker-eph/TBoost.STM

started time in 2 months

issue commentIntelLabs/control-flag

I've tried it with ClickHouse and it did not find anything meaningful.

Hi @tavplubix!

Thanks for reaching out. I think you're precisely right about the situation for two objects. So long as it's only two DatabaseAtomic objects, I believe your strategy will work. The problem, I believe, comes in the case where there may be 3 or more objects. So long as that is never the case, I think you're fine.

However, let me quickly describe the problem that could occur with 3 objects. I'll use really simple pointer addresses to simplify the discussion. :)

' DatabaseAtomic *a, *b, *c;

a = new DatabaseAtomic(); // assume address 2.
b = new DatabaseAtomic(); // assume address 3.
c = new DatabaseAtomic(); // assume address 4.

Thread 1:

'if (a < b) {
    lock(a);
    if (b > c) {
        lock(b);
        lock(c); // <- deadlock
    }
}
'

Now, concurrently in Thread 2 -- right after 'if (b > c)' happens:

' free(c);
c = new DatabaseAtomic(); // now it gets address 1!

if (c < a) {
    lock(c);
    if (c < b) {
        lock(b); // <- deadlock
        ...
'

If your system doesn't handle locking more than two 'DatabaseAtomic' objects, I think you're fine. However, my recommendation is that even if it does only use two today, that code might break if your system is updated in the future to handle more fine-grained locking.

My gentle recommendation is to instead change:

'if (this < &otherdb)'

To:

'if (this->uid < otherdb.uid)'

Which I believe is guaranteed to work under any number of objects, so long as the 'uid' is monotonically increasing at object construction time.
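As a minimal sketch of that recommendation, the snippet below orders locks by a per-object uid instead of by address. The `Database` class, `next_uid()`, `lock_order()` and `with_both_locked()` are hypothetical names introduced only for illustration; they are not taken from ClickHouse or ControlFlag, and a real class would carry far more state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical stand-in for DatabaseAtomic; only what lock ordering needs.
class Database {
public:
    Database() : uid_(next_uid()) {}

    std::uint64_t uid() const { return uid_; }
    std::mutex& mutex() { return mutex_; }

private:
    // Monotonically increasing counter shared by all instances.
    static std::uint64_t next_uid() {
        static std::atomic<std::uint64_t> counter{0};
        return counter.fetch_add(1);
    }

    const std::uint64_t uid_;  // fixed at construction, never reused
    std::mutex mutex_;
};

// Returns the two objects in the order their mutexes should be locked.
// The order depends only on uid, so every thread agrees on it, even if
// an object is destroyed and a new one lands at a lower address.
inline std::pair<Database*, Database*> lock_order(Database& x, Database& y) {
    return (x.uid() < y.uid()) ? std::make_pair(&x, &y)
                               : std::make_pair(&y, &x);
}

// Locks both databases in uid order and runs fn while holding both locks.
template <typename Fn>
void with_both_locked(Database& x, Database& y, Fn fn) {
    assert(&x != &y && "x and y must be distinct objects");
    std::pair<Database*, Database*> order = lock_order(x, y);
    std::lock_guard<std::mutex> first_guard(order.first->mutex());
    std::lock_guard<std::mutex> second_guard(order.second->mutex());
    fn();
}
```

Calling `with_both_locked(a, b, ...)` and `with_both_locked(b, a, ...)` from different threads acquires the two mutexes in the same order, which is the property the address comparison cannot guarantee once memory is reused.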

That said, feel free to ignore me and do whatever you believe is most appropriate.

Thanks for the great discussion! Really fun situation that ControlFlag found. :)

Best, The ControlFlag Team

alexey-milovidov

comment created time in 2 months

issue commentIntelLabs/control-flag

I've tried it with ClickHouse and it did not find anything meaningful.

To be a little bit more explicit, unless ClickHouse has built its own memory manager with control of its own free store, I'm not sure you can reliably use pointer addresses to determine allocation timestamp ordering (i.e., monotonically increasing values).

A simple example might be:

  1. Alloc (let's pretend it's address 8 in a one byte allocation system)
  2. Free (7)
  3. Free (6)
  4. Alloc (6)

In this example, the memory allocated in operation 4 has a smaller pointer address value than the memory allocated in operation 1. Yet, operation 4 was constructed at a later timestamp (monotonically increasing) than operation 1.

So if the intention here is to use memory values to order things based on timestamp allocation, using the pointer address is (unless controlled by some other mechanism) not guaranteed to properly maintain such monotonically increasing order.
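The four operations above can be replayed with a toy allocator. This is only a sketch under stated assumptions: `ToyAllocator` and `replay()` are hypothetical, addresses are plain integers, and the reuse policy (lowest freed slot first) is chosen for illustration rather than modeled on any particular real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Toy allocator over integer "addresses"; for illustration only.
// It reuses freed slots before handing out fresh ones.
class ToyAllocator {
public:
    explicit ToyAllocator(std::size_t first_unused)
        : next_fresh_(first_unused) {}

    std::size_t alloc() {
        if (!free_slots_.empty()) {
            // Reuse the lowest freed address.
            std::size_t addr = *free_slots_.begin();
            free_slots_.erase(free_slots_.begin());
            return addr;
        }
        return next_fresh_++;
    }

    void release(std::size_t addr) {
        assert(addr < next_fresh_ && "address was never handed out");
        free_slots_.insert(addr);
    }

private:
    std::set<std::size_t> free_slots_;
    std::size_t next_fresh_;
};

// Replays the four operations; returns {address from op 1, address from op 4}.
inline std::pair<std::size_t, std::size_t> replay() {
    ToyAllocator heap(8);               // addresses below 8 assumed in use
    std::size_t first = heap.alloc();   // op 1: fresh slot
    heap.release(7);                    // op 2
    heap.release(6);                    // op 3
    std::size_t later = heap.alloc();   // op 4: reuses a freed slot
    return std::make_pair(first, later);
}
```

Under these assumptions the later allocation comes back with the smaller address, so sorting by address would place it before the earlier one.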

Does this make sense?

Best, The ControlFlag Team

alexey-milovidov

comment created time in 2 months

issue commentIntelLabs/control-flag

I've tried it with ClickHouse and it did not find anything meaningful.

Hi @alexey-milovidov -

To reiterate what @nhasabni was saying, shouldn't the check be the following?

  1. "else if (*this < other_db)"

Rather than what it currently is?

  2. "else if (this < &other_db)"

If I understand the code correctly, the first check (1) will dereference this pointer to get the object at the "this" address and because the code will now be operating on two concrete objects, the operator<() will be invoked for *this and other_db.

The second check (2), which is what is currently in the code, will take the address of other_db, which will result in a comparison of two pointers. That, I think, will invoke an integer < operation, which will simply return true if the "this" pointer address is smaller than &other_db. Except, it's not immediately clear why or how that operation will provide any meaningful information.
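To make the difference between the two checks concrete, here is a small sketch. The `Db` struct, its `uid` member and the helper names are hypothetical and stand in for the real database class. The two objects are placed in one array so that the address comparison is well defined; for unrelated objects, the result of `<` on pointers is unspecified in C++, and `std::less` is the usual way to get a consistent order.

```cpp
#include <cassert>

// Hypothetical stand-in for the database class; uid is an assumed member.
struct Db {
    int uid;

    // Ordering defined on the objects themselves.
    bool operator<(const Db& other) const { return uid < other.uid; }

    // (1) Dereferences this, so Db::operator< is invoked.
    bool less_by_object(const Db& other) const { return *this < other; }

    // (2) Compares the two addresses; Db::operator< is never called.
    bool less_by_address(const Db& other) const { return this < &other; }
};

// dbs[0] has the lower address but the higher uid, so the checks disagree.
inline bool checks_disagree() {
    Db dbs[2] = {{2}, {1}};
    bool by_object = dbs[0].less_by_object(dbs[1]);    // compares 2 < 1
    bool by_address = dbs[0].less_by_address(dbs[1]);  // compares addresses
    assert(by_address);  // array elements are laid out in index order
    return by_object != by_address;
}
```

Whether the address-based result is what the code intends depends on why the comparison exists; if it is only meant to pick a consistent lock order, the address version is doing a different job than `operator<`.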

Maybe we're just misunderstanding something?

Thanks, The ControlFlag Team

alexey-milovidov

comment created time in 2 months

issue commentIntelLabs/control-flag

[FEATURE]Support for Non-Control structures

@jiangwei007 - thanks for the excellent feedback. We agree with your comment wholeheartedly.

When we first designed ControlFlag, we did so with the intentional focus on only analyzing control expressions because we had seen some studies showing a large majority of defects emerging from them. However, as you note, ControlFlag should work on all programming structures and patterns, not just control expressions.

This is a wonderful feature request; I'll discuss with the team, but we will definitely do this. I'm just not sure how quickly we'll be able to get to it as some of the other features might be higher priority (e.g., ControlFlag supporting the Windows operating system).

Best, The ControlFlag Team

jiangwei007

comment created time in 2 months

PullRequestReviewEvent

Pull request review commentIntelLabs/control-flag

Change github download url list filename

 statements that appear in C programs.*
 If you want to use your own repository for mining patterns, jump to Step 1.2.
-1.1 __Downloading Top-100 GitHub repos for C language__
+1.1 __Downloading Top-1000 GitHub repos for C language__
-Steps below show how to download Top-100 GitHub repos for C language
-(`c100.txt`) and generate training data. `training_repo_dir` is a directory
+Steps below show how to download Top-1000 GitHub repos for C language

Oh, right. Thanks for clarifying @nhasabni. That sounds right to me.

MatheMatrix

comment created time in 2 months

pull request commentIntelLabs/control-flag

Change github download url list filename

Looks reasonable to me; approved. Waiting for @nhasabni to approve as well.

Thank you @MatheMatrix for your correction and contribution!

Justin

MatheMatrix

comment created time in 2 months

pull request commentIntelLabs/control-flag

Change github download url list filename

Seems reasonable to me. Adding @nhasabni for his approval, too.

MatheMatrix

comment created time in 2 months

issue commentIntelLabs/control-flag

[FEATURE]Support for the Cpp programming language

Hi @jiangwei007 - right now the parser is set up to only scan .h and .c files; however, you can modify that to include .cpp files. @nhasabni - can you point @jiangwei007 to the modification point in the script?

The reason we don't add .cpp files by default is that we haven't yet added C++-specific learning to the system. However, you may still be able to find issues in your C++ code, since C++ is somewhat of a superset of C. That said, once we've added proper C++ learning, we anticipate the results may be much better.

Best, The ControlFlag Team

jiangwei007

comment created time in 2 months

fork jgottschlich/control-flag

A system to flag anomalous source code expressions by learning typical expressions from training data

fork in 3 months

fork jgottschlich/MICSAS

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

fork in 3 months

PR closed IntelLabs/control-flag

Reviewers
Updating to support .hpp files automatically

Updating script to support .hpp files automatically, without requiring user specification.

+1 -1

1 comment

1 changed file

jgottschlich

pr closed time in 3 months

pull request commentIntelLabs/control-flag

Updating to support .hpp files automatically

Discussed with @nhasabni - we have a plan to get this done properly in the next week or so. So closing this pull request.

jgottschlich

comment created time in 3 months

PR opened IntelLabs/control-flag

Reviewers
Updating to support .hpp files automatically

Updating script to support .hpp files automatically, without requiring user specification.

+1 -1

0 comment

1 changed file

pr created time in 3 months

create branchIntelLabs/control-flag

branch : jgottschlich-patch-1

created branch time in 3 months

issue commentIntelLabs/control-flag

Is it possible to mine java pattern?

Hi @sluk3r -

Sadly, that's correct, we currently don't support Java, but it's nearly the top of our list of growing language support (e.g., Python, JavaScript, & Java). As you probably know, ControlFlag was designed to be language agnostic, so it's likely just a matter of us adding support in our parser for Java.

Hopefully in the next few months, we'll have support for these languages. We're working around the clock to grow our supported languages. :)

Thank you for your interest in ControlFlag and stay tuned! We'll likely have Java support soon!

Thanks, The ControlFlag Team

sluk3r

comment created time in 3 months

startedjgottschlich/AutoPerf

started time in 3 months

PR opened IntelLabs/control-flag

Reviewers
Update README.md

Minor tweak to the language explaining that the new smaller models may reduce accuracy and increase the number of false positives CF generates.

+1 -1

0 comment

1 changed file

pr created time in 3 months

create branchIntelLabs/control-flag

branch : jgottschlich-patch-1

created branch time in 3 months

pull request commentIntelLabs/control-flag

Added multiple versions of training dataset

Looks excellent; thanks Niranjan. @all - please note that these new smaller models may not have the same accuracy as the larger model. However, they may make it possible to run ControlFlag on a lower-powered device, such as a personal laptop.

We still recommend using the large model, if you can afford the computational overhead, as it currently seems to have the best accuracy. We'll continue to try to improve the accuracy and reduce the number of false positives of ControlFlag as we move forward.

Thanks for looking at ControlFlag! Justin

nhasabni

comment created time in 3 months

push eventIntelLabs/control-flag

Niranjan Hasabnis

commit sha 6bc17348cb422f9b0b1529e72bbf80d394a25d26

Added multiple versions of training dataset

view details

Justin Gottschlich

commit sha 2cafc2bff761fae0f12799a13272e4f5dd6e5f52

Merge pull request #12 from IntelLabs/nhasabni/readme_datasets Added multiple versions of training dataset

view details

push time in 3 months
