Niranjan Hasabnis nhasabni @Intel-Corporation Santa Clara

IntelLabs/control-flag 998

A system to flag anomalous source code expressions by learning typical expressions from training data

IntelLabs/MICSAS 7

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

nhasabni/models 1

Model Zoo for Intel® Architecture: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors

nhasabni/tree-sitter 1

An incremental parsing system for programming tools

nhasabni/control-flag 0

A system to flag anomalous source code expressions by learning typical expressions from training data

nhasabni/Hackathon 0

MSR Hackathon 2022

nhasabni/io 0

Io programming language. Inspired by Self, Smalltalk and LISP.

nhasabni/lshmatmul-op 0

Guide for building custom op for TensorFlow

nhasabni/python-lint 0

GitHub Action for Lint your code

issue comment chaoss/grimoirelab-perceval

[Question] Is there a perceval API for obtaining number of commits and contributors to a GitHub repo?

@zhquan @jgbarah Thanks for the response. I see now that the default number of items per page (30) was making the difference!
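
The difference comes from how GitHub paginates list endpoints: a Link header is sent only when the results span more than one page, so a filtered query that returns 30 or fewer items has none. A minimal sketch of handling both cases, assuming the header format quoted in this thread and a response for the first page; `parse_link_header` and `page_count` are hypothetical helpers, not part of Perceval:

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_link_header(value):
    """Map each rel name ("next", "last", ...) to its URL."""
    links = {}
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', value or ""):
        links[rel] = url
    return links


def page_count(headers):
    """Return the total number of pages for a first-page response.

    A missing Link header (or one without rel="last") is treated as a
    single page, which is the case when all results fit in one page.
    """
    last_url = parse_link_header(headers.get("Link")).get("last")
    if last_url is None:
        return 1
    return int(parse_qs(urlparse(last_url).query)["page"][0])
```

Passing `response.headers` from the client to `page_count` would cover both URLs discussed in the thread without special-casing the filtered one.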

nhasabni

comment created time in a month

started IntelLabs/control-flag

started time in a month

fork nhasabni/control-flag

A system to flag anomalous source code expressions by learning typical expressions from training data

fork in a month

started chaoss/grimoirelab-graal

started time in a month

issue comment chaoss/grimoirelab-graal

Graal does not honor environment variable

Hi @nhasabni,

But Graal does not:

The git executable path is hardcoded here: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/graal.py#L50. The code should be modified to accept the git executable path via the command line.

Please try creating a soft link; it worked for me:

  • install git latest version
cd /home/valcos/Desktop
git clone https://github.com/git/git
sudo apt-get install tcl build-essential tk gettext
sudo apt-get install libcurl4-gnutls-dev
sudo apt-get install libssl-dev
make configure
  • check git version
git --version
git version 2.25.1
  • create soft link and backup old executable
sudo cp /usr/bin/git /usr/bin/git.backup
sudo rm /usr/bin/git
sudo ln -s /home/valcos/Desktop/git/git /usr/bin/git
  • check git version
git --version
git version 2.34.1.75.gabe6bb3905
  • execute cocom backend
graal cocom https://github.com/chaoss/grimoirelab-perceval --git-path /tmp/graal-cocom
  • restore old executable
sudo rm /usr/bin/git
sudo cp /usr/bin/git.backup /usr/bin/git

Hope it helps!

Thanks @valeriocos for looking into this. Unfortunately, the --exec-path command-line parameter does not help.

$ graal cocom https://github.com/chaoss/grimoirelab-perceval --git-path /tmp/graal-cocom --exec-path ~/bin/git
[2021-12-03 10:25:39,906] - Starting the quest for the Graal.
[2021-12-03 10:25:39,921] - Error!: git command - git: 'worktree' is not a git command. See 'git --help'.

[2021-12-03 10:25:39,921] - Quest completed.

Also, I do not have sudo access on the machine that I am using for this setup, so I cannot create a soft link for /usr/bin/git. I think I will modify the Graal script to use the path of the git binary in my environment. Thanks.
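
The kind of change described here can be sketched as follows: instead of a hardcoded /usr/bin/git, take an explicit path if one is given and otherwise search PATH. This is an illustration under stated assumptions, not Graal's or Perceval's actual code; `resolve_git` and its parameters are hypothetical names.

```python
import os
import shutil


def resolve_git(explicit_path=None, env=None):
    """Pick a git executable: an explicit path first, then a PATH lookup."""
    if explicit_path:
        candidate = os.path.expanduser(explicit_path)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        raise FileNotFoundError(f"not an executable file: {candidate}")
    # Search the PATH of the given environment (hypothetical parameter),
    # falling back to the PATH of the current process.
    search_path = (env or os.environ).get("PATH")
    found = shutil.which("git", path=search_path)
    if found is None:
        raise FileNotFoundError("git not found on PATH")
    return found
```

With ~/bin first on PATH, `resolve_git()` would return the same binary that `which git` reports in the shell.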

nhasabni

comment created time in a month

push event nhasabni/Hackathon

k----n

commit sha 9258247e8d3393110a0b716b2914dc6b0d0c371e

Added initial project description

Jesus M. Gonzalez-Barahona

commit sha fce3f8a86f2c286bd93eb5165e83853de085b7a9

Merge pull request #14 from k----n/GrimoireGitter Added initial project description for GrimoireGitter

Niranjan Hasabnis

commit sha 03636903f4037fd79428f871104ad33e9da11906

Merge branch 'main' into main

push time in 2 months

push event nhasabni/Hackathon

Niranjan Hasabnis

commit sha c0f08fb7e2cdce8dc0a9f3c939ace7a91a6d99bf

Update teams.md

push time in 2 months

issue opened chaoss/grimoirelab-graal

Graal does not honor environment variable

I am using a different version of git than the one installed on the system by default. A newer version of git is installed in ~/bin, and ~/bin is prepended to the PATH variable. The shell picks it up correctly:

$ which git
~/bin/git

But Graal does not:

$ PATH=~/bin:${PATH} graal -g cocom https://github.com/chaoss/grimoirelab-perceval --git-path /tmp/graal-cocom
[2021-12-02 11:58:14,207 - root - INFO] - Starting the quest for the Graal.
[2021-12-02 11:58:14,211 - perceval.backends.core.git - DEBUG] - Running command /usr/bin/git worktree add /tmp/worktrees/graal-cocom (cwd: /tmp/graal-cocom, env: {'LANG': 'C', 'PAGER': '', 'HTTP_PROXY': 'http://<proxy>:911/', 'HTTPS_PROXY': 'http://<proxy>:912/', 'NO_PROXY': '', 'HOME': '/home/nhasabni'})
[2021-12-02 11:58:14,224 - perceval.backend - ERROR] - Error!: git command - git: 'worktree' is not a git command. See 'git --help'.

[2021-12-02 11:58:14,224 - root - INFO] - Quest completed.

The worktree command is supported by the git installed under ~/bin, but not by /usr/bin/git:

$ git worktree list
/tmp/worktrees/graal-cocom                          2e6a58b [graal-cocom] prunable
$ /usr/bin/git worktree list
git: 'worktree' is not a git command. See 'git --help'.

created time in 2 months

PR opened MSRHack2022/Hackathon

Update teams.md
+3 -8

0 comment

1 changed file

pr created time in 2 months

push event nhasabni/Hackathon

Niranjan Hasabnis

commit sha f04f317c0069b7e80dfd7bb170deccf516db10b4

Update teams.md

push time in 2 months

fork nhasabni/Hackathon

MSR Hackathon 2022

fork in 2 months

issue comment chaoss/grimoirelab-perceval

[Question] Is there a perceval API for obtaining number of commits and contributors to a GitHub repo?

If you only need the number of commits for a set of repos, the GitHub API is much more efficient, as you say. But in that case, you can also consider other solutions such as https://ghtorrent.org/

If you want (and talking in terms of the MSR hackathon) you could also add a call to that GitHub API to the Perceval github metadata backend to get richer data, or write a new backend, which is not that difficult.

Yes, I am using the GitHub APIs now. I'm seeing odd behavior when I use the GitHubClient of the github backend. Here is the code that I am trying:

github_client = GitHubClient(owner="chaoss", repository="grimoirelab-perceval", tokens=[my_token])
resource_url = <url_that_I_want>
response = github_client.fetch(url=resource_url)

For slightly different values of resource_url, I am seeing different values in response.headers.

For resource_url = "https://api.github.com/repos/chaoss/grimoirelab-perceval/issues", I see that response.headers contains a Link field, which indicates the number of pages. The value is 'Link': '<https://api.github.com/repositories/47415120/issues?page=2>; rel="next", <https://api.github.com/repositories/47415120/issues?page=3>; rel="last"'

However, if I change resource_url slightly, for example to https://api.github.com/repos/chaoss/grimoirelab-perceval/issues?accept=application/vnd.github.v3+json&state=open&since=2021-06-04T23:10:45Z, I don't see the Link field in response.headers, although response.status is 200.

Can you guide me as to what could be going wrong? All the URLs work correctly in a browser.

The correct response is shown below:

>>> resource_url = "https://api.github.com/repos/chaoss/grimoirelab-perceval/issues"
>>> response = github_client.fetch(url=resource_url)
>>> print(response.headers)
{'Server': 'GitHub.com', 'Date': 'Wed, 01 Dec 2021 23:20:37 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Cache-Control': 'private, max-age=60, s-maxage=60', 'Vary': 'Accept, Authorization, Cookie, X-GitHub-OTP, Accept-Encoding, Accept, X-Requested-With', 'ETag': 'W/"d2adea4cd688d0de59460af12e086c10ba1f43941f5b990b0a9630e298b6af36"', 'X-OAuth-Scopes': 'admin:enterprise, admin:gpg_key, admin:org, admin:org_hook, admin:public_key, admin:repo_hook, delete:packages, delete_repo, gist, notifications, repo, user, workflow, write:discussion, write:packages', 'X-Accepted-OAuth-Scopes': 'repo', 'github-authentication-token-expiration': '2022-01-14 16:12:49 UTC', 'X-GitHub-Media-Type': 'github.v3; param=squirrel-girl-preview', 'Link': '<https://api.github.com/repositories/47415120/issues?page=2>; rel="next", <https://api.github.com/repositories/47415120/issues?page=3>; rel="last"', 'X-RateLimit-Limit': '5000', 'X-RateLimit-Remaining': '4932', 'X-RateLimit-Reset': '1638401725', 'X-RateLimit-Used': '68', 'X-RateLimit-Resource': 'core', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '0', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': 'E780:7E1D:12752A6:31B1DE1:61A80345'}
nhasabni

comment created time in 2 months

issue comment chaoss/grimoirelab-perceval

[Question] Is there a perceval API for obtaining number of commits and contributors to a GitHub repo?

For getting metadata from the git repository in a GitHub repo, you can use the Perceval git backend. Or are you asking for something else?

I looked into Perceval's git backend. Will it fetch all the commits for a repository? I only need a count of the number of commits. It looks like GitHub's REST API for commits is more efficient in the sense that it fetches only 30 commits (the default) along with a link to the last page. That allows me to determine the number of commits easily, without fetching all the commits.
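
The counting idea can be taken one step further with the REST API's `per_page` query parameter: with one commit per page, the page number of the rel="last" link equals the number of commits. A minimal sketch, assuming the Link header format GitHub returns for list endpoints; `commits_count_url` and `count_from_link` are hypothetical helpers, not a Perceval API:

```python
import re


def commits_count_url(owner, repo):
    """Build a commits URL that asks for a single commit per page."""
    return f"https://api.github.com/repos/{owner}/{repo}/commits?per_page=1"


def count_from_link(link_header, items_on_first_page):
    """Derive the item count from the rel="last" page number.

    Only valid for a request made with per_page=1. When the Link header
    is absent, everything fit on the first page, so fall back to the
    number of items actually returned there.
    """
    match = re.search(
        r'<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="last"', link_header or ""
    )
    if match is None:
        return items_on_first_page
    return int(match.group(1))
```

The Link value would come from `response.headers.get("Link")` on the response for `commits_count_url("chaoss", "grimoirelab-perceval")`, and `items_on_first_page` from the length of the decoded JSON list.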

nhasabni

comment created time in 2 months

issue opened chaoss/grimoirelab-perceval

[Question] Is there a perceval API for obtaining number of commits and contributors to a GitHub repo?

The GitHub backend for Perceval returns issues, repository metadata, and pull requests, but does not have a specific API to get commit and contributor counts. Am I expected to use the fetch API here? Thanks.

created time in 2 months

issue comment IntelLabs/control-flag

[BUG] Limited to 16 threads? Missing logfiles?

The number of scanning threads is limited by default to the number of CPUs on the system. Thanks for your reply.

But it ran on a 20 logical core cpu? I'll rerun on our 80-core machine tomorrow.

Regards

Yes, we use the nproc utility to obtain the number of CPUs on the system, which, to our knowledge, returns the number of logical CPUs.
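
For illustration, the default described here (CPU count unless overridden with -j) can be sketched in Python. This is not ControlFlag's code; `default_scan_threads` is a hypothetical name. Note that nproc reports the processing units available to the current process, so the sketch prefers the CPU affinity mask where the platform exposes it and falls back to the logical CPU count otherwise.

```python
import os


def default_scan_threads(requested=None):
    """Return the thread count: an explicit -j style value, else the CPU count."""
    if requested is not None:
        if requested < 1:
            raise ValueError("thread count must be at least 1")
        return requested
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process is allowed to run on (Linux); close to what nproc prints.
        return len(os.sched_getaffinity(0))
    # Logical CPUs in the system; may be None on unusual platforms.
    return os.cpu_count() or 1
```

On a machine restricted by a CPU affinity mask or container limits, the two branches can disagree, which is one way a tool can end up with fewer threads than the hardware's logical core count.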

xback

comment created time in 2 months

issue comment IntelLabs/control-flag

Segmentation fault while scan_for_anomalies.sh

Hi @xback, Thanks for trying out ControlFlag. Did you try using a smaller version of the dataset? We have seen that most of these crashes are caused by using datasets larger than the available memory on the system. Thanks.

Hi, The test ran on a system with 1TB of RAM (really) of which >900GB was free.

Thanks for the info, @xback. Let us look into reproducing the issue. Would you mind pointing us to the repository that you have been scanning using ControlFlag (if it is a public repository)? That can help us expedite the process. Thanks.

Hi @xback, we scanned the ClickHouse code using the large version of the dataset, and the scan finished without any issues. In short, we do not see the crash on our end. Please provide us with a reproducer at your convenience. Thanks.

qoega

comment created time in 2 months

issue comment IntelLabs/control-flag

[BUG] Limited to 16 threads? Missing logfiles?

Hi @xback,

The number of scanning threads is limited by default to the number of CPUs on the system. As you found, one can control the number using the -j option.

The scanning threads are internally divided into several groups (e.g., file handler or logger threads) to ensure that the system is not overwhelmed. That is why the number of log files generated could be less than the number of scanner threads. So this is expected behavior.

Thanks.

-- the ControlFlag team

xback

comment created time in 2 months

issue comment IntelLabs/control-flag

[FEATURE]

Thanks for the request. We will add this to the list of interesting features.

Danc2050

comment created time in 2 months

issue comment IntelLabs/control-flag

Segmentation fault while scan_for_anomalies.sh

Hi @xback,

Thanks for trying out ControlFlag. Did you try using a smaller version of the dataset? We have seen that most of these crashes are caused by using datasets larger than the available memory on the system. Thanks.

qoega

comment created time in 2 months

started MSRHack2022/Hackathon

started time in 2 months

issue comment MSRHack2022/Hackathon

Participation in MSR Hackathon: MPR

@jgbarah Sorry for the delay in responding. I am interested in participating. I've filled out the form. Can you tell me my next steps?

nhasabni

comment created time in 2 months

delete branch IntelLabs/control-flag

delete branch : nhasabni/git_download_nonexistent_repo

delete time in 2 months

issue comment IntelLabs/control-flag

[BUG] Authentication Error, Not Handled Correctly

Hi @Danc2050,

Thanks for the report and analysis. We found that some of the repositories are non-existent (as you also found), and for some reason, git clone asks for credentials for such repositories. Different versions of git seem to have different mechanisms to deal with this problem, but we found that adding -c core.askPass=echo to the git clone command avoids it. For non-existent repositories, git will simply report an error and continue rather than waiting for a username/password.

PR #34 should fix this issue. Do you want to give it a try? Let us know.

Thanks, The ControlFlag team

Danc2050

comment created time in 2 months

PR opened IntelLabs/control-flag

Handle git clone of non-existent repos

The git clone command, when given a non-existent repo, asks for user credentials (not sure why). This creates problems for the automated download/clone of repositories for ControlFlag. By adding the "-c core.askPass=echo" option to git clone, we can bypass these prompts and instead get a verbose error like the one below:

$ python3 download_repos.py -f failed.c100.txt -o training_repo_dir -m clone -p 1
Number of repos: 100
 19%|███████████████ | 19/100 [01:08<02:23, 1.77s/it]
remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
fatal: Authentication failed for 'https://github.com/craSH/socat/'
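
A minimal sketch of how a download script could apply this option and keep going after a failed clone. It is an illustration only, not the code in download_repos.py; `build_clone_command` and `clone_or_skip` are hypothetical names, and the exact position of the option in the real script is an assumption.

```python
import subprocess


def build_clone_command(url, dest):
    """git clone with an askpass helper that answers immediately, so no prompt blocks."""
    return ["git", "clone", "-c", "core.askPass=echo", url, dest]


def clone_or_skip(url, dest):
    """Clone one repository; report the error and return False on failure."""
    result = subprocess.run(
        build_clone_command(url, dest),
        stdin=subprocess.DEVNULL,  # never wait on interactive input
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"skipping {url}: {result.stderr.strip()}")
        return False
    return True
```

A caller can loop over the repository list and collect the URLs for which `clone_or_skip` returns False, which gives a list of failed repositories without stalling the batch.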

+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch IntelLabs/control-flag

branch : nhasabni/git_download_nonexistent_repo

created branch time in 2 months

created tag IntelLabs/control-flag

tag v1.0

A system to flag anomalous source code expressions by learning typical expressions from training data

created time in 2 months

release IntelLabs/control-flag

v1.0

released time in 2 months
