Confidential data of users and limited metadata of programs and reports accessible via GraphQL
Discovered by yashrs on Security

This issue took 0 Days and 11 hours to triage and 0 Days and 3 hours to resolve once triaged.

On January 31st, 2019 at 7:16pm PST, HackerOne confirmed that two reporters were able to query confidential data through a GraphQL endpoint. This vulnerability was introduced on December 17th, 2018 and was caused by a backend migration to a class-based implementation of GraphQL types, mutations, and connections. The <a href="/redirect?signature=db4e87bcbbbd6cbc9c9b4bb7bf7e35103262b4ec&amp;" target="_blank" rel="nofollow noopener noreferrer"><span>class-based implementation introduced</span><i class="icon-external-link"></i></a> the <code>nodes</code> field by default on all connections. The <code>nodes</code> field, in contrast with <code>edges</code>, didn’t leverage any of the defenses HackerOne has implemented to mitigate the exposure of sensitive information.

Our investigation concluded that malicious actors did not exploit the vulnerability. No confidential data was compromised. A short-term fix was released on January 31st, 2019 at 9:46 PM, a little over 2 hours after the vulnerability was reproduced.

<h1 id="timeline">Timeline</h1>

<table> <thead> <tr> <th><strong>Date</strong></th> <th><strong>Time (PST)</strong></th> <th><strong>Action</strong></th> </tr> </thead> <tbody> <tr> <td>2018-12-17</td> <td>9:07 AM</td> <td>Software containing bug deployed to production.</td> </tr> <tr> <td>2019-01-31</td> <td>7:32 AM</td> <td>Vulnerability submitted to HackerOne’s bug bounty program.</td> </tr> <tr> <td>2019-01-31</td> <td>7:21 PM</td> <td>HackerOne validated the report and started incident response.</td> </tr> <tr> <td>2019-01-31</td> <td>8:25 PM</td> <td>HackerOne identified which code change introduced the security vulnerability and started work on a patch.</td> </tr> <tr> <td>2019-01-31</td> <td>9:46 PM</td> <td>A patch was released mitigating the identified vulnerability.</td> </tr> <tr> <td>2019-01-31</td> <td>11:46 PM</td> <td>HackerOne confirmed the vulnerability was not abused by any malicious actors.</td> </tr> <tr> <td>2019-02-01</td> <td>6:18 AM</td> <td>The root cause of the vulnerability was identified and a long term mitigation was proposed.</td> </tr> <tr> <td>2019-02-01</td> <td>5:08 PM</td> <td>Long term mitigation was deployed to production.</td> </tr> <tr> <td>2019-02-03</td> <td>2:34 AM</td> <td>Impacted users were alerted that their information was exposed to the reporters who submitted the vulnerability.</td> </tr> </tbody> </table>

<h1 id="root-cause">Root Cause</h1>

HackerOne has a number of defenses in place to reduce the risk of over-exposing data through our GraphQL layer. The first notable defense is a separate database schema that limits the set of rows a user can query based on their current role. This significantly reduces the impact if, for example, the result of <code>Report.all</code> were serialized and returned to the user. The second notable defense is attribute-level authorization that depends on the role of the requester. This makes sure that when an object is serialized, for example a publicly disclosed report, the user is not able to obtain internal metadata of the report.
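The second defense can be illustrated with a minimal plain-Ruby sketch. It is not HackerOne's implementation: the role names, the attribute names, and the <code>scrub</code> helper are all hypothetical and only show the idea of an allow-list applied per role before serialization.

```ruby
# Hypothetical allow-list: which report attributes each role may see.
# Roles and attribute names are illustrative, not HackerOne's schema.
VISIBLE_ATTRIBUTES = {
  anonymous: %i[title disclosed_at],
  team_member: %i[title disclosed_at reference triage_note]
}.freeze

# Returns a copy of the record with only the attributes the role may see.
# Unknown roles get an empty allow-list, so the check fails closed.
def scrub(record, role)
  allowed = VISIBLE_ATTRIBUTES.fetch(role, [])
  record.select { |attribute, _value| allowed.include?(attribute) }
end

report = {
  title: "Example report",
  disclosed_at: "2019-01-01",
  reference: "T-123",
  triage_note: "internal"
}

public_view = scrub(report, :anonymous)
team_view = scrub(report, :team_member)
unknown_view = scrub(report, :unknown_role)
```

In this sketch a publicly disclosed report keeps its title for an anonymous viewer, while the internal <code>reference</code> and <code>triage_note</code> values are dropped.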

<em>Why upgrade?</em><br> HackerOne’s engineering team decided to upgrade to the class-based implementation of <code>graphql-ruby</code> because the old .define-based implementation was lazy-loaded. This caused problems when hot reloading pieces of code in a development environment. The class-based implementation also performs better in most situations. The .define-style implementation is also deprecated by the maintainers of the gem (to be removed with GraphQL 2.0).

<em>Why didn’t we notice?</em><br> On December 17th, when the code change was put up for review, engineers noticed the addition of the <code>nodes</code> field. An assumption was made that the field behaved like a shortcut for <code>edges { node }</code>, which in hindsight was not the case. No manual testing was performed to make sure that the authorization model for <code>nodes</code> was similar to other connection types.

The <code>nodes</code> field is a helper field for Relay, which is used by the frontend. Even though the field was introduced, HackerOne engineers hadn’t started using it in our frontend. This caused the addition to fly under the radar of other engineers. The go-to way to query data through connection types at HackerOne is to go through the <code>edges</code> field. Because engineers outside of the specific team who upgraded to the class-based implementation did not deem the change important enough, there was no communication to other engineering teams.

<em>Why was it exploitable?</em><br> When a GraphQL query is deconstructed and turned into one or more SQL queries, the result is cast into an array of stale objects, and the attribute-level authorization scrubs all data the current user isn’t authorized to see. Root cause analysis showed that this code path was only followed when the nodes were queried through the <code>edges</code> field.

<strong>Query that followed the expected code path</strong>
<pre class="highlight plaintext"><code>query {
  users() {
    edges { node { email } }
  }
}
</code></pre>

During the GraphQL gem upgrade on December 17th, all GraphQL types, connections, and mutations were rewritten to a class-based implementation. This introduced the <code>nodes</code> field on every connection type <a href="/redirect?signature=3d6420990aa63c4fb0b9cd48e3df3dc38d569fcd&amp;" target="_blank" rel="nofollow noopener noreferrer"><span>in HackerOne’s GraphQL schema</span><i class="icon-external-link"></i></a>. Instead of casting the result to an array with stale objects, the <code>nodes</code> field would result in an <code>ActiveRecord::Relation</code> object. The attribute-level authorization instrumentation would then incorrectly assume that the result was safe to be serialized, as it assumes the parent of the GraphQL field had already been scrubbed.

<strong>Query that followed the unexpected code path</strong>
<pre class="highlight plaintext"><code>query {
  users() {
    nodes { email }
  }
}
</code></pre>

In the team’s investigation to determine whether this was exploited by malicious actors, the team concluded that the current logging level enabled them to answer two crucial questions: which GraphQL queries were executed and what information was transferred to the people proving the security vulnerability in the first place. These questions confirmed it was not exploited.
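The difference between the two code paths can be sketched in plain Ruby. This is a simplified model under stated assumptions, not HackerOne's code: <code>FakeRelation</code> stands in for <code>ActiveRecord::Relation</code>, and <code>scrub_user</code>, <code>authorize</code>, <code>resolve_edges</code>, and <code>resolve_nodes</code> are hypothetical names. The point is that a scrubber which only handles arrays lets any other result type pass through untouched.

```ruby
# Stand-in for a lazy query result such as ActiveRecord::Relation (hypothetical).
class FakeRelation
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

# Hypothetical attribute rule: only the viewer's own email is visible.
def scrub_user(user, viewer_id)
  user[:id] == viewer_id ? user : user.merge(email: nil)
end

# Models the flaw: only arrays are scrubbed. Anything else is assumed to
# have been scrubbed by a parent field and is returned unchanged.
def authorize(result, viewer_id)
  return result.map { |user| scrub_user(user, viewer_id) } if result.is_a?(Array)

  result
end

# "edges" path: the relation is cast to an array before authorization.
def resolve_edges(relation, viewer_id)
  authorize(relation.to_a, viewer_id).map { |user| { node: user } }
end

# "nodes" path: the relation reaches authorization as-is and skips scrubbing.
def resolve_nodes(relation, viewer_id)
  authorize(relation, viewer_id).to_a
end

users = FakeRelation.new([
  { id: 1, email: "viewer@example.com" },
  { id: 2, email: "other@example.com" }
])

edge_emails = resolve_edges(users, 1).map { |edge| edge[:node][:email] }
node_emails = resolve_nodes(users, 1).map { |user| user[:email] }
```

In this model <code>edge_emails</code> hides the second user's address, while <code>node_emails</code> returns both addresses because the relation never enters the scrubbing branch.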

<h1 id="resolution-and-recovery">Resolution and Recovery</h1>

At 7:21 PM PST, HackerOne successfully reproduced the vulnerability as described by the reporter. The responding team identified the code change that introduced the vulnerability and started working on a short-term mitigation at 8:25 PM. This mitigation was released at 9:46 PM. The short-term mitigation was to disable the <code>nodes</code> field <a href="/redirect?signature=740e67ffb85b834c8d4ac03a123a594ab90a3465&amp;" target="_blank" rel="nofollow noopener noreferrer"><span>from every connection type</span><i class="icon-external-link"></i></a>. An internal code rule was deployed to alert the incident responders in case a new connection type was added that had the <code>nodes</code> field enabled. At the time, the root cause of the vulnerability was still unclear.
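A check in the spirit of the internal code rule could look like the sketch below. The report does not describe how the rule was built, so everything here is an assumption: the <code>CONNECTION_FIELDS</code> hash is a hypothetical description of the schema (connection type name mapped to its exposed field names), and <code>connections_exposing_nodes</code> is an illustrative helper.

```ruby
# Hypothetical schema description: connection type name => exposed fields.
CONNECTION_FIELDS = {
  "UserConnection" => %w[edges pageInfo],
  "TeamConnection" => %w[edges nodes pageInfo],
  "ReportConnection" => %w[edges pageInfo]
}.freeze

# Returns the names of connection types that still expose a `nodes` field.
def connections_exposing_nodes(connection_fields)
  connection_fields.select { |_name, fields| fields.include?("nodes") }.keys
end

offenders = connections_exposing_nodes(CONNECTION_FIELDS)

# A real rule would notify incident responders; here we only print a warning.
warn "nodes field enabled on: #{offenders.join(', ')}" unless offenders.empty?
```

Run as part of a test suite or CI step, a check like this would flag <code>TeamConnection</code> in the example data.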

On February 1st at 6:18 AM, the team concluded the root cause analysis of the identified vulnerability. A long-term fix was put up for discussion. This fix addressed the underlying problem of the lack of attribute-level protection for the <code>nodes</code> field. Going forward, any connection type that is introduced will either be sanitized through the attribute-level authorization or will stop processing the request if an unexpected object is returned.
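The "sanitize or stop" behavior described above can be sketched as a fail-closed check. This is an illustration under assumptions rather than the deployed fix: <code>UnexpectedConnectionResult</code>, <code>scrub_user</code>, and <code>authorize_connection</code> are hypothetical names, and <code>Enumerator::Lazy</code> stands in for a lazy relation object.

```ruby
# Raised when a connection resolves to a type the scrubber does not know.
class UnexpectedConnectionResult < StandardError; end

# Hypothetical attribute rule: only the viewer's own email is visible.
def scrub_user(user, viewer_id)
  user[:id] == viewer_id ? user : user.merge(email: nil)
end

# Known result types are materialized and scrubbed; anything else aborts
# the request instead of being serialized unscrubbed.
def authorize_connection(result, viewer_id)
  rows =
    case result
    when Array then result
    when Enumerator::Lazy then result.to_a # stand-in for a lazy relation
    else
      raise UnexpectedConnectionResult, "cannot scrub #{result.class}"
    end

  rows.map { |user| scrub_user(user, viewer_id) }
end

users = [
  { id: 1, email: "viewer@example.com" },
  { id: 2, email: "other@example.com" }
]

scrubbed_emails = authorize_connection(users.lazy, 1).map { |user| user[:email] }

rejected = begin
  authorize_connection(Object.new, 1)
  false
rescue UnexpectedConnectionResult
  true
end
```

The design choice is an explicit allow-list of result types: a new, unanticipated return type produces an error rather than silently bypassing authorization.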

The minimum bounty award for a critical vulnerability is currently set to $15,000. Even though this vulnerability exposed confidential information, it was limited to user information and metadata of programs and reports. None of the exposed information could have led to the compromise of confidential vulnerability information. It did, however, allow actors to query a significant amount of information. Because of that, the team decided to award the reporters $20,000 for uncovering this vulnerability and working with us throughout the process.

<h1 id="vulnerability-impact-on-data">Vulnerability Impact on Data</h1>

Sensitive information of multiple objects was exposed. Due to the two notable defenses as described in the Root Cause section, the scope of the information that was exposed was limited. Below is an overview of the objects and the confidential data that a user was able to access.

<em>Connection: users</em><br> The GraphQL schema enables anyone to query the users on the platform. This is an intentional design decision. However, because every User object could be accessed, a significant amount of confidential information was accessible.

Below is an overview of all sensitive attributes that could be queried for every user:

<table> <thead> <tr> <th><strong>Sensitive attribute</strong></th> <th><strong>Note</strong></th> </tr> </thead> <tbody> <tr> <td>account_recovery_phone_number</td> <td>The last two digits of a verified account recovery phone number.</td> </tr> <tr> <td>account_recovery_unverified_phone_number</td> <td>The complete unverified account recovery phone number.</td> </tr> <tr> <td>address</td> <td>Accessible when swag was awarded for a report the authenticated user had access to, regardless of their role (e.g. publicly disclosed report).</td> </tr> <tr> <td>calendar_token</td> <td>The secret calendar token that exposes when HackerOne challenges were scheduled for the user. This does not expose customer names.</td> </tr> <tr> <td>duplicate_users</td> <td>An array of possible duplicate accounts based on platform behavior.</td> </tr> <tr> <td>email</td> <td>The email address.</td> </tr> <tr> <td>otp_backup_codes</td> <td>An array of bcrypt-hashed OTP backup codes.</td> </tr> <tr> <td>payout_preferences</td> <td>A connection of the user’s payout preferences. This does <strong>not</strong> include bank account details.</td> </tr> <tr> <td>reports</td> <td>See Report connection for the scope and attributes that were exposed.</td> </tr> <tr> <td>unconfirmed_email</td> <td>The unconfirmed email address.</td> </tr> </tbody> </table>

<em>Connection: teams</em><br> The secure database schema, by default, allows any user to query public programs (teams) and public external programs. Because of the relationship between external programs and HackerOne programs, this data set includes programs that may be running a private program. This means it was possible to obtain internal triage notes and the policy of a select number of private programs the user did not have access to. The reporters queried partial program information, but they did not obtain any sensitive information that warranted HackerOne to reach out to any customers.

<table> <thead> <tr> <th><strong>Sensitive attribute</strong></th> <th><strong>Note</strong></th> </tr> </thead> <tbody> <tr> <td>average_bounty_lower_amount</td> <td>The lower bound of the average bounty range.</td> </tr> <tr> <td>average_bounty_upper_amount</td> <td>The upper bound of the average bounty range.</td> </tr> <tr> <td>base_bounty</td> <td>The minimum bounty of a program.</td> </tr> <tr> <td>bounties_total</td> <td>The sum of awarded bounties in the entire lifetime of the program.</td> </tr> <tr> <td>bug_count</td> <td>The total number of resolved reports.</td> </tr> <tr> <td>child_teams</td> <td>A connection containing the hierarchy of teams.</td> </tr> <tr> <td>first_response_time</td> <td>A float containing the average time to first response.</td> </tr> <tr> <td>goal_valid_reports</td> <td>The goal of valid vulnerabilities per month the program set.</td> </tr> <tr> <td>grace_period_remaining_in_days</td> <td>The number of days the program has to recover from too many SLA failures to avoid their program being taken off HackerOne.</td> </tr> <tr> <td>new_staleness_threshold</td> <td>The internal SLA until a report is marked as an SLA miss when it hasn’t received a first response.</td> </tr> <tr> <td>new_staleness_threshold_limit</td> <td>The internal SLA until a report is marked as an SLA fail when it hasn’t received a first response.</td> </tr> <tr> <td>policy</td> <td>The program policy in raw markdown.</td> </tr> <tr> <td>policy_html</td> <td>The rendered program policy.</td> </tr> <tr> <td>product_edition</td> <td>The product edition the program uses.</td> </tr> <tr> <td>report_submission_form_intro</td> <td>The submission form introduction in raw markdown.</td> </tr> <tr> <td>report_submission_form_intro_html</td> <td>The rendered submission form introduction.</td> </tr> <tr> <td>report_template</td> <td>The default report template in raw markdown.</td> </tr> <tr> <td>reporters</td> <td>An array of user objects who have reporter access to the program.</td> </tr> <tr> <td>resolution_time</td> <td>A float containing the average time to resolution.</td> </tr> <tr> <td>resolved_staleness_threshold</td> <td>The internal SLA until a report is marked as an SLA miss when it hasn’t been resolved.</td> </tr> <tr> <td>sla_failed_count</td> <td>The number of reports failing the internal SLA.</td> </tr> <tr> <td>structured_policy</td> <td>A structured representation of the program policy.</td> </tr> <tr> <td>structured_scopes</td> <td>A connection that only disclosed an internal <code>reference</code> in case the user was authorized to see the structured scopes on the program page.</td> </tr> <tr> <td>target_signal</td> <td>A float representing the targeted signal of the program.</td> </tr> <tr> <td>triage_bounty_management</td> <td>A text field containing instructions for HackerOne’s triage team on how to handle bounty payments.</td> </tr> <tr> <td>triage_enabled</td> <td>A boolean field indicating whether the program uses HackerOne’s triage services.</td> </tr> <tr> <td>triage_note</td> <td>Internal triage notes in raw markdown.</td> </tr> <tr> <td>triage_note_html</td> <td>The rendered triage notes.</td> </tr> <tr> <td>triage_time</td> <td>A float containing the average time to triage.</td> </tr> <tr> <td>triaged_staleness_threshold</td> <td>The internal SLA until a report is marked as an SLA miss when it hasn’t been triaged.</td> </tr> <tr> <td>triaged_staleness_threshold_limit</td> <td>The internal SLA until a report is marked as an SLA fail when it hasn’t been triaged.</td> </tr> <tr> <td>whitelisted_hackers</td> <td>See <code>reporters</code>.</td> </tr> </tbody> </table>

<em>Connection: reports</em><br> The reports data hasn’t been fully migrated to the secure database schema yet, which means that at the time the vulnerability was reported, only fully publicly disclosed reports and reports the user participated in were accessible. This significantly reduced the amount of report information that was exposed.

<table> <thead> <tr> <th><strong>Sensitive attribute</strong></th> <th><strong>Note</strong></th> </tr> </thead> <tbody> <tr> <td>anc_reasons</td> <td>An array of strings containing flags why the report was submitted to the HackerOne Human-Augmented Signal queue.</td> </tr> <tr> <td>mediation_requested_at</td> <td>A date/time field when mediation was requested.</td> </tr> <tr> <td>pre_submission_review_state</td> <td>A flag representing how Human-Augmented Signal responded to the report.</td> </tr> <tr> <td>reference</td> <td>An optional internal reference.</td> </tr> <tr> <td>reference_link</td> <td>An optional link to an internal ticket.</td> </tr> </tbody> </table>

Even though the reporters confirmed that they did not query more information than necessary to prove the vulnerability and that they have deleted the information, HackerOne has reached out to the people whose sensitive information was downloaded by the reporters.

<strong>If your data was accessed during this incident, you have received a separate notification from HackerOne.</strong>

<h1 id="preventative-measures">Preventative Measures</h1>

As part of our incident response process, we are conducting an internal review and analysis of the incident. We are taking the following actions to address the underlying causes and to help prevent future occurrences:

<ul> <li>Consider leveraging the <code>graphql-ruby</code> gem hooks for built-in authorization callbacks to catch more edge cases</li> <li>Break the execution flow when an unexpected object is returned in the resolution of a connection field</li> <li>Reduce the complexity of connection type resolution</li> </ul>