Patent-to-patent textual similarity (Data access)

In a joint project with Ken Younge, we introduce a new, continuous measure of technological similarity based on a vector space model. We apply the model to calculate the pairwise similarity for more than 14 trillion pairs of patents. We validate the measure and demonstrate that it can provide greater accuracy, specificity, and generality than existing approaches for many common research questions. Moreover, we illustrate how a pairwise similarity comparison of every pair of patents in the USPTO patent space can open new avenues of research in economics, management, and public policy. The data will be made accessible via the Patent Research Foundation.
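
To make the idea of a vector space model concrete, the sketch below represents each patent text as a term-frequency vector and scores each pair by cosine similarity. It is only an illustration under stated assumptions: whitespace tokenization, raw term counts, cosine similarity as the metric, and the function names `term_vector`, `cosine_similarity`, and `pairwise_similarity` are all hypothetical choices, not a description of the weighting or pipeline used in the project.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(texts):
    """Score every unordered pair of patents in a dict of {patent_id: text}."""
    vectors = {pid: term_vector(text) for pid, text in texts.items()}
    ids = sorted(vectors)
    return {
        (p, q): cosine_similarity(vectors[p], vectors[q])
        for i, p in enumerate(ids)
        for q in ids[i + 1:]
    }
```

With n patents this loop scores n(n-1)/2 pairs, which is why a full comparison of the patent corpus runs into the trillions. The result is a continuous score for each pair rather than a yes/no match on shared classification.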

Examiner toughness (Data download)

In a joint project with Neil Thompson, we introduce a new measure of patent examiner stringency based on textual analysis of the claims of more than 1 million U.S. patents. The measure, normalized to a z-score with a mean of zero and a standard deviation of one, indicates for each patent how stringent the patent’s examiner is relative to other examiners in the same area of technology. We currently make a sample of the data available for download and plan to release additional data in the near future.
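
As a rough illustration of the normalization step, the sketch below converts raw scores to z-scores. It assumes that a raw per-patent stringency score already exists and that standardization is done separately within each technology area; both are assumptions made for the example, as are the record layout and the name `stringency_z_scores`. How the raw score is derived from the claim text is not covered here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def stringency_z_scores(records):
    """Standardize raw scores within each technology area.

    records: iterable of (patent_id, tech_area, raw_score) tuples (hypothetical layout).
    Returns {patent_id: z_score}, where each area has mean 0 and standard deviation 1.
    """
    records = list(records)

    # Group raw scores by technology area.
    by_area = defaultdict(list)
    for _, area, raw in records:
        by_area[area].append(raw)

    # Population mean and standard deviation for each area.
    stats = {area: (mean(vals), pstdev(vals)) for area, vals in by_area.items()}

    z_scores = {}
    for patent_id, area, raw in records:
        mu, sigma = stats[area]
        # An area with no variation carries no relative information; report 0.0.
        z_scores[patent_id] = 0.0 if sigma == 0 else (raw - mu) / sigma
    return z_scores
```

Under this reading, a positive value marks a patent whose examiner is more stringent than the average for that technology area, and a negative value marks a more lenient one.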