H-index

medium
tag: hashmap, binary search, 映射
similar: 九章的indexed priority Queuee, Princeton的Dijkstra algs

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.

思路

首先理解这个h-index的定义. 一开始就有点头晕, 因为给的例子里面不是有3,1,0, 一共3个数no more than 3吗? 那不就不符合n-h的要求? Ans: 其实并没有, 因为按照定义: 只要有h个文章的citation是不小于h, 然后有n-h个文章的citation不多于h就行了.
我想的方法是: 排序, 然后比citation[i]的文章个数为citation.lenght-i, 所以就是找Max(Min(citation[i], citation.length-i)).

idea 1: O(NlogN)使用wiki的公式

直接使用上面的思路.
代码如下:

public int hIndexOnlogn(int[] citations) {
    Arrays.sort(citations);
    int h = 0;
    for(int i = 0; i < citations.length; i++){
      // h-ind until now
      int currH = Math.min(citations[i], citations.length - i);
      if(currH > h){
        h = currH;
      }
    }
    return h;
}

idea 2: O(N)使用array mapping, 类似hashmap

牛X的方法就是: 使用一个aux数组统计每一个引用数的文章个数, 然后从大到小的找, 利用2个性质:
1. 定义是: h个文章的citation不小于h.
2. h <= citation.length.
这里使用1-base的aux array记录, 计算h的同时判断h是否大于等于i了.
代码如下:

public int hIndexOn(int[] citations) {
    int sum = 0;
    int sz= citations.length;
    int[] A = new int[sz+1];
    for (int i = 0; i < sz; ++i) {
        A[citations[i] > sz ? sz : citations[i]] += 1;
    }

    for (int i = sz; i > 0; i--) {
        sum += A[i];
        if (sum >= i) {
            return i;
        }
    }
    return 0;
}

H-index II

如果题目给的citations数组已经排好序的话, 有什么优化的算法呢?

这就和我一开始的想法一样了, 既然是找最大的index使得citations[i]

反思

这里的O(N)解法很好, 相当于是hashmap, 或者说是映射. 已知条件是每一个paper的引用数, 这里aux数组记录的是每个引用数的文章个数. 完美的解决问题.
很类似的也包括常见的index-piority queue.
这里的方法要base在对于h-index的定义的理解. 类似的题目还有LIS的O(nlogn)解法, 也是要理解好题意, 才想的到那种算法.