H-index
- medium
- tag: hashmap, binary search, 映射
- similar: 九章的indexed priority Queuee, Princeton的Dijkstra algs
- According to the definition of h-index on Wikipedia:
"A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."
- For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.
思路
- 首先理解这个h-index的定义. 一开始就有点头晕, 因为给的例子里面不是有
3,1,0
, 一共3个数no more than 3吗? 那不就不符合n-h的要求? Ans: 其实并没有, 因为按照定义: 只要有h个文章的citation是不小于h, 然后有n-h个文章的citation不多于h就行了. - 我想的方法是: 排序, 然后比citation[i]的文章个数为
citation.lenght-i
, 所以就是找Max(Min(citation[i], citation.length-i))
.
idea 1: O(NlogN)使用wiki的公式
- 直接使用上面的思路.
- 代码如下:
public int hIndexOnlogn(int[] citations) {
Arrays.sort(citations);
int h = 0;
for(int i = 0; i < citations.length; i++){
// h-ind until now
int currH = Math.min(citations[i], citations.length - i);
if(currH > h){
h = currH;
}
}
return h;
}
idea 2: O(N)使用array mapping, 类似hashmap
牛X的方法就是: 使用一个aux数组统计每一个引用数的文章个数, 然后从大到小的找, 利用2个性质:
- 定义是: h个文章的citation不小于h.
- h <= citation.length.
这里使用1-base的aux array记录, 计算h的同时判断h是否大于等于i了.
- 代码如下:
public int hIndexOn(int[] citations) {
int sum = 0;
int sz= citations.length;
int[] A = new int[sz+1];
for (int i = 0; i < sz; ++i) {
A[citations[i] > sz ? sz : citations[i]] += 1;
}
for (int i = sz; i > 0; i--) {
sum += A[i];
if (sum >= i) {
return i;
}
}
return 0;
}
H-index II
如果题目给的citations数组已经排好序的话, 有什么优化的算法呢?
- 这就和我一开始的想法一样了, 既然是找最大的index使得citations[i]
反思
- 这里的O(N)解法很好, 相当于是hashmap, 或者说是映射. 已知条件是每一个paper的引用数, 这里aux数组记录的是每个引用数的文章个数. 完美的解决问题.
- 很类似的也包括常见的index-piority queue.
- 这里的方法要base在对于h-index的定义的理解. 类似的题目还有LIS的O(nlogn)解法, 也是要理解好题意, 才想的到那种算法.