Information Technology Reference

In-Depth Information

While the above work uses the concept of the VC dimension for general-

ization analysis, the notion of the bipartite rank-shatter coefficient, denoted as

r(
F
,m
+
,m
−
)
,isusedin[
2
]. This new notion has a similar meaning to the shatter-

ing coefficient in classification [
16
,
17
].

Using the bipartite rank-shatter coefficient as a tool, the following theorem has

been proven in [
2
]. For the class of linear scoring functions in the one-dimensional

feature space,
r(

,m
+
,m
−
)
is a constant, regardless of the values of
m
+

F

and

m
−
. In this case, the bound converges to zero at a rate of
O(
max

1

√
m
+

1

{

√
m
−
}

)
,

and is therefore tighter than the bound based on the VC dimension (as given in

Theorem
17.1
). For the class of linear scoring functions in the
d
-dimensional fea-

ture space (
d>
1),
r(

,

,m
+
,m
−
)
is of the order
O((m
+
m
−
)
d
)
, and in this case

the bound
has a s
im
ilar con
vergence rate to that based on the VC dimension, i.e.,

O(
max

F

log
(m
+
)

m
+

,
log
(m
−
)

m
−

{

}
)
.

F

X

Theorem 17.2
Let

be the class of real-valued functions on

,
then with prob-

ability at least
1

−
δ
(0
<δ<
1),

8
(m
+
+

m
−
)(
log
δ
+

R
0
(f )

−
R
0
(f )
≤

,
2
m
+
,
2
m
−
))

log
r(

F

∀

f

∈
F

,

.

m
+
m
−

(17.2)

Note that the above two results are only applicable to the case of bipartite ranking.

Some other researchers have extended these results to more general cases, e.g., the

case of
k
-partite ranking [
9
,
15
].

Here we take [
9
] as an example. In this work, the labels of documents are no

longer assumed to be positive or negative. Instead, a document is preferred to an-

other one if and only if the label of the former document is larger than that of the

latter document. Specifically, given two documents
x
u
and
x
v
whose labels are
y
u

and
y
v
, the pairwise 0-1 loss is defined as
I
{
(y
u
−
y
v
)(f (x
u
)
−
f(x
v
))<
0
}

. In other words,

if the ranking result given by function
f
is in the same order of that given by the

ground-truth label, the loss is zero; otherwise the loss is one. Based on the pairwise

0-1 loss, the expected risks and empirical risks are defined as below.

R
0
(f )

=

E

[

I

}
]

,

(17.3)

{

(y
u

−

y
v
)(f (x
u
)

−

f(x
v
))<
0

m

m

2

m(m

R
0
(f )

=

I

.

(17.4)

{

(y
u
−

y
v
)(f (x
u
)

−

f(x
v
))<
0

}

−

1
)

u

=

1

v

=

u

+

1

Using the properties of U-statistics [
9
], the following generalization bound is

obtained with respect to the aforementioned empirical and expected risks.