Lucas theorem is quite well known for compute (mn)(modp):
Let p be a prime, n=u1p+u2,m=v2p+v2 where 0≤u2,v2<p, then we have
(mn)≡(v1u1)(v2u2)(modp).
However, Lucas theorem only works for prime p. For composites CRT can rescue us, but we still have to solve the hard case where p is a prime power, which this post mainly discusses. The post borrows a lot from a post by prabowo (although I guess he also references min_25’s blogpost), while having its own novelties (mostly the upper bound).
Andrew Granville also has a paper on Extended Lucas theroem. However, there are too many cases in the paper and the overall time complexity is also high. I’d never write it again after I did it once.
Analysis on n!
Define E(n)=maxk{pk∣n!} as the number of trailing 0’s of the base-p representation of n!. The efficient algorithm for computing E(n) is trivial. Now we define
(n!)p=i=1,p∤i∏ni,
that is, the product of integers in [n] which are not multiples of p. Now, we have
n!=pE(n)k=0∏(⌊n/pk⌋!)p,
and our task is to compute (n!)pmodpe.
Suppose n=up+v where 0≤v<m. We may have two terms for (n!)p:
(n!)p=(i=0∏u−1j=1∏p−1(ip+j))(i=1∏v(up+j)).
Note that both terms have a structure of ∏j=1r(ip+j), which can be seen as a polynomial of p. What’s more interesting is that we only care about the polynomial modulo pe. More precisely, notice that
At this point, we have three terms for (n!)p: \stirlingIp1u,fp,e(u) and ∑k=1e\stirlingIv+1k(up)k−1. The first term and the last term are straightforward to compute, so we’ll focus on the middle term (fp,e(u)).
Here comes a useful observation: fp,e(u) is polynomial in u.
Prove fp,e(u) is polynomial in u
We here prove a more general statement:
Lemma: Suppose for any 1≤k<e, wk(u) is a polynomial in u, and
g(x,u)=i=0∏u−1(1+k=1∑e−1wk(u)xk)modxe,
then g is a polynomial in u with a bounded degree
degug(x,u)≤λ(e−1),λ:=1≤k<emaxkdegwk+1.
Proof: For 0≤k<e, let hk(u) be the coefficient of xk in g(x,u). We will prove that hk(u) is polynomial in u with degree ≤λk. With the property of hk(u) ahead, we may simple conclude
Suppose the statement holds for 0<k<e, we have
hk(u)=hk−1(u−1)+t=0∑k−1ht(u−1)wk−t(u−1)=hk−1(0)+t=0∑k−1u′≥0∑u−1ht(u′)wk−t(u′).
As ht(u′)wk−t(u′) is polynomial in u′ with degree ≤deght+degwk−t, hk is also polynomial in u with degree
deghk≤0≤t<kmax(deght+degwk−t+1),
where the additional constant of 1 comes from summation over u′. The definition of λ and induction give
deghk≤0≤t<kmax(λt+λ(k−t))=λk.
Indeed the recursion on deghk is just a Knapsack problem, so a trivial upper bound is just the item with largest density.
QED.
Bound the degree of fp,e(u)
We still have one question to answer: given that fp,e(u) is a polynomial, how do we bound its degree? We’d like to use the lemma above, but the problem, how do we define wk(u)?
Hence, we use wk(i)=\stirlingIp1\stirlingIpk/pik−1 for k>1 and w1(i)=0. The λ here can be made smaller:
λ=1≤k<emaxkdegwk+1=1,⟹degufp,e(u)≤e−1.
(It’s not rigorous here as we can’t divide a number by p under modpe, although we may do it in R. The division here is only used for proof and will not be use in code.)
The other case is e≥p, and we have one more Stirling number to discuss: \stirlingIpp=1. We may re-organize the equations:
The proof above makes use of a special property of Stirling numbers of the first kind, which we’ll give a short proof here:
For prime p and 1<k<p, p divides \stirlingIpk.
Proof: Consider a polynomial f(x)=i=1∏p−1(x+i)∈Fp[x]. The Stirling numbers of the first kind satisfy
f(x)=i=1∏p−1(x+i)=i=1∑p\stirlingIpixi−1.
Note that f(0)=(p−1)!≡−1(modp), as well as f(x)=0 for any x=0. On the other hand, the polynomial g(x)=xp−1−1 has identical values as f in these p points, and both f and g have degree of p−1, thus f=g. Now we compare the coefficients of f and g:
\stirlingIpi≡⎩⎨⎧1,0,−1,i=p,1<i<p,i=1.(modp)
Algorithm for computing (n!)pmodpe
For 0≤v≤p,0≤k≤e, preprocess \stirlingIvk with overall complexity O(pe);
Preprocess fp,e. As a polynomial with degree of d∗ requires d∗+1 points to determine, we may simply brute-force fp,e(u) for 0≤u≤d∗ and use Lagrangian interpolation. The complexity is O(e2).
For each query, we simply use the following equation:
(n!)p≡\stirlingIp1ufp,e(u)(k=1∑e\stirlingIv+1k(up)k−1)(modpe),
with time complexity of O(e) per query.
If d∗≥p, we may be careful in Lagrange interpolation. The following equation can be helpful,
which means that it must be an integer. However, ∏j=ii−ju−j might be a little tricky as i−j might not be invertible under modpe… Anyway we have to represent all numbers in the form of r≡pan(modpe) and do multiplication/division later…
For reference code, prabowo provides a C++ version while I can share my own Python version. Unfortunately, min_25’s Python version was lost.
Hexo
When I wrote the blog post for the first time, I use commands like \newcommand{\aaa}[1]{#1} while Hexo recognize # as SwigTags, which leads to a rendering issue. The solution is quite simple: just add
disableNunjucks: true
to the front-matter. Please refer to this issue (there is a typo in disableNunjucks in the comment though) and this issue.