[WIP] Handle missing values #153
palday wants to merge 8 commits into JuliaStats:master from palday:missing
Codecov Report
```diff
@@           Coverage Diff            @@
##           master    #153     +/-   ##
=========================================
- Coverage    83.6%   83.5%   -0.11%
=========================================
  Files           9       9
  Lines         494     497       +3
=========================================
+ Hits          413     415       +2
- Misses         81      82       +1
```
Continue to review full report at Codecov.
This generally looks good but needs tests. I think the problem I ran into was that …
Yeah, that's about as far as I got too, before I ran out of steam late at night (local time).
```julia
# layer of indirection
function copy end
copy(x::Any) = Base.copy(x)
copy(m::Missing) = m
```
This feels like a bad idea to me. It's probably better to drop the use of `copy` altogether and either broadcast `identity` or call `copy` on the whole vector.
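A minimal sketch of the two alternatives, on plain vectors rather than StatsModels columns. Both allocate a new array, but they can differ in element type: `copy` keeps the declared eltype, while broadcasting `identity` recomputes it from the values actually present. The variable names are illustrative only.

```julia
# A column that allows missing and actually contains one.
v = [1.0, missing, 3.0]            # Vector{Union{Missing, Float64}}
w_copy  = copy(v)                  # new array, same eltype as v
w_bcast = identity.(v)             # new array built by broadcasting identity

# A column that allows missing but contains none.
u = Union{Missing, Float64}[1.0, 2.0]
u_copy  = copy(u)                  # eltype stays Union{Missing, Float64}
u_bcast = identity.(u)             # eltype narrows to Float64
```

Which behaviour is preferable probably depends on whether downstream code should still see a column that allows `missing`.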
Ignore the horrible hacking of …
```julia
concrete_term(t::Term, xs::AbstractVector{<:Number}, ::Nothing) = concrete_term(t, xs, ContinuousTerm)
# and for missing values
concrete_term(t::Term, xs::AbstractVector{Union{Missing,T}} where T<:Number, ::Nothing) = concrete_term(t, xs, ContinuousTerm)
```
Couldn't those lines be

```julia
concrete_term(t::Term, xs::AbstractVector{<:Union{<:Number,<:Union{Missing,<:Number}}}, ::Nothing) =
    concrete_term(t, xs, ContinuousTerm)
```

I don't think we need to (or should) specialize.
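Julia's parametric types are invariant, which is why two methods were needed in the first place: `AbstractVector{<:Number}` does not match a column that allows `missing`, and `AbstractVector{Union{Missing,T}} where T<:Number` does not match a plain numeric column. The checks below illustrate this on ordinary vectors (not StatsModels terms); the variable names are illustrative.

```julia
plain    = [1.0, 2.0]       # Vector{Float64}
withmiss = [1.0, missing]   # Vector{Union{Missing, Float64}}

# Signature of the first method: matches only the plain vector.
first_plain    = plain    isa AbstractVector{<:Number}   # true
first_withmiss = withmiss isa AbstractVector{<:Number}   # false

# Signature of the second method: matches only the vector that allows missing.
second_plain    = plain    isa (AbstractVector{Union{Missing,T}} where T<:Number)  # false
second_withmiss = withmiss isa (AbstractVector{Union{Missing,T}} where T<:Number)  # true
```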
Wouldn't just having `AbstractVector{<:Union{Missing, <:Number}}` work?

```julia
julia> [1, 2] isa AbstractVector{<:Union{Missing,<:Number}}
true

julia> [1, 2, missing] isa AbstractVector{<:Union{Missing,<:Number}}
true
```
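A minimal sketch of the single-method approach, using a hypothetical function `default_kind` in place of `concrete_term` (it returns a Symbol instead of a term type, so it runs without StatsModels). Two edge cases may be worth checking before adopting it: an all-`missing` column (`Vector{Missing}`) also matches `AbstractVector{<:Union{Missing,<:Number}}`, while a vector with an abstract eltype such as `Vector{Any}` does not, even if it holds only numbers.

```julia
# Hypothetical stand-in for concrete_term's default-hint dispatch; not StatsModels' API.
# One signature covers both plain numeric vectors and ones that allow `missing`.
default_kind(xs::AbstractVector{<:Union{Missing,<:Number}}) = :continuous

# Everything else (strings, symbols, Any-typed vectors, ...) falls back here.
default_kind(xs::AbstractVector) = :categorical

default_kind([1, 2])              # :continuous
default_kind([1.0, missing])      # :continuous
default_kind(["a", "b"])          # :categorical
default_kind(Any[1, 2])           # :categorical, since Any is not <: Union{Missing,<:Number}
default_kind([missing, missing])  # :continuous, since Vector{Missing} also matches
```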
It would be nice if the following

```julia
julia> using DataFrames, StatsModels

julia> df = DataFrame(t=1:5, x=(1:5).*2.0, y=1:5.0, z=[12; missing; 16.:2:20])

julia> f = @formula(y ~ x + lag(x) + z + lag(z))
```

gave this outcome:

```julia
julia> df_hand0 = transform(df, :z => lag => :z_lagged, :x => lag => :x_lagged)
5×6 DataFrame
 Row │ t      x        y        z         z_lagged   x_lagged
     │ Int64  Float64  Float64  Float64?  Float64?   Float64?
─────┼─────────────────────────────────────────────────────────
   1 │     1      2.0      1.0      12.0  missing    missing
   2 │     2      4.0      2.0   missing       12.0       2.0
   3 │     3      6.0      3.0      16.0  missing         4.0
   4 │     4      8.0      4.0      18.0       16.0       6.0
   5 │     5     10.0      5.0      20.0       18.0       8.0

julia> df_hand = df_hand0[completecases(df_hand0), :] |> disallowmissing!

julia> f_hand = @formula(y ~ x + x_lagged + z + z_lagged)

julia> ff_hand = apply_schema(f_hand, schema(f_hand, df_hand))

julia> modelmatrix(ff_hand, df_hand)
2×4 Matrix{Float64}:
  8.0  6.0  18.0  16.0
 10.0  8.0  20.0  18.0
```

(If you happen to make this work in this PR, feel free to turn this code into a test.)
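As a possible starting point for that test, here is a sketch of the expected model matrix computed with Base Julia only. `lag1` is a hypothetical helper that mimics a one-step lag by padding the front with `missing`; it is not the ShiftedArrays/StatsModels `lag`, and the row filter stands in for `completecases`.

```julia
# Hypothetical helper: shift a vector down by one, padding the front with `missing`.
# (Stand-in for `lag`; not the ShiftedArrays/StatsModels implementation.)
lag1(v::AbstractVector) = vcat(missing, v[1:end-1])

x = (1:5) .* 2.0
z = [12; missing; 16.0:2:20]

# Same column order as y ~ x + lag(x) + z + lag(z).
cols = (x, lag1(x), z, lag1(z))

# Keep only rows where every column is non-missing (the role of completecases).
keep = [all(c -> !ismissing(c[i]), cols) for i in 1:5]

# Drop the incomplete rows and narrow each column to Float64 before stacking.
M = reduce(hcat, [Float64.(c[keep]) for c in cols])
```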
When done, this will close #145 and potentially address #17.